Simulation Study to Identify Factors Affecting the Performance of LSTM and XGBoost for Anomaly Detection on Labeled Time Series Data

Bibliographic Details
Main Authors: Muhammad Rizky Nurhambali, Yenni Angraini, Anwar Fitrianto
Format: Article
Language: Indonesian
Published: Universitas Muhammadiyah Purwokerto 2025-08-01
Series: Jurnal Informatika
Online Access: http://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/26604
Description
Summary: Time series analysis has evolved to include forecasting and anomaly detection, both of which are applied in various fields. Machine learning methods such as long short-term memory (LSTM) and extreme gradient boosting (XGBoost) have been widely developed because they are considered superior to conventional methods. Both use a forecasting approach for anomaly detection. However, how factors such as data length, labeling method, and number of anomalies limit the two methods has not been explored. Therefore, this study aims to identify the factors that affect the performance of LSTM and XGBoost in forecasting and anomaly detection across various scenarios and to compare their evaluation metrics. The study uses Jakarta's air quality index data for 2018–2023, which were preprocessed and augmented for simulation purposes. The results show that LSTM outperforms XGBoost, with a lower MAPE (14.7024%), a lower RMSE (13.9909), and a higher balanced accuracy (0.9935). This is reinforced by a significant Mann-Whitney test between the two methods, indicating that their accuracy differs. In addition, the Kruskal-Wallis test for each combination of method and treatment gave significant results, indicating that data length, labeling method, and number of anomalies affect each method's accuracy.
ISSN: 2086-9398, 2579-8901
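
The summary reports three evaluation metrics (MAPE, RMSE, and balanced accuracy) for a forecasting-based approach to anomaly detection. As a minimal sketch of how such metrics are commonly computed, the Python below uses made-up values, hypothetical function names, and a simple fixed-threshold rule on the absolute forecast error. The thresholding rule is an assumption for illustration only; the record does not describe the labeling methods used in the study.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def flag_anomalies(actual, forecast, threshold):
    """Flag a point as anomalous (1) when its absolute forecast error exceeds
    the threshold. This rule is an illustrative assumption."""
    return [1 if abs(a - f) > threshold else 0 for a, f in zip(actual, forecast)]


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary labels (1 = anomaly).
    Both classes must be present in labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up illustrative values, not data from the study.
actual = [100.0, 110.0, 90.0, 200.0, 105.0]
forecast = [98.0, 108.0, 95.0, 120.0, 100.0]
labels = [0, 0, 0, 1, 0]  # 1 marks the labeled anomaly

predictions = flag_anomalies(actual, forecast, threshold=20.0)
print("MAPE (%):", round(mape(actual, forecast), 4))
print("RMSE:", round(rmse(actual, forecast), 4))
print("Balanced accuracy:", balanced_accuracy(labels, predictions))
```

Balanced accuracy averages the detection rate on anomalies and on normal points, which is why it is often preferred over plain accuracy when anomalies are rare in a labeled series.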