Simulation Study to Identify Factors Affecting the Performance of LSTM and XGBoost for Anomaly Detection on Labeled Time Series Data
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | Indonesian |
| Published: | Universitas Muhammadiyah Purwokerto, 2025-08-01 |
| Series: | Jurnal Informatika |
| Subjects: | |
| Online Access: | http://jurnalnasional.ump.ac.id/index.php/JUITA/article/view/26604 |
| Summary: | Time series analysis has evolved to include forecasting and anomaly detection, both of which are applied in many fields. Machine learning methods such as long short-term memory (LSTM) and extreme gradient boosting (XGBoost) are widely developed because they are considered superior to conventional methods. Both use a forecasting approach for anomaly detection. However, how factors such as data length, labeling method, and number of anomalies limit the two methods has not been explored. Therefore, this study aims to identify the factors that affect the performance of LSTM and XGBoost in forecasting and anomaly detection across various scenarios and to compare their evaluation metrics. The study uses Jakarta's air quality index data for 2018–2023, which was preprocessed and augmented for simulation purposes. The results show that LSTM outperforms XGBoost, with a lower MAPE (14.7024%), a lower RMSE (13.9909), and a higher balanced accuracy (0.9935). A significant Mann-Whitney test between the two methods reinforces this result, indicating that their accuracy differs. In addition, the Kruskal-Wallis test for each combination of method and treatment was significant, indicating that data length, labeling method, and number of anomalies affect each method's accuracy. |
|---|---|
| ISSN: | 2086-9398 2579-8901 |
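
The summary reports three evaluation metrics: MAPE, RMSE, and balanced accuracy. The record does not give their formulas, so the sketch below assumes the standard textbook definitions; the function names and the toy values are hypothetical and are not taken from the article or its data.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def balanced_accuracy(labels, flags):
    """Mean of sensitivity and specificity for binary labels (1 = anomaly).

    Assumes both classes occur at least once in `labels`.
    """
    pairs = list(zip(labels, flags))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy values, for illustration only (not the study's data).
actual = [100.0, 80.0, 120.0, 50.0]
forecast = [90.0, 88.0, 120.0, 55.0]
labels = [0, 0, 1, 0, 1, 0]  # ground-truth anomaly labels
flags = [0, 0, 1, 0, 0, 1]   # anomalies flagged by a detector

print(mape(actual, forecast))
print(rmse(actual, forecast))
print(balanced_accuracy(labels, flags))
```

Balanced accuracy is a natural choice when anomalies are rare, because a detector that never flags anything scores only 0.5 rather than the high value plain accuracy would give.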