Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments
This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purit...
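Purity is among the metrics the abstract names. As an illustration only, the sketch below computes purity in its usual textbook form: the share of samples that fall in the majority true class of their assigned cluster. The function name `purity` and the toy labels are assumptions made for this example; they are not taken from the article, and the article's own implementation may differ.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority true class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    # Count how often each true class appears inside each cluster.
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1

    # Each cluster contributes the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)


# Hypothetical, deliberately unbalanced toy data: 8 attack flows, 2 normal flows.
true_labels = ["attack"] * 8 + ["normal"] * 2
cluster_ids = [0] * 7 + [1] * 3  # one attack flow lands in the "normal" cluster

print(purity(true_labels, cluster_ids))  # two clusters
print(purity(true_labels, [0] * 10))     # everything in a single cluster
```

With unbalanced labels, a single all-inclusive cluster already scores a purity equal to the majority-class share, which is one reason purity is usually read alongside homogeneity, completeness, and V-measure rather than alone.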
| Main Authors: | Gutierrez-Portela Fernando, Almenares Mendoza Florina, Calderon-Benavides Liliana |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Intrusion detection systems; unsupervised models; machine learning; anomaly detection; metrics |
| Online Access: | https://ieeexplore.ieee.org/document/10794744/ |
| _version_ | 1849220099037200384 |
|---|---|
| author | Gutierrez-Portela Fernando; Almenares Mendoza Florina; Calderon-Benavides Liliana |
| collection | DOAJ |
| description | This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purity, homogeneity, completeness, V-measure, and adjusted mutual information to assess their effectiveness in detecting attacks such as DDoS, DoS, and reconnaissance. Optimal cluster selection methods were also explored, and principal component analysis (PCA) was applied to explain data variability. Results showed that K-means++ achieved 95% purity with 95% and 99% prediction accuracies for normal and abnormal data, respectively, while I-forest delivered similar results and excelled in computational efficiency, consuming only 10% of CPU resources compared to 16% for other algorithms. These findings highlight I-forest’s effectiveness and efficiency in intrusion detection, offering a viable solution for cybersecurity environments with limited resources and significant data imbalance. |
| format | Article |
| id | doaj-art-aa381329d9c947e3b78ced77e8ef81c0 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record: doaj-art-aa381329d9c947e3b78ced77e8ef81c0 (2024-12-20T00:00:47Z); Language: eng; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536; Date: 2024-01-01; Volume: 12; Pages: 190134-190157; DOI: 10.1109/ACCESS.2024.3516615; Document: 10794744; Authors and affiliations: Gutierrez-Portela Fernando (https://orcid.org/0000-0003-3722-3809), Aqua Research Group, Cooperative University of Colombia, Ibagué, Colombia; Almenares Mendoza Florina, Department of Telematics Engineering, Universidad Carlos III de Madrid (UC3M), Madrid, Spain; Calderon-Benavides Liliana (https://orcid.org/0000-0001-8658-9036), Information Technologies Academic Unit, Autonomous University of Bucaramanga, Bucaramanga, Colombia; URL: https://ieeexplore.ieee.org/document/10794744/; Keywords: Intrusion detection systems; unsupervised models; machine learning; anomaly detection; metrics |
| title | Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments |
| topic | Intrusion detection systems; unsupervised models; machine learning; anomaly detection; metrics |
| url | https://ieeexplore.ieee.org/document/10794744/ |