Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments

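This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purity, homogeneity, completeness, V-measure, and adjusted mutual information to assess their effectiveness in detecting attacks such as DDoS, DoS, and reconnaissance. Optimal cluster selection methods were also explored, and principal component analysis (PCA) was applied to explain data variability. Results showed that K-means++ achieved 95% purity with 95% and 99% prediction accuracies for normal and abnormal data, respectively, while I-forest delivered similar results and excelled in computational efficiency, consuming only 10% of CPU resources compared to 16% for other algorithms. These findings highlight I-forest’s effectiveness and efficiency in intrusion detection, offering a viable solution for cybersecurity environments with limited resources and significant data imbalance.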
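
The record's description lists purity, homogeneity, completeness, and V-measure among the evaluation metrics. As a rough illustration of what these quantities measure, the sketch below computes them from their standard textbook definitions using only the Python standard library. It is not the authors' code: the function names and the toy labels are assumptions made for this example, and the labels are invented rather than drawn from BoT-IoT.

```python
from collections import Counter, defaultdict
from math import log


def purity(labels_true, labels_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    clusters = defaultdict(Counter)
    for truth, cluster in zip(labels_true, labels_pred):
        clusters[cluster][truth] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given), from joint counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(labels_true, labels_pred):
    """Homogeneity, completeness, and their harmonic mean (V-measure)."""
    h_class = _entropy(labels_true)
    h_cluster = _entropy(labels_pred)
    homogeneity = (
        1.0 if h_class == 0
        else 1 - _conditional_entropy(labels_true, labels_pred) / h_class
    )
    completeness = (
        1.0 if h_cluster == 0
        else 1 - _conditional_entropy(labels_pred, labels_true) / h_cluster
    )
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


# Toy, made-up labels (NOT BoT-IoT data): 8 attack flows and 2 normal flows.
y_true = ["attack"] * 8 + ["normal"] * 2
y_pred = [0] * 7 + [1] * 3           # one attack flow lands in the "normal" cluster
one_cluster = [0] * len(y_true)      # degenerate clustering: everything together

print("purity:", purity(y_true, y_pred))
print("h, c, v:", homogeneity_completeness_v(y_true, y_pred))
print("purity (single cluster):", purity(y_true, one_cluster))
print("h, c, v (single cluster):", homogeneity_completeness_v(y_true, one_cluster))
```

On these toy labels the degenerate single-cluster assignment still reaches a purity of 0.8 simply because 80% of the samples are attacks, while its homogeneity falls to zero. This is one reason to report several metrics together when classes are unbalanced.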

Bibliographic Details
Main Authors: Gutierrez-Portela Fernando, Almenares Mendoza Florina, Calderon-Benavides Liliana
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Intrusion detection systems; unsupervised models; machine learning; anomaly detection; metrics
Online Access: https://ieeexplore.ieee.org/document/10794744/
_version_ 1849220099037200384
author Gutierrez-Portela Fernando
Almenares Mendoza Florina
Calderon-Benavides Liliana
collection DOAJ
description This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purity, homogeneity, completeness, V-measure, and adjusted mutual information to assess their effectiveness in detecting attacks such as DDoS, DoS, and reconnaissance. Optimal cluster selection methods were also explored, and principal component analysis (PCA) was applied to explain data variability. Results showed that K-means++ achieved 95% purity with 95% and 99% prediction accuracies for normal and abnormal data, respectively, while I-forest delivered similar results and excelled in computational efficiency, consuming only 10% of CPU resources compared to 16% for other algorithms. These findings highlight I-forest’s effectiveness and efficiency in intrusion detection, offering a viable solution for cybersecurity environments with limited resources and significant data imbalance.
format Article
id doaj-art-aa381329d9c947e3b78ced77e8ef81c0
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-aa381329d9c947e3b78ced77e8ef81c0
2024-12-20T00:00:47Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
vol. 12, pp. 190134-190157
10.1109/ACCESS.2024.3516615
10794744
0 Gutierrez-Portela Fernando, https://orcid.org/0000-0003-3722-3809, Aqua Research Group, Cooperative University of Colombia, Ibagué, Colombia
1 Almenares Mendoza Florina, Department of Telematics Engineering, Universidad Carlos III de Madrid (UC3M), Madrid, Spain
2 Calderon-Benavides Liliana, https://orcid.org/0000-0001-8658-9036, Information Technologies Academic Unit, Autonomous University of Bucaramanga, Bucaramanga, Colombia
title Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments
topic Intrusion detection systems
unsupervised models
machine learning
anomaly detection
metrics
url https://ieeexplore.ieee.org/document/10794744/