Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments

This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purit...

Bibliographic Details
Main Authors:	Gutierrez-Portela Fernando, Almenares Mendoza Florina, Calderon-Benavides Liliana
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Intrusion detection systems, unsupervised models, machine learning, anomaly detection, metrics
Online Access:	https://ieeexplore.ieee.org/document/10794744/

author	Gutierrez-Portela Fernando Almenares Mendoza Florina Calderon-Benavides Liliana
collection	DOAJ
description	This study evaluated the performance of unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms such as K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were analyzed using metrics like purity, homogeneity, completeness, V-measure, and adjusted mutual information to assess their effectiveness in detecting attacks such as DDoS, DoS, and reconnaissance. Optimal cluster selection methods were also explored, and principal component analysis (PCA) was applied to explain data variability. Results showed that K-means++ achieved 95% purity with 95% and 99% prediction accuracies for normal and abnormal data, respectively, while I-forest delivered similar results and excelled in computational efficiency, consuming only 10% of CPU resources compared to 16% for other algorithms. These findings highlight I-forest’s effectiveness and efficiency in intrusion detection, offering a viable solution for cybersecurity environments with limited resources and significant data imbalance.
format	Article
id	doaj-art-aa381329d9c947e3b78ced77e8ef81c0
institution	Kabale University
issn	2169-3536
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
volume	12
pages	190134-190157
doi	10.1109/ACCESS.2024.3516615
document_number	10794744
record_updated	2024-12-20T00:00:47Z
author_affiliation	Gutierrez-Portela Fernando (https://orcid.org/0000-0003-3722-3809): Aqua Research Group, Cooperative University of Colombia, Ibagué, Colombia
author_affiliation	Almenares Mendoza Florina: Department of Telematics Engineering, Universidad Carlos III de Madrid (UC3M), Madrid, Spain
author_affiliation	Calderon-Benavides Liliana (https://orcid.org/0000-0001-8658-9036): Information Technologies Academic Unit, Autonomous University of Bucaramanga, Bucaramanga, Colombia
title	Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments
topic	Intrusion detection systems, unsupervised models, machine learning, anomaly detection, metrics
url	https://ieeexplore.ieee.org/document/10794744/
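
The description above names purity as one of the clustering metrics used in the study. As a minimal sketch of how purity is commonly defined (each cluster is credited with its majority class, and the credited samples are divided by the total), the snippet below may help. It is not the authors' code; the function name `purity` and the toy labels are assumptions made for illustration only and are not taken from the BoT-IoT dataset.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    # Count how often each true class occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster[cluster][label] += 1

    # Credit every cluster with the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical, deliberately unbalanced toy data: 8 attack flows, 2 normal flows.
true_labels = ["attack"] * 8 + ["normal"] * 2
cluster_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

score = purity(true_labels, cluster_ids)
single_cluster_score = purity(true_labels, [0] * len(true_labels))
```

On this toy input, placing every sample in a single cluster still yields a purity equal to the majority-class share, which illustrates why purity on unbalanced data is usually read alongside measures such as homogeneity, completeness, V-measure, and adjusted mutual information.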

Similar Items