EM-AUC: A Novel Algorithm for Evaluating Anomaly Based Network Intrusion Detection Systems

Effective network intrusion detection using anomaly scores from unsupervised machine learning models depends on the performance of the models. Although unsupervised models do not require labels during the training and testing phases, the assessment of their performance metrics during the evaluation...

Bibliographic Details
Main Authors: Kevin Z. Bai, John M. Fossaceca
Format: Article
Language: English
Published: MDPI AG 2024-12-01
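Series: Sensors
Subjects: network intrusion detection; unsupervised machine learning models; EM-AUC algorithm; missing data inference; Area Under the ROC Curve; Area Under the Precision-Recall Curve
Online Access: https://www.mdpi.com/1424-8220/25/1/78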
_version_ 1841548897552433152
author Kevin Z. Bai
John M. Fossaceca
collection DOAJ
description Effective network intrusion detection using anomaly scores from unsupervised machine learning models depends on the performance of the models. Although unsupervised models do not require labels during the training and testing phases, the assessment of their performance metrics during the evaluation phase still requires comparing anomaly scores against labels. In real-world scenarios, the absence of labels in massive network datasets makes it infeasible to calculate performance metrics. Therefore, it is valuable to develop an algorithm that calculates robust performance metrics without using labels. In this paper, we propose a novel algorithm, Expectation Maximization-Area Under the Curve (EM-AUC), to derive the Area Under the ROC Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR) by treating the unavailable labels as missing data and replacing them through their posterior probabilities. This algorithm was applied to two network intrusion datasets, yielding robust results. To the best of our knowledge, this is the first time AUC-ROC and AUC-PR, derived without labels, have been used to evaluate network intrusion detection systems. The EM-AUC algorithm enables model training, testing, and performance evaluation to proceed without comprehensive labels, offering a cost-effective and scalable solution for selecting the most effective models for network intrusion detection.
format Article
id doaj-art-141e22ffced24c83806f3872ea6cd58c
institution Kabale University
issn 1424-8220
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Sensors
spelling Record doaj-art-141e22ffced24c83806f3872ea6cd58c, last updated 2025-01-10T13:20:48Z. Sensors (MDPI AG), ISSN 1424-8220, published 2024-12-01, volume 25, issue 1, article 78. DOI: 10.3390/s25010078.
affiliations Kevin Z. Bai: Independent Researcher, Westwood, MA 02090, USA
John M. Fossaceca: Department of Engineering Management and Systems Engineering, George Washington University, Washington, DC 20052, USA
topic network intrusion detection
unsupervised machine learning models
EM-AUC algorithm
missing data inference
Area Under the ROC Curve
Area Under the Precision-Recall Curve
url https://www.mdpi.com/1424-8220/25/1/78
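
The abstract describes treating unavailable labels as missing data and replacing them with posterior probabilities. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes, purely for illustration, that anomaly scores follow a two-component Gaussian mixture fitted by EM, and every name in it (`em_posteriors`, `soft_auc_roc`, `soft_auc_pr`) and the synthetic scores are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_posteriors(scores, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM (an assumed model,
    not taken from the paper) and return P(anomaly | score) per score."""
    n = len(scores)
    ranked = sorted(scores)
    mean = sum(scores) / n
    spread = max(math.sqrt(sum((x - mean) ** 2 for x in scores) / n), 1e-6)
    # Crude initialisation: "normal" near the lower quartile, "anomaly" in the upper tail.
    mu0, mu1 = ranked[n // 4], ranked[(9 * n) // 10]
    sig0 = sig1 = spread
    pi1 = 0.1
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each score is an anomaly.
        post = []
        for x in scores:
            a = pi1 * normal_pdf(x, mu1, sig1)
            b = (1.0 - pi1) * normal_pdf(x, mu0, sig0)
            post.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        w1 = max(sum(post), 1e-12)
        w0 = max(n - w1, 1e-12)
        mu1 = sum(p * x for p, x in zip(post, scores)) / w1
        mu0 = sum((1.0 - p) * x for p, x in zip(post, scores)) / w0
        var1 = sum(p * (x - mu1) ** 2 for p, x in zip(post, scores)) / w1
        var0 = sum((1.0 - p) * (x - mu0) ** 2 for p, x in zip(post, scores)) / w0
        sig1 = max(math.sqrt(var1), 1e-6)
        sig0 = max(math.sqrt(var0), 1e-6)
        pi1 = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
    # Guard against label switching: the anomaly component is the higher-mean one.
    if mu1 < mu0:
        post = [1.0 - p for p in post]
    return post


def soft_auc_roc(scores, post):
    """AUC-ROC with soft labels: each ordered pair (i, j), i != j, is weighted by
    P(i is an anomaly) * P(j is normal); ties in score count one half."""
    num = den = 0.0
    for i, (si, pi) in enumerate(zip(scores, post)):
        for j, (sj, pj) in enumerate(zip(scores, post)):
            if i == j:
                continue
            w = pi * (1.0 - pj)
            den += w
            if si > sj:
                num += w
            elif si == sj:
                num += 0.5 * w
    return num / den


def soft_auc_pr(scores, post):
    """Average-precision style AUC-PR with soft labels, sweeping the
    threshold from the highest score downward."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(post)
    tp = 0.0
    area = 0.0
    for k, i in enumerate(order, start=1):
        tp += post[i]                       # soft true positives so far
        precision = tp / k                  # soft TP + soft FP equals k
        area += precision * (post[i] / total_pos)  # weight by recall increment
    return area


# Synthetic scores (hypothetical data): 300 normal points and 30 anomalies.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(300)]
scores += [random.gauss(4.0, 1.0) for _ in range(30)]
post = em_posteriors(scores)
auc_roc = soft_auc_roc(scores, post)
auc_pr = soft_auc_pr(scores, post)
print(f"label-free AUC-ROC: {auc_roc:.3f}, AUC-PR: {auc_pr:.3f}")
```

With hard 0/1 posteriors both functions reduce to the usual labeled metrics, which gives a simple sanity check. The quality of the label-free estimates depends entirely on how well the assumed mixture matches the real score distribution.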