The Choice of Training Data and the Generalizability of Machine Learning Models for Network Intrusion Detection Systems
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8466 |
| Summary: | Network Intrusion Detection Systems (NIDS) driven by Machine Learning (ML) algorithms are usually trained on publicly available datasets of labeled traffic samples, where the labels denote traffic classes, typically one benign class and several harmful ones. This paper studies the generalizability of models trained on such datasets. The issue is crucial when such a model is applied to actual internet traffic, because high performance measured on a dataset does not necessarily imply similar performance on real traffic. We propose a procedure that combines cross-validation across several datasets that share some traffic classes with t-SNE visualization. We apply it to four well-known and widely used datasets: UNSW-NB15, CIC-CSE-IDS2018, BoT-IoT, and ToN-IoT. Our investigation reveals that the high accuracy a model achieves on its training dataset is reproducible on the other datasets only to a limited extent. Moreover, the generalizability of benign traffic classes differs from that of harmful ones. For deployment in a real network environment, this implies that the training data must be selected carefully, by determining to what extent the classes in the training dataset resemble those in the target traffic environment. On the other hand, merging datasets may yield a more exhaustive data collection with a more diverse spectrum of training samples. |
| ISSN: | 2076-3417 |
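
The cross-dataset evaluation described in the summary (train on one dataset, test on every other dataset that shares its classes, compare per-class results) can be sketched roughly as follows. This is a toy illustration only, not the authors' procedure or data: the two "datasets" are synthetic Gaussian clusters, the nearest-centroid classifier stands in for whatever ML model is used, and all names (`make_dataset`, `cross_dataset_recall`, the labels `A` and `B`) are hypothetical. The t-SNE visualization step is omitted.

```python
import random

BENIGN, ATTACK = 0, 1


def make_dataset(seed, attack_center, n=200):
    """Build a synthetic stand-in for a labeled traffic dataset.

    Benign samples cluster around (0, 0) in every dataset; the attack
    cluster sits at `attack_center`, which differs between datasets to
    mimic the same class label covering different traffic.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append(([rng.gauss(0, 1), rng.gauss(0, 1)], BENIGN))
        rows.append(([rng.gauss(attack_center[0], 1),
                      rng.gauss(attack_center[1], 1)], ATTACK))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': one mean feature vector per class."""
    centroids = {}
    for label in (BENIGN, ATTACK):
        points = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def per_class_recall(centroids, rows):
    """Fraction of each class's samples that the model labels correctly."""
    recall = {}
    for label in centroids:
        points = [x for x, y in rows if y == label]
        hits = sum(predict(centroids, x) == label for x in points)
        recall[label] = hits / len(points)
    return recall


def cross_dataset_recall(train_sets, test_sets):
    """Train on each dataset, then evaluate on every dataset's test split."""
    results = {}
    for train_name, train_rows in train_sets.items():
        model = fit_centroids(train_rows)
        for test_name, test_rows in test_sets.items():
            results[(train_name, test_name)] = per_class_recall(model, test_rows)
    return results


# Two hypothetical datasets that share class labels but not attack behavior.
train_sets = {"A": make_dataset(1, (4, 4)), "B": make_dataset(2, (4, -4))}
test_sets = {"A": make_dataset(3, (4, 4)), "B": make_dataset(4, (4, -4))}
results = cross_dataset_recall(train_sets, test_sets)

for (train_name, test_name), recall in sorted(results.items()):
    print(f"train={train_name} test={test_name} "
          f"benign={recall[BENIGN]:.2f} attack={recall[ATTACK]:.2f}")
```

Because the synthetic attack clusters were deliberately placed far apart, the off-diagonal (train on one set, test on the other) attack recall collapses while benign recall stays high. That outcome is a property of this toy setup and says nothing about the four real datasets named in the summary.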