Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning
Abstract With IoT networks expected to exceed 29 billion connected devices by 2030, the risk of cyberattacks has never been higher. As more devices come online, the attack surface for hackers continues to expand, making cybersecurity a pressing concern. Intrusion Detection Systems (IDS) are essentia...
| Main Authors: | Arvind Prasad, Wael Mohammad Alenazy, Naved Ahmad, Gauhar Ali, Hanaa A. Abdallah, Sadique Ahmad |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | Class imbalance; IDS; IoT security; Deep learning; Cybersecurity |
| Online Access: | https://doi.org/10.1038/s41598-025-15631-3 |
| _version_ | 1849226232577654784 |
|---|---|
| author | Arvind Prasad; Wael Mohammad Alenazy; Naved Ahmad; Gauhar Ali; Hanaa A. Abdallah; Sadique Ahmad |
| author_sort | Arvind Prasad |
| collection | DOAJ |
| description | With IoT networks expected to exceed 29 billion connected devices by 2030, the risk of cyberattacks has never been higher. As more devices come online, the attack surface for hackers continues to expand, making cybersecurity a pressing concern. Intrusion Detection Systems (IDS) are essential for identifying and mitigating these threats in real-time. However, a significant challenge IDS faces is dealing with imbalanced datasets, where attack instances are significantly underrepresented compared to normal traffic. Training models on such skewed data leads to a bias toward majority-class patterns, reducing their ability to detect intrusions effectively. To address this issue, this work introduces CSMCR (Cosine Similarity-based Majority Class Reduction), a novel technique that selectively removes redundant majority-class samples while preserving dataset integrity. Unlike traditional approaches like SMOTE (oversampling) or random undersampling, CSMCR ensures that the retained majority instances remain diverse by analyzing feature-wise similarity. This prevents unnecessary data duplication and minimizes information loss. Additionally, we developed a hybrid deep learning model integrating RegNet and FBNet architectures to enhance feature extraction and classification performance. Experimental results on multiple IDS datasets confirm that balancing the dataset to a 1:1 ratio optimally prevents overfitting and improves model interpretability. The proposed model achieved an F1-score of 0.9758 on RT-IoT2022 and 0.9275 on UNSW Bot-IoT, outperforming SMOTE-based methods in accuracy and computational efficiency. Notably, CSMCR reduced training time by 53% compared to conventional oversampling techniques. Incremental training evaluations reveal that bias formation reduces performance beyond a 1:2 majority-to-minority ratio. These findings establish CSMCR as a robust, scalable, and computationally efficient IDS balancing strategy tailored for IoT network security. |
| format | Article |
| id | doaj-art-af3efe3c88fc4d9f9d596a99fbb0bf9f |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-af3efe3c88fc4d9f9d596a99fbb0bf9f; 2025-08-24T11:31:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 151124; 10.1038/s41598-025-15631-3; Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning; Arvind Prasad (Department of Computer Engineering & Applications, GLA University); Wael Mohammad Alenazy (Self-Development Skills Dept. (Computer Skills), Common First Year Deanship, King Saud University); Naved Ahmad (Department of Computer Science & Information Systems, College of Applied Science, AlMaarefa University); Gauhar Ali (EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University); Hanaa A. Abdallah (Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University); Sadique Ahmad (EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University); https://doi.org/10.1038/s41598-025-15631-3; Class imbalance; IDS; IoT security; Deep learning; Cybersecurity |
| spellingShingle | Arvind Prasad; Wael Mohammad Alenazy; Naved Ahmad; Gauhar Ali; Hanaa A. Abdallah; Sadique Ahmad; Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning; Scientific Reports; Class imbalance; IDS; IoT security; Deep learning; Cybersecurity |
| title | Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning |
| topic | Class imbalance; IDS; IoT security; Deep learning; Cybersecurity |
| url | https://doi.org/10.1038/s41598-025-15631-3 |
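
The description above says CSMCR removes redundant majority-class samples by comparing their feature vectors with cosine similarity until the classes reach a 1:1 ratio. This record does not give the authors' algorithm, so the following Python is only a minimal sketch of the general idea, not the published method. The function name `csmcr_sketch`, the greedy single-pass strategy, and the `threshold` and `n_target` parameters are all assumptions made for illustration.

```python
import numpy as np


def csmcr_sketch(X_major, n_target, threshold=0.99):
    """Greedy illustration of similarity-based majority reduction.

    Hypothetical helper, not the authors' implementation. A majority
    sample is kept only if its cosine similarity to every sample kept
    so far is below `threshold`. The pass stops once `n_target`
    samples are kept, so it may return fewer than `n_target` indices
    when the data are highly redundant.
    """
    X_major = np.asarray(X_major, dtype=float)
    # Normalise rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(X_major, axis=1, keepdims=True)
    unit = X_major / np.clip(norms, 1e-12, None)

    kept = []
    for i in range(len(unit)):
        if len(kept) == n_target:
            break
        # Keep the first sample, then only sufficiently dissimilar ones.
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return np.array(kept, dtype=int)


# Toy majority class: rows 0/1 and rows 2/3 point in the same direction.
X_major = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
n_minority = 3  # assumed minority-class size, giving a 1:1 target
kept = csmcr_sketch(X_major, n_target=n_minority)
X_reduced = X_major[kept]
```

In this toy case the two rows that merely rescale an earlier row are dropped, which is the kind of redundancy the description refers to. A practical version would need a strategy for choosing the threshold and for datasets too large for pairwise comparison against every kept sample.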