Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning

Abstract With IoT networks expected to exceed 29 billion connected devices by 2030, the risk of cyberattacks has never been higher. As more devices come online, the attack surface for hackers continues to expand, making cybersecurity a pressing concern. Intrusion Detection Systems (IDS) are essentia...

Bibliographic Details
Main Authors: Arvind Prasad, Wael Mohammad Alenazy, Naved Ahmad, Gauhar Ali, Hanaa A. Abdallah, Sadique Ahmad
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects: Class imbalance, IDS, IoT security, Deep learning, Cybersecurity
Online Access: https://doi.org/10.1038/s41598-025-15631-3
_version_ 1849226232577654784
author Arvind Prasad
Wael Mohammad Alenazy
Naved Ahmad
Gauhar Ali
Hanaa A. Abdallah
Sadique Ahmad
author_sort Arvind Prasad
collection DOAJ
description Abstract: With IoT networks expected to exceed 29 billion connected devices by 2030, the risk of cyberattacks has never been higher. As more devices come online, the attack surface for hackers continues to expand, making cybersecurity a pressing concern. Intrusion Detection Systems (IDS) are essential for identifying and mitigating these threats in real time. However, a significant challenge for IDS is imbalanced datasets, in which attack instances are heavily underrepresented compared to normal traffic. Training models on such skewed data leads to a bias toward majority-class patterns, reducing their ability to detect intrusions effectively. To address this issue, this work introduces CSMCR (Cosine Similarity-based Majority Class Reduction), a novel technique that selectively removes redundant majority-class samples while preserving dataset integrity. Unlike traditional approaches such as SMOTE (oversampling) or random undersampling, CSMCR ensures that the retained majority instances remain diverse by analyzing feature-wise similarity. This prevents unnecessary data duplication and minimizes information loss. Additionally, we developed a hybrid deep learning model integrating RegNet and FBNet architectures to enhance feature extraction and classification performance. Experimental results on multiple IDS datasets confirm that balancing the dataset to a 1:1 ratio optimally prevents overfitting and improves model interpretability. The proposed model achieved an F1-score of 0.9758 on RT-IoT2022 and 0.9275 on UNSW Bot-IoT, outperforming SMOTE-based methods in accuracy and computational efficiency. Notably, CSMCR reduced training time by 53% compared to conventional oversampling techniques. Incremental training evaluations reveal that bias formation reduces performance beyond a 1:2 majority-to-minority ratio. These findings establish CSMCR as a robust, scalable, and computationally efficient IDS balancing strategy tailored for IoT network security.
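The abstract describes CSMCR only at a high level, so the following is a minimal sketch of the general idea (dropping majority-class samples that are nearly identical, by cosine similarity, to ones already kept), not the authors' algorithm. The function name `reduce_majority`, the greedy selection order, and the `threshold` value are all assumptions made for illustration. Setting `n_keep` to the number of minority samples would target the 1:1 ratio mentioned above.

```python
import numpy as np

def reduce_majority(X_major, n_keep, threshold=0.99):
    """Hypothetical greedy reduction (not the published CSMCR procedure).

    Keeps a majority-class row only if its cosine similarity to every
    row kept so far is below `threshold`, stopping once `n_keep` rows
    have been selected. Returns the indices of the kept rows.
    """
    X_major = np.asarray(X_major, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(X_major, axis=1, keepdims=True)
    unit = X_major / np.clip(norms, 1e-12, None)

    kept = []
    for i in range(len(unit)):
        if len(kept) >= n_keep:
            break
        # Similarity of candidate i to all rows already retained.
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return np.array(kept, dtype=int)

# Toy majority-class data: rows 0 and 1 point in the same direction.
X_major = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 3.0]])
kept_idx = reduce_majority(X_major, n_keep=3)
X_reduced = X_major[kept_idx]
```

If fewer than `n_keep` sufficiently distinct rows exist, this sketch returns fewer rows than requested; a practical version would need a fallback, for example relaxing the threshold.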
format Article
id doaj-art-af3efe3c88fc4d9f9d596a99fbb0bf9f
institution Kabale University
issn 2045-2322
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-af3efe3c88fc4d9f9d596a99fbb0bf9f; 2025-08-24T11:31:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 15; 1; 1; 24; 10.1038/s41598-025-15631-3
Arvind Prasad (0): Department of Computer Engineering & Applications, GLA University
Wael Mohammad Alenazy (1): Self-Development Skills Dept. (Computer Skills), Common First Year Deanship, King Saud University
Naved Ahmad (2): Department of Computer Science & Information Systems, College of Applied Science, AlMaarefa University
Gauhar Ali (3): EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University
Hanaa A. Abdallah (4): Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University
Sadique Ahmad (5): EIAS Data Science and Blockchain Lab, College of Computer and Information Sciences, Prince Sultan University
title Optimizing IoT intrusion detection with cosine similarity based dataset balancing and hybrid deep learning
topic Class imbalance
IDS
IoT security
Deep learning
Cybersecurity
url https://doi.org/10.1038/s41598-025-15631-3