Anomaly-based threat detection in smart health using machine learning

Abstract Background: Anomaly detection is crucial for healthcare data because of the challenges that come with integrating smart technologies into healthcare. An anomaly in an electronic health record can indicate an insider trying to access and manipulate the data. This article focuses around the...

Bibliographic Details
Main Authors: Muntaha Tabassum, Saba Mahmood, Amal Bukhari, Bader Alshemaimri, Ali Daud, Fatima Khalique
Format: Article
Language: English
Published: BMC 2024-11-01
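Series: BMC Medical Informatics and Decision Making
Subjects: Healthcare; Anomaly detection; Insider threats; Electronic Health Records (EHRs); Machine learning
Online Access: https://doi.org/10.1186/s12911-024-02760-4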
_version_ 1846158484747321344
author Muntaha Tabassum
Saba Mahmood
Amal Bukhari
Bader Alshemaimri
Ali Daud
Fatima Khalique
collection DOAJ
description Abstract
Background: Anomaly detection is crucial for healthcare data because of the challenges that come with integrating smart technologies into healthcare. An anomaly in an electronic health record can indicate an insider trying to access and manipulate the data. This article focuses on anomalies under different contexts.
Methodology: This research proposes a methodology to secure Electronic Health Records (EHRs) within a complex environment. We employed a systematic approach encompassing data preprocessing, labeling, modeling, and evaluation. Because anomalies are not labelled, a mechanism is required that predicts them with greater accuracy and fewer false positives. This research used unsupervised machine learning algorithms, namely the Isolation Forest and Local Outlier Factor clustering algorithms. By calculating anomaly scores and validating the clustering with metrics such as the Silhouette Score and Dunn Score, we enhanced the capacity to secure sensitive healthcare data against evolving digital threats. Three variations of Isolation Forest (IForest) models (SVM, Decision Tree, and Random Forest) and three variations of Local Outlier Factor (LOF) models (SVM, Decision Tree, and Random Forest) are evaluated on accuracy, sensitivity, specificity, and F1 Score.
Results: Isolation Forest SVM achieves the highest accuracy (99.21%), with high sensitivity (99.75%) and specificity (99.32%) and an F1 Score of 98.72%. The Isolation Forest Decision Tree also performs well, with an accuracy of 98.92% and an F1 Score of 99.35%. However, the Isolation Forest Random Forest exhibits lower specificity (72.84%) than the other models.
Conclusion: The experimental results show that Isolation Forest SVM is the top performer, demonstrating the effectiveness of these models in anomaly detection tasks. The proposed methodology, which combines Isolation Forest and SVM, produced better results by detecting anomalies with fewer false positives in this specific EHR dataset from a hospital in North England. Furthermore, the proposal also identifies new contextual anomalies that were not identified by the baseline methodology.
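The abstract reports accuracy, sensitivity, specificity, and F1 Score for each model, but this record does not spell out how they are computed. The following is a minimal Python sketch of the standard confusion-matrix definitions, assuming anomalies are the positive class (label 1) and normal records are label 0. The names `ConfusionCounts`, `count_outcomes`, and `evaluate`, as well as the toy labels, are hypothetical illustrations and are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary anomaly detector (anomaly = positive)."""
    tp: int  # anomalies correctly flagged
    fp: int  # normal records wrongly flagged as anomalies
    tn: int  # normal records correctly left unflagged
    fn: int  # anomalies that were missed


def count_outcomes(y_true, y_pred):
    """Tally TP/FP/TN/FN from two equal-length sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 (normal) or 1 (anomaly)")
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(counts):
    """Return accuracy, sensitivity, specificity, precision, and F1."""
    total = counts.tp + counts.fp + counts.tn + counts.fn
    sensitivity = _ratio(counts.tp, counts.tp + counts.fn)  # recall
    specificity = _ratio(counts.tn, counts.tn + counts.fp)
    precision = _ratio(counts.tp, counts.tp + counts.fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": _ratio(counts.tp + counts.tn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy labels (hypothetical): 3 anomalies among 10 records.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
toy_counts = count_outcomes(toy_true, toy_pred)
toy_metrics = evaluate(toy_counts)

if __name__ == "__main__":
    for name, value in toy_metrics.items():
        print(f"{name}: {value:.4f}")
```

Specificity is the metric that falls as false positives rise, which is why a model can score well on sensitivity and still show a low specificity such as the 72.84% reported for the Isolation Forest Random Forest variant.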
format Article
id doaj-art-d88a24f4b113479298aafdc626462e15
institution Kabale University
issn 1472-6947
language English
publishDate 2024-11-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-d88a24f4b113479298aafdc626462e15
2024-11-24T12:28:59Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2024-11-01
volume 24, issue 1, pages 1-19
10.1186/s12911-024-02760-4
Anomaly-based threat detection in smart health using machine learning
Muntaha Tabassum (Department of Computer Science, Bahria University)
Saba Mahmood (Department of Computer Science, Bahria University)
Amal Bukhari (Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah)
Bader Alshemaimri (Software Engineering Department, College of Computing and Information Sciences, King Saud University)
Ali Daud (Faculty of Resilience, Rabdan Academy)
Fatima Khalique (Centre of Excellence in Artificial Intelligence COE-AI, Bahria University)
https://doi.org/10.1186/s12911-024-02760-4
Healthcare
Anomaly detection
Insider threats
Electronic Health Records (EHRs)
Machine learning
title Anomaly-based threat detection in smart health using machine learning
topic Healthcare
Anomaly detection
Insider threats
Electronic Health Records (EHRs)
Machine learning
url https://doi.org/10.1186/s12911-024-02760-4