Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications
Objective: Diabetes is a metabolic disorder that causes the risk of stroke, heart disease, kidney failure, and other long-term complications because diabetes generates excess sugar in the blood. Machine learning (ML) models can aid in diagnosing diabetes at the primary stage. So, we need an efficient...
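Main Authors: | Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Mohsin Kazi, Majdi Khalid, Arnisha Akhter, Mohammad Ali Moni |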
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2024-08-01 |
Series: | Digital Health |
Online Access: | https://doi.org/10.1177/20552076241271867 |
_version_ | 1841550785392934912 |
---|---|
author | Md. Alamin Talukder; Md. Manowarul Islam; Md Ashraf Uddin; Mohsin Kazi; Majdi Khalid; Arnisha Akhter; Mohammad Ali Moni |
author_sort | Md. Alamin Talukder |
collection | DOAJ |
description | Objective: Diabetes is a metabolic disorder that causes the risk of stroke, heart disease, kidney failure, and other long-term complications because diabetes generates excess sugar in the blood. Machine learning (ML) models can aid in diagnosing diabetes at the primary stage. So, we need an efficient ML model to diagnose diabetes accurately. Methods: In this paper, an effective data preprocessing pipeline has been implemented to process the data and random oversampling to balance the data, handling the imbalance distributions of the observational data more sophisticatedly. We used four different diabetes datasets to conduct our experiments. Several ML algorithms were used to determine the best models to predict diabetes faultlessly. Results: The performance analysis demonstrates that among all ML algorithms, random forest surpasses the current works with an accuracy rate of 86% and 98.48% for Dataset 1 and Dataset 2; extreme gradient boosting and decision tree surpass with an accuracy rate of 99.27% and 100% for Dataset 3 and Dataset 4, respectively. Our proposal can increase accuracy by 12.15% compared to the model without preprocessing. Conclusions: This excellent research finding indicates that the proposed models might be employed to produce more accurate diabetes predictions to supplement current preventative interventions to reduce the incidence of diabetes and its associated costs. |
format | Article |
id | doaj-art-9ba2419f5e0e44d5b127c778f8894fa8 |
institution | Kabale University |
issn | 2055-2076 |
language | English |
publishDate | 2024-08-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Digital Health |
spelling | doaj-art-9ba2419f5e0e44d5b127c778f8894fa8; 2025-01-10T03:03:27Z; eng; SAGE Publishing; Digital Health; 2055-2076; 2024-08-01; 10; 10.1177/20552076241271867; Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications; Md. Alamin Talukder (0), Md. Manowarul Islam (1), Md Ashraf Uddin (2), Mohsin Kazi (3), Majdi Khalid (4), Arnisha Akhter (5), Mohammad Ali Moni (6); (0) Department of Computer Science and Engineering, , Dhaka, Bangladesh; (1) Department of Computer Science and Engineering, , Dhaka, Bangladesh; (2) School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, Australia; (3) Department of Pharmaceutics, College of Pharmacy, , Riyadh, Saudi Arabia; (4) Department of Computer Science and Artificial Intelligence, College of Computing, , Makkah, Saudi Arabia; (5) Department of Computer Science and Engineering, , Dhaka, Bangladesh; (6) Artificial Intelligence & Data Science, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, Australia; Objective: Diabetes is a metabolic disorder that causes the risk of stroke, heart disease, kidney failure, and other long-term complications because diabetes generates excess sugar in the blood. Machine learning (ML) models can aid in diagnosing diabetes at the primary stage. So, we need an efficient ML model to diagnose diabetes accurately. Methods: In this paper, an effective data preprocessing pipeline has been implemented to process the data and random oversampling to balance the data, handling the imbalance distributions of the observational data more sophisticatedly. We used four different diabetes datasets to conduct our experiments. Several ML algorithms were used to determine the best models to predict diabetes faultlessly. Results: The performance analysis demonstrates that among all ML algorithms, random forest surpasses the current works with an accuracy rate of 86% and 98.48% for Dataset 1 and Dataset 2; extreme gradient boosting and decision tree surpass with an accuracy rate of 99.27% and 100% for Dataset 3 and Dataset 4, respectively. Our proposal can increase accuracy by 12.15% compared to the model without preprocessing. Conclusions: This excellent research finding indicates that the proposed models might be employed to produce more accurate diabetes predictions to supplement current preventative interventions to reduce the incidence of diabetes and its associated costs.; https://doi.org/10.1177/20552076241271867 |
title | Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications |
url | https://doi.org/10.1177/20552076241271867 |
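
The description above mentions random oversampling as the step used to balance the class distribution. This record does not include the authors' implementation, so the following is only a minimal sketch of the general technique, using the Python standard library. The function name `random_oversample`, the fixed seed, and the toy rows and labels are all hypothetical and are not taken from the article or its datasets.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many samples as the largest class.

    Illustrative helper only; not the article's code.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: two numeric features per row, label 1 = diabetic.
rows = [[148, 33.6], [85, 26.6], [183, 23.3], [89, 28.1], [137, 43.1], [116, 25.6]]
labels = [1, 0, 1, 0, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data. Whether the article follows that practice cannot be determined from this record.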