Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications

Bibliographic Details
Main Authors: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Mohsin Kazi, Majdi Khalid, Arnisha Akhter, Mohammad Ali Moni
Format: Article
Language: English
Published: SAGE Publishing 2024-08-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076241271867
author Md. Alamin Talukder
Md. Manowarul Islam
Md Ashraf Uddin
Mohsin Kazi
Majdi Khalid
Arnisha Akhter
Mohammad Ali Moni
collection DOAJ
description Objective: Diabetes is a metabolic disorder in which excess sugar in the blood raises the risk of stroke, heart disease, kidney failure, and other long-term complications. Machine learning (ML) models can aid in diagnosing diabetes at an early stage, so an efficient ML model is needed to diagnose diabetes accurately. Methods: In this paper, an effective data preprocessing pipeline was implemented to process the data, and random oversampling was applied to balance the data and handle the imbalanced distributions of the observational data. We used four different diabetes datasets to conduct our experiments. Several ML algorithms were evaluated to determine the best models for predicting diabetes. Results: The performance analysis shows that, among all ML algorithms, random forest surpasses current works with accuracy rates of 86% and 98.48% for Dataset 1 and Dataset 2; extreme gradient boosting and decision tree surpass them with accuracy rates of 99.27% and 100% for Dataset 3 and Dataset 4, respectively. Our proposal can increase accuracy by 12.15% compared to the model without preprocessing. Conclusions: These findings indicate that the proposed models might be employed to produce more accurate diabetes predictions, supplementing current preventative interventions to reduce the incidence of diabetes and its associated costs.
format Article
id doaj-art-9ba2419f5e0e44d5b127c778f8894fa8
institution Kabale University
issn 2055-2076
language English
publishDate 2024-08-01
publisher SAGE Publishing
record_format Article
series Digital Health
spelling doaj-art-9ba2419f5e0e44d5b127c778f8894fa8
2025-01-10T03:03:27Z
eng
SAGE Publishing
Digital Health
2055-2076
2024-08-01
10
10.1177/20552076241271867
Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications
Md. Alamin Talukder (0): Department of Computer Science and Engineering, , Dhaka, Bangladesh
Md. Manowarul Islam (1): Department of Computer Science and Engineering, , Dhaka, Bangladesh
Md Ashraf Uddin (2): School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, Australia
Mohsin Kazi (3): Department of Pharmaceutics, College of Pharmacy, , Riyadh, Saudi Arabia
Majdi Khalid (4): Department of Computer Science and Artificial Intelligence, College of Computing, , Makkah, Saudi Arabia
Arnisha Akhter (5): Department of Computer Science and Engineering, , Dhaka, Bangladesh
Mohammad Ali Moni (6): Artificial Intelligence & Data Science, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, Australia
https://doi.org/10.1177/20552076241271867
title Toward reliable diabetes prediction: Innovations in data engineering and machine learning applications
url https://doi.org/10.1177/20552076241271867