Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan

<b>Background</b>: The prevalence of diabetes is increasing worldwide, particularly in the Pacific Ocean island nations. Although machine learning (ML) models and data mining approaches have been applied to diabetes research, there was no study utilizing ML models to predict diabetes inc...

Bibliographic Details
Main Authors: Ying-Qiang Liu, Tzu-Wei Chang, Lung-Chun Lee, Chia-Yu Chen, Pi-Shan Hsu, Yu-Tse Tsan, Chao-Tung Yang, Wei-Min Chu
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Diagnostics
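Subjects: machine learning models; diabetes; free thyroxine; glycated hemoglobin; fasting blood glucose; weight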
Online Access: https://www.mdpi.com/2075-4418/15/1/72
_version_ 1841549259650891776
author Ying-Qiang Liu
Tzu-Wei Chang
Lung-Chun Lee
Chia-Yu Chen
Pi-Shan Hsu
Yu-Tse Tsan
Chao-Tung Yang
Wei-Min Chu
author_facet Ying-Qiang Liu
Tzu-Wei Chang
Lung-Chun Lee
Chia-Yu Chen
Pi-Shan Hsu
Yu-Tse Tsan
Chao-Tung Yang
Wei-Min Chu
author_sort Ying-Qiang Liu
collection DOAJ
description <b>Background</b>: The prevalence of diabetes is increasing worldwide, particularly in Pacific island nations. Although machine learning (ML) models and data mining approaches have been applied to diabetes research, no study has used ML models to predict diabetes incidence in Taiwan. We aimed to predict the onset of diabetes in order to raise health awareness, thereby promoting necessary lifestyle modifications and helping to mitigate the disease burden. <b>Methods</b>: The research dataset was retrieved from the Clinical Data Center of Taichung Veterans General Hospital. We collected data from the available electronic health records, and a total of 33 items were used for model construction. Individuals with diabetes and those with missing data were excluded. Ultimately, 6687 adults were included in the final analysis, in which we implemented three ML algorithms, namely logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), to predict diabetes. <b>Results</b>: The five most important factors in the prediction model were glycated hemoglobin (HbA1c), fasting blood glucose, weight, free thyroxine (fT4), and triglycerides (TG). Random forest, logistic regression, and XGBoost reached 99%, 99%, and 98% accuracy, respectively. fT4 appears to be one of the significant features in predicting the onset of diabetes. Moreover, this appears to be the first study using ML models to predict diabetes that has demonstrated the importance of thyroid hormone. <b>Conclusions</b>: A total of 33 items could be entered into the ML models to predict diabetes with promising accuracy. Compared with prior studies on ML models, this study not only identified similar key factors for predicting diabetes but also highlighted the significance of thyroid hormones, a factor that was previously overlooked. It also highlighted the relevance of predicting type 2 diabetes using more affordable methods, which would be useful for clinical healthcare professionals and endocrinologists who apply the models in clinical practice.
format Article
id doaj-art-397980418f5a4282897494981ad75d68
institution Kabale University
issn 2075-4418
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj-art-397980418f5a4282897494981ad75d68
2025-01-10T13:16:38Z
eng
MDPI AG
Diagnostics, ISSN 2075-4418, 2024-12-01, vol. 15, no. 1, art. 72
10.3390/diagnostics15010072
Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
Ying-Qiang Liu (0): Department of Medical Education, Taichung Veterans General Hospital, Taichung 407219, Taiwan
Tzu-Wei Chang (1): Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
Lung-Chun Lee (2): Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
Chia-Yu Chen (3): Department of Application Value-Added Service, SYSTEX Corporation, Taipei 114730, Taiwan
Pi-Shan Hsu (4): Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
Yu-Tse Tsan (5): Division of Occupational Medicine, Department of Emergency Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
Chao-Tung Yang (6): Department of Computer Science, Tunghai University, Taichung 407224, Taiwan
Wei-Min Chu (7): Department of Family Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan
https://www.mdpi.com/2075-4418/15/1/72
machine learning models
diabetes
free thyroxine
glycated hemoglobin
fasting blood glucose
weight
spellingShingle Ying-Qiang Liu
Tzu-Wei Chang
Lung-Chun Lee
Chia-Yu Chen
Pi-Shan Hsu
Yu-Tse Tsan
Chao-Tung Yang
Wei-Min Chu
Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
Diagnostics
machine learning models
diabetes
free thyroxine
glycated hemoglobin
fasting blood glucose
weight
title Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
title_full Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
title_fullStr Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
title_full_unstemmed Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
title_short Use of Machine Learning to Predict the Incidence of Type 2 Diabetes Among Relatively Healthy Adults: A 10-Year Longitudinal Study in Taiwan
title_sort use of machine learning to predict the incidence of type 2 diabetes among relatively healthy adults a 10 year longitudinal study in taiwan
topic machine learning models
diabetes
free thyroxine
glycated hemoglobin
fasting blood glucose
weight
url https://www.mdpi.com/2075-4418/15/1/72
work_keys_str_mv AT yingqiangliu useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT tzuweichang useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT lungchunlee useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT chiayuchen useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT pishanhsu useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT yutsetsan useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT chaotungyang useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan
AT weiminchu useofmachinelearningtopredicttheincidenceoftype2diabetesamongrelativelyhealthyadultsa10yearlongitudinalstudyintaiwan