Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction

Diabetes, a significant global health crisis, is primarily driven in India by unhealthy diets and sedentary lifestyles, with rapid urbanization amplifying these effects through convenience-oriented living and limited physical activity opportunities, underscoring the need for advanced preventative st...

Bibliographic Details
Main Authors: Prasant Kumar Mohanty, Sharmila Anand John Francis, Rabindra Kumar Barik, Diptendu Sinha Roy, Manob Jyoti Saikia
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Bioengineering
Subjects: diabetes prediction; influential feature values; ensemble models; Shapley additive explanations
Online Access: https://www.mdpi.com/2306-5354/11/12/1215
_version_ 1846105793474068480
author Prasant Kumar Mohanty
Sharmila Anand John Francis
Rabindra Kumar Barik
Diptendu Sinha Roy
Manob Jyoti Saikia
author_sort Prasant Kumar Mohanty
collection DOAJ
description Diabetes is a significant global health crisis. In India it is driven primarily by unhealthy diets and sedentary lifestyles, and rapid urbanization amplifies these effects through convenience-oriented living and limited opportunities for physical activity, underscoring the need for advanced preventive strategies and technology for effective management. This study integrates SHapley Additive exPlanations (SHAP) into ensemble machine learning models to improve the accuracy and efficiency of diabetes prediction. After identifying the most influential features with SHAP, the study examines their role in maintaining high predictive performance while minimizing computational demands. The impact of feature selection on model accuracy was assessed across ten models using three feature sets: all features, the top three influential features, and all features except these top three. Models restricted to the top three features achieved superior performance, with the ensemble model performing better on most metrics and outperforming comparable approaches. Excluding these features led to a significant decline in performance, reinforcing their critical influence. These findings support the effectiveness of targeted feature selection for efficient and robust clinical applications.
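The feature-set construction described in the abstract (all features, the top three by SHAP importance, and all except those three) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature names `x0` to `x4`, the weights, the data rows, and the toy linear scorer are hypothetical and do not come from the study, and the Shapley values are computed by brute-force coalition enumeration rather than with the SHAP library or the authors' ensemble models.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy setup: five generic features and a fixed linear scorer.
# None of these names, weights, or rows come from the study.
FEATURES = ["x0", "x1", "x2", "x3", "x4"]
WEIGHTS = [2.0, -1.5, 1.0, 0.1, 0.05]
ROWS = [
    [1.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 3.0, 1.0],
    [2.0, 2.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 1.0, 2.0, 1.0],
]


def model(x):
    """Toy linear score standing in for a trained model."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def shapley_values(x, baseline):
    """Exact Shapley values for one row by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


# Baseline = column means of the toy data.
baseline = [sum(col) / len(ROWS) for col in zip(*ROWS)]

# Global importance = mean absolute Shapley value per feature.
per_row = [shapley_values(row, baseline) for row in ROWS]
importance = [
    sum(abs(p[i]) for p in per_row) / len(ROWS) for i in range(len(FEATURES))
]

# Rank features by importance and build the three feature sets.
ranked = [name for _, name in sorted(zip(importance, FEATURES), reverse=True)]
top3 = ranked[:3]

feature_sets = {
    "all": list(FEATURES),
    "top3": top3,
    "without_top3": [f for f in FEATURES if f not in top3],
}
print(feature_sets)
```

In a setting like the one the abstract describes, each entry of `feature_sets` would then be used to train and score the candidate models separately, so that the effect of keeping or dropping the top-ranked features can be compared. Brute-force enumeration is exponential in the number of features and is only practical for a toy case like this one.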
format Article
id doaj-art-cdbd7eda9fe34c868701ce6b8ee8c794
institution Kabale University
issn 2306-5354
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj-art-cdbd7eda9fe34c868701ce6b8ee8c794
2024-12-27T14:11:30Z
eng
MDPI AG
Bioengineering
2306-5354
2024-11-01
Volume 11, Issue 12, Article 1215
10.3390/bioengineering11121215
Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction
Prasant Kumar Mohanty (0)
Sharmila Anand John Francis (1)
Rabindra Kumar Barik (2)
Diptendu Sinha Roy (3)
Manob Jyoti Saikia (4)
(0) Department of Computer Science and Engineering, National Institute of Technology, Meghalaya 793003, India
(1) Department of Computer Science, King Khalid University, Abha Campus, Rijal Alma, Abha 61421, Saudi Arabia
(2) School of Computer Applications, KIIT Deemed to be University, Bhubaneswar 751024, India
(3) Department of Computer Science and Engineering, National Institute of Technology, Meghalaya 793003, India
(4) Biomedical Sensors & Systems Lab, University of Memphis, Memphis, TN 38152, USA
https://www.mdpi.com/2306-5354/11/12/1215
diabetes prediction
influential feature values
ensemble models
Shapley additive explanations
title Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction
topic diabetes prediction
influential feature values
ensemble models
Shapley additive explanations
url https://www.mdpi.com/2306-5354/11/12/1215