Leveraging Shapley Additive Explanations for Feature Selection in Ensemble Models for Diabetes Prediction

Bibliographic Details
Main Authors: Prasant Kumar Mohanty, Sharmila Anand John Francis, Rabindra Kumar Barik, Diptendu Sinha Roy, Manob Jyoti Saikia
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Bioengineering
Online Access: https://www.mdpi.com/2306-5354/11/12/1215
Description
Summary: Diabetes is a significant global health crisis. In India it is driven primarily by unhealthy diets and sedentary lifestyles, and rapid urbanization amplifies these effects through convenience-oriented living and limited opportunities for physical activity, underscoring the need for advanced preventative strategies and technology for effective management. This study integrates SHapley Additive exPlanations (SHAP) into ensemble machine learning models to improve the accuracy and efficiency of diabetes prediction. Using SHAP to identify the most influential features, the study examined their role in maintaining high predictive performance while minimizing computational demands. The impact of feature selection on model accuracy was assessed across ten models using three feature sets: all features, the top three influential features, and all features except the top three. Models restricted to the top three features achieved superior performance, with the ensemble model achieving better results on most metrics and outperforming comparable approaches. Excluding these features led to a significant decline in performance, reinforcing their critical influence. These findings validate the effectiveness of targeted feature selection for efficient and robust clinical applications.
ISSN: 2306-5354
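
The feature-selection procedure described in the summary (rank features by Shapley-based importance, then compare all features, the top three, and all except the top three) can be illustrated with a small sketch. This is not the authors' code and does not use the SHAP library: it computes exact Shapley values by enumerating feature subsets, with absent features replaced by their column means, which is only practical for a handful of features. The scorer `toy_predict`, its weights, the placeholder feature names `f1` to `f5`, and the data rows are all assumptions invented for illustration and say nothing about which features the study found most influential.

```python
from itertools import combinations
from math import factorial

# Hypothetical placeholders: names, weights, and rows are made up for this sketch.
NAMES = ["f1", "f2", "f3", "f4", "f5"]
WEIGHTS = [0.9, 0.5, 0.3, 0.05, 0.02]
ROWS = [
    [1.2, 0.4, -0.3, 0.1, 0.5],
    [-0.8, 1.1, 0.9, -0.2, -0.4],
    [0.3, -0.9, 1.4, 0.6, 0.2],
    [-0.7, -0.6, -2.0, -0.5, -0.3],
]


def toy_predict(x):
    """Stand-in for a trained model: a fixed linear risk score."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    A feature that is 'absent' from a coalition takes its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, rows, names):
    """Rank features by mean absolute Shapley value over all rows."""
    n = len(names)
    baseline = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    importance = [0.0] * n
    for x in rows:
        for j, value in enumerate(shapley_values(predict, x, baseline)):
            importance[j] += abs(value) / len(rows)
    return sorted(zip(names, importance), key=lambda pair: pair[1], reverse=True)


ranking = rank_features(toy_predict, ROWS, NAMES)
top3 = [name for name, _ in ranking[:3]]

# The three feature sets compared in the summary.
feature_sets = {
    "all": list(NAMES),
    "top3": top3,
    "without_top3": [name for name in NAMES if name not in top3],
}

for label, columns in feature_sets.items():
    print(label, columns)
```

In a real workflow each entry of `feature_sets` would be used to retrain and score every candidate model, so that the drop in performance from removing the top-ranked features can be measured directly. For models with more than a few features, exact enumeration becomes infeasible and an approximation such as those provided by the SHAP library would be the usual choice.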