Machine learning models integrating dietary data predict all-cause mortality in U.S. NAFLD patients: an NHANES-based study
Abstract Background Non-alcoholic fatty liver disease (NAFLD) is a leading cause of chronic liver disease, closely associated with metabolic abnormalities and unhealthy lifestyle habits. Despite the critical role of diet in disease progression, most existing prognostic models for NAFLD fail to incor...
| Main Authors: | Pinchu Chen, Yao Li, Chenfenglin Yang, Qifan Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | Nutrition Journal |
| Subjects: | NAFLD; Machine-learning; NHANES; Dietary fiber |
| Online Access: | https://doi.org/10.1186/s12937-025-01170-0 |
| _version_ | 1849238727710212096 |
|---|---|
| author | Pinchu Chen; Yao Li; Chenfenglin Yang; Qifan Zhang |
| author_sort | Pinchu Chen |
| collection | DOAJ |
| description | Abstract. Background: Non-alcoholic fatty liver disease (NAFLD) is a leading cause of chronic liver disease and is closely associated with metabolic abnormalities and unhealthy lifestyle habits. Despite the critical role of diet in disease progression, most existing prognostic models for NAFLD do not incorporate dietary factors. This study integrates demographic, serological, and nutritional data to develop machine learning models that predict all-cause mortality risk in NAFLD patients, with a particular emphasis on dietary interventions. Methods: Data from the National Health and Nutrition Examination Survey (NHANES) 2007–2018, comprising 2,589 NAFLD participants, were analyzed. Variables associated with survival outcomes were selected using LASSO-Cox regression. Five machine learning models were developed: Random Survival Forest (RSF), Gradient Boosting Machine (GBM), CoxBoost, Survival Support Vector Machine (SurvivalSVM), and eXtreme Gradient Boosting (XGBoost). Their performance was evaluated using time-dependent AUC, ROC curves, C-index, Brier score, and Kaplan-Meier analysis. SHAP values were used for model interpretability. Results: LASSO-Cox regression identified 13 significant variables, including age, household income, blood glucose, sedentary behavior, and dietary fiber intake. The GBM and RSF models demonstrated strong predictive performance, with AUC values around 0.8 for both 5- and 10-year survival predictions, and also performed well in terms of C-index and Brier score. SHAP analysis revealed that advanced age, low household income, hyperglycemia, and sedentary behavior were associated with poor prognosis, whereas higher dietary fiber intake was linked to improved survival. Conclusions: This study integrates dietary data into machine learning models and demonstrates their potential for predicting all-cause mortality in NAFLD patients. The models, particularly RSF and GBM, show robust predictive accuracy, with dietary fiber intake consistently exhibiting a protective effect on survival outcomes. These findings suggest that dietary interventions, such as increasing dietary fiber intake, could improve the long-term prognosis of NAFLD patients. Clinical trial number: Not applicable. |
| format | Article |
| id | doaj-art-d44a1a0e0cd6408c81be4d1da10a58a1 |
| institution | Kabale University |
| issn | 1475-2891 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | Nutrition Journal |
| spelling | doaj-art-d44a1a0e0cd6408c81be4d1da10a58a1; 2025-08-20T04:01:25Z; eng; BMC; Nutrition Journal; 1475-2891; 2025-07-01; 241112; 10.1186/s12937-025-01170-0; Machine learning models integrating dietary data predict all-cause mortality in U.S. NAFLD patients: an NHANES-based study; Pinchu Chen (0), Yao Li (1), Chenfenglin Yang (2), Qifan Zhang (3); Division of Hepatobiliopancreatic Surgery, Department of General Surgery, Nanfang Hospital, Southern Medical University (listed for each of the four authors); https://doi.org/10.1186/s12937-025-01170-0; NAFLD; Machine-learning; NHANES; Dietary fiber |
| title | Machine learning models integrating dietary data predict all-cause mortality in U.S. NAFLD patients: an NHANES-based study |
| topic | NAFLD; Machine-learning; NHANES; Dietary fiber |
| url | https://doi.org/10.1186/s12937-025-01170-0 |
| work_keys_str_mv | AT pinchuchen machinelearningmodelsintegratingdietarydatapredictallcausemortalityinusnafldpatientsannhanesbasedstudy AT yaoli machinelearningmodelsintegratingdietarydatapredictallcausemortalityinusnafldpatientsannhanesbasedstudy AT chenfenglinyang machinelearningmodelsintegratingdietarydatapredictallcausemortalityinusnafldpatientsannhanesbasedstudy AT qifanzhang machinelearningmodelsintegratingdietarydatapredictallcausemortalityinusnafldpatientsannhanesbasedstudy |
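
The abstract in this record lists the C-index among the metrics used to evaluate the survival models. As a rough illustration of what that metric measures, here is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code: the function name `harrell_c_index` and the toy values are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time for each subject
    events      -- 1 if death was observed, 0 if the subject was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the times differ and the earlier
        # time is an observed death (simplification: tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from NHANES.
toy_times = [5, 8, 3, 10, 7]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.7, 0.4, 0.9, 0.1, 0.5]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to uninformative ranking and 1.0 to perfect ranking of observed deaths by predicted risk. For real analyses, a maintained survival-analysis library would be a safer choice than this sketch.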