Machine learning models integrating dietary data predict all-cause mortality in U.S. NAFLD patients: an NHANES-based study

Bibliographic Details
Main Authors: Pinchu Chen, Yao Li, Chenfenglin Yang, Qifan Zhang
Format: Article
Language: English
Published: BMC 2025-07-01
Series: Nutrition Journal
Online Access: https://doi.org/10.1186/s12937-025-01170-0
Description
Summary: Abstract

Background: Non-alcoholic fatty liver disease (NAFLD) is a leading cause of chronic liver disease, closely associated with metabolic abnormalities and unhealthy lifestyle habits. Despite the critical role of diet in disease progression, most existing prognostic models for NAFLD fail to incorporate dietary factors. This study integrates demographic, serological, and nutritional data to develop machine learning models that predict all-cause mortality risk in NAFLD patients, with a particular emphasis on dietary interventions.

Methods: Data from the National Health and Nutrition Examination Survey (NHANES) 2007–2018, comprising 2,589 NAFLD participants, were analyzed. Variables associated with survival outcomes were selected using LASSO-Cox regression. Five machine learning models were developed: Random Survival Forest (RSF), Gradient Boosting Machine (GBM), CoxBoost, Survival Support Vector Machine (SurvivalSVM), and eXtreme Gradient Boosting (XGBoost). Their performance was evaluated using time-dependent AUC, ROC curves, C-index, Brier score, and Kaplan-Meier analysis. SHAP values were employed for model interpretability.

Results: LASSO-Cox regression identified 13 significant variables, including age, household income, blood glucose, sedentary behavior, and dietary fiber intake, among others. The GBM and RSF models demonstrated strong predictive performance, with AUC values around 0.8 for both 5- and 10-year survival predictions, and also performed well in terms of C-index and Brier score. SHAP analysis revealed that advanced age, low household income, hyperglycemia, and sedentary behavior were associated with poor prognosis, whereas higher dietary fiber intake was linked to improved survival.

Conclusions: This study integrates dietary data into machine learning models, demonstrating their potential for predicting all-cause mortality in NAFLD patients. The models, particularly RSF and GBM, show robust predictive accuracy, with dietary fiber intake consistently exhibiting a protective effect on survival outcomes. These findings suggest that dietary interventions, such as increasing dietary fiber intake, could improve the long-term prognosis of NAFLD patients.

Clinical trial number: Not applicable.
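
Note on the C-index: the abstract lists the C-index among its evaluation metrics but does not describe how it was computed. As a rough illustration only, Harrell's concordance index can be sketched as the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The function name, the handling of tied survival times, and the toy data below are assumptions made for this sketch and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch, not the study's code).

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.2]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Established survival-analysis libraries provide tested implementations that treat ties and censoring more carefully than this sketch.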
ISSN:1475-2891