Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China

Abstract Background Preterm birth (PTB) is a significant cause of neonatal mortality and long-term health issues. Accurate prediction and timely prevention of PTB are essential for reducing associated child mortality and morbidity. Traditional predictive methods face challenges due to heterogeneous...

Bibliographic Details
Main Authors: Liwen Ding, Xiaona Yin, Guomin Wen, Dengli Sun, Danxia Xian, Yafen Zhao, Maolin Zhang, Weikang Yang, Weiqing Chen
Format: Article
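Language: English
Published: BMC 2024-12-01
Series: BMC Pregnancy and Childbirth
Subjects: Preterm birth; Machine learning; Prediction model; SHAP; Multiple pregnancies; Threatened abortion
Online Access: https://doi.org/10.1186/s12884-024-06980-4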
collection DOAJ
description Abstract
Background: Preterm birth (PTB) is a significant cause of neonatal mortality and long-term health issues. Accurate prediction and timely prevention of PTB are essential for reducing associated child mortality and morbidity. Traditional predictive methods face challenges due to heterogeneous risk factors and their interaction effects. This study aims to develop and evaluate six machine learning (ML) models to predict PTB using large-scale children survey data from Shenzhen, China, and to identify key predictors through Shapley Additive Explanations (SHAP) analysis.
Methods: Data from 84,050 mother–child pairs, collected in 2021 and 2022, were processed and divided into training, validation, and test sets. Six ML models were tested: L1-Regularised Logistic Regression, Light Gradient Boosting Machine (LightGBM), Naive Bayes, Random Forests, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). Model performance was evaluated based on discrimination, calibration and clinical utility. SHAP analysis was used to interpret the importance and impact of individual features on PTB prediction.
Results: The XGBoost model demonstrated the best overall performance, with the area under the receiver operating characteristic curve (AUC) scores of 0.752 and 0.757 in the validation and test sets, respectively, along with favorable calibration and clinical utility. Key predictors identified were multiple pregnancies, threatened abortion, and maternal age of conception. SHAP analysis highlighted the positive impacts of multiple pregnancies and threatened abortion, as well as the negative impact of micronutrient supplementation on PTB.
Conclusion: Our study found that ML models, particularly XGBoost, show promise in accurately predicting PTB and identifying key risk factors. These findings provide the potential of ML for enhancing clinical interventions, personalizing prenatal care, and informing public health initiatives.
id doaj-art-c4d6458101ab4d5ba24f6c6f6fefe3e0
institution Kabale University
issn 1471-2393
spelling Record timestamp: 2024-12-08T12:48:37Z
Volume 24, issue 1, pages 1-14
Author affiliations:
Liwen Ding: Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University
Xiaona Yin: Women’s and Children’s Hospital of Longhua District of Shenzhen
Guomin Wen: Women’s and Children’s Hospital of Longhua District of Shenzhen
Dengli Sun: Women’s and Children’s Hospital of Longhua District of Shenzhen
Danxia Xian: Women’s and Children’s Hospital of Longhua District of Shenzhen
Yafen Zhao: Women’s and Children’s Hospital of Longhua District of Shenzhen
Maolin Zhang: Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University
Weikang Yang: Women’s and Children’s Hospital of Longhua District of Shenzhen
Weiqing Chen: Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University
topic Preterm birth
Machine learning
Prediction model
SHAP
Multiple pregnancies
Threatened abortion