Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China
Abstract. Background: Preterm birth (PTB) is a significant cause of neonatal mortality and long-term health issues. Accurate prediction and timely prevention of PTB are essential for reducing associated child mortality and morbidity. Traditional predictive methods face challenges due to heterogeneous...
| Main Authors: | Liwen Ding, Xiaona Yin, Guomin Wen, Dengli Sun, Danxia Xian, Yafen Zhao, Maolin Zhang, Weikang Yang, Weiqing Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2024-12-01 |
| Series: | BMC Pregnancy and Childbirth |
| Subjects: | Preterm birth; Machine learning; Prediction model; SHAP; Multiple pregnancies; Threatened abortion |
| Online Access: | https://doi.org/10.1186/s12884-024-06980-4 |
| _version_ | 1846136795283062784 |
|---|---|
| author | Liwen Ding, Xiaona Yin, Guomin Wen, Dengli Sun, Danxia Xian, Yafen Zhao, Maolin Zhang, Weikang Yang, Weiqing Chen |
| author_sort | Liwen Ding |
| collection | DOAJ |
| description | Abstract. Background: Preterm birth (PTB) is a significant cause of neonatal mortality and long-term health issues. Accurate prediction and timely prevention of PTB are essential for reducing associated child mortality and morbidity. Traditional predictive methods face challenges due to heterogeneous risk factors and their interaction effects. This study aims to develop and evaluate six machine learning (ML) models to predict PTB using large-scale children survey data from Shenzhen, China, and to identify key predictors through Shapley Additive Explanations (SHAP) analysis. Methods: Data from 84,050 mother–child pairs, collected in 2021 and 2022, were processed and divided into training, validation, and test sets. Six ML models were tested: L1-Regularised Logistic Regression, Light Gradient Boosting Machine (LightGBM), Naive Bayes, Random Forests, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). Model performance was evaluated based on discrimination, calibration, and clinical utility. SHAP analysis was used to interpret the importance and impact of individual features on PTB prediction. Results: The XGBoost model demonstrated the best overall performance, with area under the receiver operating characteristic curve (AUC) scores of 0.752 and 0.757 in the validation and test sets, respectively, along with favorable calibration and clinical utility. The key predictors identified were multiple pregnancies, threatened abortion, and maternal age at conception. SHAP analysis highlighted the positive impacts of multiple pregnancies and threatened abortion, as well as the negative impact of micronutrient supplementation, on PTB. Conclusion: Our study found that ML models, particularly XGBoost, show promise in accurately predicting PTB and identifying key risk factors. These findings highlight the potential of ML for enhancing clinical interventions, personalizing prenatal care, and informing public health initiatives. |
| format | Article |
| id | doaj-art-c4d6458101ab4d5ba24f6c6f6fefe3e0 |
| institution | Kabale University |
| issn | 1471-2393 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Pregnancy and Childbirth |
| spelling | doaj-art-c4d6458101ab4d5ba24f6c6f6fefe3e0; 2024-12-08T12:48:37Z; eng; BMC; BMC Pregnancy and Childbirth; 1471-2393; 2024-12-01; 24; 1; 1; 14; 10.1186/s12884-024-06980-4; Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China; Liwen Ding (Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University); Xiaona Yin (Women’s and Children’s Hospital of Longhua District of Shenzhen); Guomin Wen (Women’s and Children’s Hospital of Longhua District of Shenzhen); Dengli Sun (Women’s and Children’s Hospital of Longhua District of Shenzhen); Danxia Xian (Women’s and Children’s Hospital of Longhua District of Shenzhen); Yafen Zhao (Women’s and Children’s Hospital of Longhua District of Shenzhen); Maolin Zhang (Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University); Weikang Yang (Women’s and Children’s Hospital of Longhua District of Shenzhen); Weiqing Chen (Department of Epidemiology and Health Statistics, School of Public Health, Sun Yat-Sen University); https://doi.org/10.1186/s12884-024-06980-4; Preterm birth; Machine learning; Prediction model; SHAP; Multiple pregnancies; Threatened abortion |
| title | Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China |
| topic | Preterm birth; Machine learning; Prediction model; SHAP; Multiple pregnancies; Threatened abortion |
| url | https://doi.org/10.1186/s12884-024-06980-4 |
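
The abstract reports discrimination as AUC on separate validation and test sets. As a rough illustration only, and not the authors' code, the sketch below shows how a three-way split and a rank-based AUC could be computed with the Python standard library. The function names `three_way_split` and `roc_auc`, the 60/20/20 fractions, and the toy labels and scores are all assumptions made for this example; the record does not state the split ratios used in the study.

```python
import random


def three_way_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and cut them into training, validation, and test lists.

    The 60/20/20 default is an assumption for illustration only.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; ties share average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 1 = preterm, 0 = term; scores are made-up predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. In a workflow like the one the abstract describes, the same function would be applied to a fitted model's predicted probabilities on the validation and test splits.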