Machine learning algorithms in constructing prediction models for assisted reproductive technology (ART) related live birth outcomes

Bibliographic Details
Main Authors: Junwei Peng, Xiaoyujie Geng, Yiyue Zhao, Zhijin Hou, Xin Tian, Xinyi Liu, Yuanyuan Xiao, Yang Liu
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-83781-x
Description
Summary: Currently applicable models for predicting live birth outcomes in patients who received assisted reproductive technology (ART) have methodological or study design limitations that greatly obstruct their dissemination and application. Models suitable for Chinese couples have not yet been identified. We conducted a retrospective study using a database of 11,938 couples who underwent in vitro fertilization (IVF) treatment between January 2015 and December 2022 at a medical institution in Yunnan province, southwest China. Candidate predictors were screened using importance scores. Four machine learning (ML) algorithms (random forest, extreme gradient boosting, light gradient boosting machine, and binary logistic regression) were used to construct prediction models. Predictive performance was initially assessed and then validated using cross-validation and bootstrap methods. Seven predictors were identified: maternal age, duration of infertility, basal follicle-stimulating hormone (FSH), progressive sperm motility, progesterone (P) on HCG day, estradiol (E2) on HCG day, and luteinizing hormone (LH) on HCG day. Of the four models, the random forest and logistic regression models performed best, with areas under the receiver operating characteristic curve (AUROC) of 0.671 (95% CI 0.630–0.713) and 0.674 (95% CI 0.627–0.720), respectively. Both models had a Brier score of 0.183 (95% CI 0.170–0.196). Given its simplicity of fitting, we recommend the logistic regression model as the best predictive model for live birth. Maternal age, P on HCG day, and E2 on HCG day contributed most to model prediction.
ISSN: 2045-2322
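
The Summary reports two performance measures, AUROC and the Brier score, each with a 95% confidence interval obtained in part by bootstrap methods. The sketch below illustrates how such figures can be computed in principle. It is not the authors' code: the function names (`brier_score`, `auroc`, `bootstrap_ci`, `make_toy_cohort`) are hypothetical, the data are randomly generated rather than taken from the study, and the percentile bootstrap is assumed here because the record does not state which bootstrap variant was used.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case; tied scores count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, y_prob, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumption, see the note above):
    resample cases with replacement and take empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one outcome class
        ps = [y_prob[i] for i in idx]
        stats.append(metric(ys, ps))
    stats.sort()
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


def make_toy_cohort(n=300, seed=1):
    """Synthetic outcomes and predicted probabilities; NOT study data."""
    rng = random.Random(seed)
    y_true, y_prob = [], []
    for _ in range(n):
        p = rng.uniform(0.05, 0.6)  # hypothetical predicted live-birth probability
        y_prob.append(p)
        y_true.append(1 if rng.random() < p else 0)
    return y_true, y_prob


if __name__ == "__main__":
    y, p = make_toy_cohort()
    print("AUROC:", round(auroc(y, p), 3), "95% CI:", bootstrap_ci(auroc, y, p))
    print("Brier:", round(brier_score(y, p), 3), "95% CI:", bootstrap_ci(brier_score, y, p))
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, while a lower Brier score indicates predicted probabilities closer to the observed outcomes. The numbers printed by this toy example will not match those in the Summary.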