Predicting the risk of pulmonary embolism in patients with tuberculosis using machine learning algorithms

Bibliographic Details
Main Authors: Haobo Kong, Yong Li, Ya Shen, Jingjing Pan, Min Liang, Zhi Geng, Yanbei Zhang
Format: Article
Language: English
Published: BMC 2024-12-01
Series: European Journal of Medical Research
Online Access: https://doi.org/10.1186/s40001-024-02218-3
Description
Summary:
Background: This study aimed to develop predictive models with robust generalization capabilities for assessing the risk of pulmonary embolism in patients with tuberculosis using machine learning algorithms.
Methods: Data were collected from two centers and divided into development and validation cohorts. In the development cohort, candidate variables were selected with the Recursive Feature Elimination (RFE) method. Five machine learning algorithms were used to construct the predictive models: logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and support vector machine (SVM). Model performance was evaluated using nested cross-validation and the area under the curve (AUC), supplemented by interpretation with SHapley Additive exPlanations (SHAP) and line charts of AUC values. The models were then externally validated on an independent validation cohort, with the aim of supporting early identification and management of pulmonary embolism risk in tuberculosis patients.
Results: Data from 694 patients were used for model development, and 236 patients from the validation group met the enrollment criteria. The optimal subset of variables comprised D-dimer, smoking status, dyspnea, age, sex, diabetes, platelet count, cough, fibrinogen, hemoglobin, hemoptysis, hypertension, chronic obstructive pulmonary disease (COPD), and chest pain. The RF model outperformed the others, achieving an AUC of 0.839 (95% CI 0.780–0.899) and maintaining the highest average performance in external fivefold cross-validation (AUC: 0.906 ± 0.041).
Conclusions: The RF model demonstrates high and consistent effectiveness in predicting pulmonary embolism risk in tuberculosis patients.
ISSN: 2047-783X
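
The summary reports performance as an AUC and as a fivefold mean ± standard deviation. As a rough illustration of how such figures can be computed, the sketch below uses only the Python standard library. It is not the authors' code: the helper names `auc_score` and `stratified_folds`, the synthetic labels and scores, and the choice of sample standard deviation are all assumptions made for this example. No model is trained here; fixed synthetic scores are simply split into folds to show the metric and the per-fold summary.

```python
import random
import statistics


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Deal the indices of each class round-robin into k folds so that
    every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Synthetic data (assumption): 60 "event" and 140 "no event" cases with
# made-up risk scores; these are not values from the study.
rng = random.Random(42)
labels = [1] * 60 + [0] * 140
scores = [rng.gauss(0.65 if y == 1 else 0.40, 0.15) for y in labels]

# One AUC per fold, then the mean and sample standard deviation.
fold_aucs = [
    auc_score([labels[i] for i in fold], [scores[i] for i in fold])
    for fold in stratified_folds(labels, k=5)
]
mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)
print(f"AUC: {mean_auc:.3f} ± {sd_auc:.3f}")
```

In a real nested cross-validation, feature selection and hyperparameter tuning would be repeated inside each training split and the scores would come from a model refitted per fold; the pairwise AUC above is quadratic in sample size and is meant for readability rather than speed.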