A multi-biomarker machine learning approach for early prediction of interstitial lung disease in rheumatoid arthritis

Bibliographic Details
Main Authors: Jiaojiao Xu, Wei Zhang, Weili Bai, Nannan Gai, Jing Li, Yunqi Bao
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Pulmonary Medicine
Online Access: https://doi.org/10.1186/s12890-025-03855-y
Description
Summary: Abstract

Background: Interstitial lung disease (ILD) is a severe complication affecting 10–30% of rheumatoid arthritis (RA) patients. Current diagnostic methods typically detect ILD only after substantial lung damage has already occurred, which underscores the need for earlier detection strategies. This study aimed to develop and validate machine learning (ML) models for early prediction of RA-ILD and to identify key predictive biomarkers.

Methods: We conducted a cross-sectional study enrolling 149 RA patients (84 with ILD, 65 without ILD) between January 2020 and December 2023. We evaluated demographic characteristics, clinical parameters, and laboratory markers, including inflammatory indicators, hematological parameters, and specific biomarkers. We developed four ML models (XGBoost, Random Forest, Support Vector Machine, and Logistic Regression) and compared their ability to predict ILD.

Results: The XGBoost model showed the best predictive performance (AUC = 0.891, 95% CI: 0.847–0.935). Feature importance analysis identified Krebs von den Lungen-6 (KL-6) as the strongest predictor (importance score = 0.285), followed by interleukin-6 (IL-6) and cytokeratin 19 fragment (CYFRA21-1). The ILD group had significantly higher levels of inflammatory markers and specific biomarkers, particularly KL-6 (826.4 ± 458.2 vs. 285.6 ± 124.8 U/ml, P < 0.001), and showed distinct patterns in hematological parameters.

Conclusion: Machine learning approaches, particularly XGBoost, show promise for early RA-ILD prediction. Integrating KL-6 and the other identified biomarkers into clinical screening protocols may facilitate early detection and improve patient outcomes. These findings suggest that ML models could serve as valuable tools for risk stratification and early intervention in RA-ILD management, offering new approaches to individualized risk assessment in clinical practice.
ISSN: 1471-2466
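The abstract reports two kinds of figures: a discrimination metric (AUC) for the models and a between-group comparison of KL-6 levels. The following is a minimal sketch, not the authors' analysis, that makes both concrete. `auc_rank` computes the empirical ROC AUC through its Mann-Whitney interpretation (the probability that a randomly chosen ILD patient scores higher than a randomly chosen non-ILD patient). `welch_from_summary` recomputes a t statistic from the reported KL-6 summary values. Several points are assumptions and are not stated in the abstract: that "±" denotes a standard deviation, that Welch's t-test is an appropriate illustrative test (the abstract does not name the test behind P < 0.001 or the method used for the AUC confidence interval), and that a normal approximation to the p-value is adequate at roughly 98 degrees of freedom. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc_rank(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical ROC AUC via the Mann-Whitney pairwise formulation.

    AUC = P(score of a random positive > score of a random negative),
    with ties counted as one half. O(n_pos * n_neg), fine for small cohorts.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def welch_from_summary(
    m1: float, s1: float, n1: int, m2: float, s2: float, n2: int
) -> Tuple[float, float, float]:
    """Welch's t statistic, Welch-Satterthwaite df, and an approximate p-value.

    Inputs are group means, standard deviations, and sample sizes
    (assumption: the reported "±" values are SDs, not standard errors).
    """
    v1 = s1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = s2 ** 2 / n2  # squared standard error of group 2 mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided p-value from the normal approximation: 2 * (1 - Phi(|t|)).
    # Reasonable only when df is large; an exact t CDF would need SciPy.
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


if __name__ == "__main__":
    # KL-6 (U/ml) summary values as reported: ILD n=84, non-ILD n=65.
    t, df, p = welch_from_summary(826.4, 458.2, 84, 285.6, 124.8, 65)
    print(f"KL-6: t = {t:.2f}, df = {df:.1f}, approx. two-sided p = {p:.1e}")
    # Toy scores (NOT study data) to illustrate the AUC formulation.
    print(auc_rank([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

Under these assumptions the KL-6 comparison gives t of about 10.3 on roughly 98 degrees of freedom, consistent with the reported P < 0.001. The toy AUC call returns 8/9 (about 0.889), because 8 of the 9 positive/negative pairs are correctly ordered. The pairwise loop was chosen for readability; a rank-based implementation would scale better to larger cohorts.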