A multi-biomarker machine learning approach for early prediction of interstitial lung disease in rheumatoid arthritis

Bibliographic Details
Main Authors: Jiaojiao Xu, Wei Zhang, Weili Bai, Nannan Gai, Jing Li, Yunqi Bao
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Pulmonary Medicine
Online Access: https://doi.org/10.1186/s12890-025-03855-y
Collection: DOAJ
Abstract
Background: Interstitial lung disease (ILD) is a severe complication affecting 10–30% of rheumatoid arthritis (RA) patients. Current diagnostic methods typically detect ILD only after substantial lung damage has occurred. This delay emphasizes the need for early detection strategies. This study aims to develop and validate machine learning models for early RA-ILD prediction and identify key predictive biomarkers.
Methods: We conducted a cross-sectional study enrolling 149 RA patients (84 with ILD, 65 without ILD) between January 2020 and December 2023. We evaluated demographic characteristics, clinical parameters, and laboratory markers, including inflammatory indicators, hematological parameters, and specific biomarkers. We developed and compared four machine learning (ML) models (XGBoost, Random Forest, Support Vector Machine, and Logistic Regression) for ILD prediction capabilities.
Results: The XGBoost model demonstrated superior predictive performance (AUC = 0.891, 95% CI: 0.847–0.935). Feature importance analysis identified Krebs von den Lungen-6 (KL-6) as the strongest predictor (importance score = 0.285), followed by interleukin-6 (IL-6) and cytokeratin 19 fragment (CYFRA21-1). The ILD group exhibited significantly elevated levels of inflammatory markers and specific biomarkers, particularly KL-6 (826.4 ± 458.2 vs. 285.6 ± 124.8 U/ml, P < 0.001), alongside distinct patterns in hematological parameters.
Conclusion: Machine learning approaches, particularly XGBoost, demonstrate promising potential for early RA-ILD prediction. The integration of KL-6 and other identified biomarkers into clinical screening protocols may facilitate early detection and improved patient outcomes. These findings suggest that machine learning models could serve as valuable tools for risk stratification and early intervention in RA-ILD management, providing new approaches for individualized risk assessment in clinical practice.
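As a rough plausibility check on the KL-6 comparison reported in the abstract, the group difference can be recomputed from the summary statistics alone. The sketch below assumes that the "±" values are standard deviations, that KL-6 was measured in all 84 ILD and 65 non-ILD patients, and that an unequal-variance (Welch) t-test is a reasonable stand-in; the abstract does not state which test the authors used, and `welch_t` is a hypothetical helper name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# KL-6 (U/ml) from the abstract; group sizes assumed to be the full cohorts.
t_stat, dof = welch_t(826.4, 458.2, 84, 285.6, 124.8, 65)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Under these assumptions the t statistic comes out at about 10.3, which is consistent with the reported P < 0.001. The calculation says nothing about the predictive models themselves.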
ISSN: 1471-2466
Volume/Issue/Pages: 25(1), 1–13
Author Affiliation: Department of Rheumatology, Xi’an Fifth Hospital (all authors)
Record ID: doaj-art-c80ed961e844422e8e7e68a130de1c79
Institution: Kabale University
Subjects: Rheumatoid arthritis; Interstitial lung disease; Machine learning; Krebs von Den Lungen-6