Using XBGoost, an interpretable machine learning model, for diagnosing prostate cancer in patients with PSA < 20 ng/ml based on the PSAMR indicator

Abstract: This study aimed to create a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, in order to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical Co...

Bibliographic Details
Main Authors: Dengke Li, Baoyuan Chang, Qunlian Huang
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: XGBoost machine learning model; PSAMR; Prostate cancer; SHAP; SMOTE
Online Access: https://doi.org/10.1038/s41598-025-85963-7
_version_ 1841544645659590656
author Dengke Li
Baoyuan Chang
Qunlian Huang
collection DOAJ
description Abstract: This study aimed to create a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, in order to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical College from July 2021 to January 2023 were collected and analyzed. After Synthetic Minority Over-sampling TEchnique (SMOTE) class balancing was applied to the training set, Least Absolute Shrinkage and Selection Operator (LASSO) feature selection was used to identify the significant variables, and multiple machine learning models were constructed from them. The best-performing model was selected and evaluated through tenfold cross-validation to ensure interpretability. Finally, its performance was validated on the test set. Age, prostate-specific antigen mass ratio (PSAMR), Prostate Imaging–Reporting and Data System (PI-RADS) score, and prostate volume were selected by LASSO regression as the variables for model construction. The areas under the receiver operating characteristic (ROC) curve for the candidate models in the validation set were as follows: XGBoost: 0.93 (0.88–0.97); logistic regression: 0.89 (0.83–0.95); LightGBM: 0.87 (0.80–0.93); AdaBoost: 0.90 (0.85–0.96); GNB: 0.88 (0.82–0.95); CNB: 0.79 (0.71–0.87); MLP: 0.78 (0.69–0.86); and Support Vector Machine: 0.81 (0.73–0.89). XGBoost was selected as the best model and reconstructed with tenfold cross-validation on the training data, resulting in the following ROC scores: training set 0.995 (0.991–0.999), validation set 0.945 (0.885–0.997), and test set 0.920 (0.868–0.972). The Kolmogorov–Smirnov curve, calibration curve, and learning curve yielded positive results; the decision curve demonstrated that patients with threshold probabilities ranging from 10 to 95% can benefit from this model. We developed an XGBoost machine learning model based on the PSAMR indicator and interpreted it using the SHapley Additive exPlanations (SHAP) method. The model offered a high-performance, non-invasive technique to diagnose prostate cancer in patients with PSA levels < 20 ng/ml.
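The abstract reports two kinds of evaluation that are easy to misread: ROC scores (areas under the ROC curve) and a decision curve over threshold probabilities. The following is a minimal sketch of how these two quantities are commonly computed, not the authors' code. The function names `auc` and `net_benefit` and the toy data are assumptions made for illustration only; the net-benefit formula is the standard decision-curve definition, TP/n − (FP/n) · pt/(1 − pt).

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of acting on every case with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data invented purely for illustration (hypothetical, not from the study).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

print(auc(labels, scores))               # 0.875
print(net_benefit(labels, scores, 0.5))  # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "biopsy everyone" and "biopsy no one" strategies; a model is useful at a given threshold when its net benefit exceeds both.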
format Article
id doaj-art-22e60f3adb5a4130ab60b8afdd109eb2
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-22e60f3adb5a4130ab60b8afdd109eb2
2025-01-12T12:21:51Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-01-01
15 1 1 11
10.1038/s41598-025-85963-7
Using XBGoost, an interpretable machine learning model, for diagnosing prostate cancer in patients with PSA < 20 ng/ml based on the PSAMR indicator
Dengke Li (0): Department of Urology, The First Affiliated Hospital of Wannan Medical College, Yijishan Hospital
Baoyuan Chang (1): Department of Urology, Suzhou Hospital of Anhui Medical University (Suzhou Municipal Hospital of Anhui Province)
Qunlian Huang (2): Department of Urology, The First Affiliated Hospital of Wannan Medical College, Yijishan Hospital
https://doi.org/10.1038/s41598-025-85963-7
XGBoost machine learning model
PSAMR
Prostate cancer
SHAP
SMOTE
title Using XBGoost, an interpretable machine learning model, for diagnosing prostate cancer in patients with PSA < 20 ng/ml based on the PSAMR indicator
topic XGBoost machine learning model
PSAMR
Prostate cancer
SHAP
SMOTE
url https://doi.org/10.1038/s41598-025-85963-7