A new risk assessment model of venous thromboembolism by considering fuzzy population

Abstract Background: Inpatients at high risk of venous thromboembolism (VTE) usually face serious threats to their health and finances. Many studies that use machine learning (ML) models to predict VTE risk overlook the impact of the class-imbalance problem caused by the low incidence rate of VTE,...

Bibliographic Details
Main Authors: Xin Wang, Yu-Qing Yang, Xin-Yu Hong, Si-Hua Liu, Jian-Chu Li, Ting Chen, Ju-Hong Shi
Format: Article
Language: English
Published: BMC 2024-12-01
Series: BMC Medical Informatics and Decision Making
Subjects: Venous thromboembolism; Risk assessment; Machine learning; Fuzzy population
Online Access: https://doi.org/10.1186/s12911-024-02834-3
_version_ 1841559390239326208
author Xin Wang
Yu-Qing Yang
Xin-Yu Hong
Si-Hua Liu
Jian-Chu Li
Ting Chen
Ju-Hong Shi
author_sort Xin Wang
collection DOAJ
description Abstract
Background: Inpatients at high risk of venous thromboembolism (VTE) usually face serious threats to their health and finances. Many studies that use machine learning (ML) models to predict VTE risk overlook the impact of the class-imbalance problem caused by the low incidence rate of VTE. The result is inferior and unstable model performance, which prevents these models from replacing the Padua model, a linear weighted model widely used in clinical practice. Our study aims to develop a new VTE risk assessment model suitable for Chinese medical inpatients.
Methods: Data were collected from 3284 inpatients in the medical department of Peking Union Medical College Hospital (PUMCH) between January 2014 and June 2016. The training and test sets were divided by admission time, with inpatients admitted from May 2016 to June 2016 forming the test set. We explained the class-imbalance problem from a clinical perspective and defined a new term, “fuzzy population”, to describe and model this phenomenon. By considering the “fuzzy population”, a new ML VTE risk assessment model was built through population splitting. The sensitivity and specificity of our method were compared with those of five ML models (support vector machine (SVM), random forest (RF), gradient boosting decision tree (GBDT), logistic regression (LR), and XGBoost) and the Padua model.
Results: The “fuzzy population” phenomenon was explained and verified on the VTE dataset. On the test data, the proposed model achieved higher specificity than the Padua model (64.94% vs. 63.30%) at the same sensitivity (90.24% vs. 90.24%). None of the five other ML models could surpass both the sensitivity and the specificity of the Padua model simultaneously. In addition, our model was more robust than the five ML models: the standard deviations of its sensitivity and specificity were smaller. Adjusting the distribution of negative samples in the training set based on the “fuzzy population” exacerbated the performance instability of the five ML models, which limits the application of ML methods in clinical practice.
Conclusions: The proposed model achieved higher specificity than the Padua model at equal sensitivity on the test data, and better robustness than traditional ML models. This study built a population-split-based ML model of VTE by modeling the class-imbalance problem, and the approach can be applied more broadly to risk assessment of other diseases.
format Article
id doaj-art-c686309bff144bffa070545b3ef9b9e1
institution Kabale University
issn 1472-6947
language English
publishDate 2024-12-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-c686309bff144bffa070545b3ef9b9e1
2025-01-05T12:32:30Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2024-12-01
24
1
1
11
10.1186/s12911-024-02834-3
A new risk assessment model of venous thromboembolism by considering fuzzy population
Xin Wang (Department of Ultrasound, Peking Union Medical College Hospital)
Yu-Qing Yang (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications)
Xin-Yu Hong (Department of Respiration, Peking Union Medical College Hospital)
Si-Hua Liu (Department of Respiration, Peking Union Medical College Hospital)
Jian-Chu Li (Department of Ultrasound, Peking Union Medical College Hospital)
Ting Chen (Computer Science and Technology, Tsinghua University)
Ju-Hong Shi (Department of Respiration, Peking Union Medical College Hospital)
https://doi.org/10.1186/s12911-024-02834-3
Venous thromboembolism
Risk assessment
Machine learning
Fuzzy population
title A new risk assessment model of venous thromboembolism by considering fuzzy population
topic Venous thromboembolism
Risk assessment
Machine learning
Fuzzy population
url https://doi.org/10.1186/s12911-024-02834-3