Impact of Data Pre-Processing Techniques on XGBoost Model Performance for Predicting All-Cause Readmission and Mortality Among Patients with Heart Failure

<b>Background:</b> Heart failure poses a significant global health challenge, with high rates of readmission and mortality. Accurate models to predict these outcomes are essential for effective patient management. This study investigates the impact of data pre-processing techniques on XG...

Bibliographic Details
Main Authors: Qisthi Alhazmi Hidayaturrohman, Eisuke Hanada
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: BioMedInformatics
Subjects: heart failure; XGBoost; data pre-processing; imputation; predictive analytics; standardization
Online Access: https://www.mdpi.com/2673-7426/4/4/118
author Qisthi Alhazmi Hidayaturrohman
Eisuke Hanada
collection DOAJ
description <b>Background:</b> Heart failure poses a significant global health challenge, with high rates of readmission and mortality. Accurate models to predict these outcomes are essential for effective patient management. This study investigates the impact of data pre-processing techniques on XGBoost model performance in predicting all-cause readmission and mortality among heart failure patients. <b>Methods:</b> A dataset of 168 features from 2,008 heart failure patients was used. Pre-processing included handling missing values, categorical encoding, and standardization. Four imputation techniques were compared: mean imputation, Multivariate Imputation by Chained Equations (MICE), k-Nearest Neighbors (kNN), and Random Forest (RF). XGBoost models were evaluated using accuracy, recall, F1-score, and Area Under the Curve (AUC). Robustness was assessed through 10-fold cross-validation. <b>Results:</b> The XGBoost model with kNN imputation, one-hot encoding, and standardization outperformed the other models on the threshold-based metrics, with an accuracy of 0.614, a recall of 0.551, and an F1-score of 0.476. The MICE-based model achieved the highest AUC (0.647) and the highest mean AUC in cross-validation (0.65 ± 0.04). All pre-processed models outperformed the default XGBoost model (AUC: 0.60). <b>Conclusions:</b> Data pre-processing, especially MICE with one-hot encoding and standardization, improves XGBoost performance in heart failure prediction. However, the moderate AUC scores suggest that further steps are needed to enhance predictive accuracy.
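As a rough illustration of three of the pre-processing steps named in the description (mean imputation, one-hot encoding, and standardization), the following is a minimal sketch in plain Python. It is not the authors' code: the helper names (`mean_impute`, `standardize`, `one_hot`) and the toy `age` and `sex` columns are hypothetical, and the sketch assumes standardization means z-scoring with the population standard deviation.

```python
from statistics import mean, pstdev


def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a numeric column to zero mean and unit population variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


# Toy data (hypothetical, not taken from the study's dataset).
age = [60.0, None, 80.0, 70.0]
sex = ["F", "M", "M", "F"]

age_imputed = mean_impute(age)         # the missing entry becomes the observed mean
age_scaled = standardize(age_imputed)  # z-scores of the imputed column
sex_encoded = one_hot(sex)             # one indicator list per category
```

In a cross-validated setting such as the 10-fold scheme mentioned above, the imputation mean and the scaling statistics would normally be computed on the training folds only and then applied to the held-out fold, so that no information leaks from the evaluation data.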
format Article
id doaj-art-f1893abc54a34db2880cebc758b4a4d5
institution Kabale University
issn 2673-7426
language English
publishDate 2024-11-01
publisher MDPI AG
series BioMedInformatics
spelling doaj-art-f1893abc54a34db2880cebc758b4a4d5
2024-12-27T14:13:19Z
eng
MDPI AG
BioMedInformatics
2673-7426
2024-11-01
volume 4, issue 4, pages 2201-2212
doi 10.3390/biomedinformatics4040118
Qisthi Alhazmi Hidayaturrohman (Graduate School of Science and Engineering, Saga University, Saga 840-8502, Japan)
Eisuke Hanada (Faculty of Science and Engineering, Saga University, Saga 840-8502, Japan)
https://www.mdpi.com/2673-7426/4/4/118
title Impact of Data Pre-Processing Techniques on XGBoost Model Performance for Predicting All-Cause Readmission and Mortality Among Patients with Heart Failure
topic heart failure
XGBoost
data pre-processing
imputation
predictive analytics
standardization
url https://www.mdpi.com/2673-7426/4/4/118