Mitigating class imbalance in churn prediction with ensemble methods and SMOTE

Abstract This study examines how imbalanced datasets affect the accuracy of machine learning models, especially in predictive analytics applications such as churn prediction. When datasets are skewed towards the majority class, it can lead to biased model performance, reducing overall effectiveness....

Bibliographic Details
Main Authors: R. Suguna, J. Suriya Prakash, H. Aditya Pai, T. R. Mahesh, Venkatesan Vinoth Kumar, Temesgen Engida Yimer
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
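Series: Scientific Reports
Subjects: Churn prediction; Ensemble models; Imbalanced data; Machine learning; Predictive analytics; Sampling techniques
Online Access: https://doi.org/10.1038/s41598-025-01031-0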
author R. Suguna
J. Suriya Prakash
H. Aditya Pai
T. R. Mahesh
Venkatesan Vinoth Kumar
Temesgen Engida Yimer
collection DOAJ
description Abstract This study examines how imbalanced datasets affect the accuracy of machine learning models, especially in predictive analytics applications such as churn prediction. When a dataset is skewed towards the majority class, model performance can become biased, reducing overall effectiveness. To analyze this impact, the research uses a churn dataset to evaluate how data imbalance influences model accuracy. The study used nine individual classifiers along with six homogeneous ensemble models to evaluate the effects of imbalanced data on model performance. Single classifier models struggle to identify underlying patterns in imbalanced data, while ensembles improve predictive performance by focusing on the minority class. However, when trained on unbalanced data, their accuracy remains subpar. The top six classifiers were selected for further investigation based on their performance on the imbalanced data. A SMOTE sampling technique was employed to create a balanced dataset, ensuring that all classes were adequately represented. The generated model’s performance improved from 61% to 79%, indicating the removal of bias in the target data. The results showed that AdaBoost, an optimal classifier, demonstrated superior performance with an F1-score of 87.6% in identifying potential churn and assessing customer account health. The findings emphasize the importance of balanced datasets for accurate ML model predictions.
format Article
id doaj-art-5ec2996e0b2149dcadfa3b1f956847d9
institution DOAJ
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-5ec2996e0b2149dcadfa3b1f956847d9; 2025-08-20T03:09:35Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 151120; 10.1038/s41598-025-01031-0
Mitigating class imbalance in churn prediction with ensemble methods and SMOTE
R. Suguna (0); J. Suriya Prakash (1); H. Aditya Pai (2); T. R. Mahesh (3); Venkatesan Vinoth Kumar (4); Temesgen Engida Yimer (5)
(0) Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
(1) Department of Computer Science and Engineering, JAIN (Deemed-to-be-University)
(2) Department of CSE, MIT School of Computing, MIT Art, Design and Technology University
(3) Department of Computer Science and Engineering, JAIN (Deemed-to-be-University)
(4) School of Computer Science Engineering and Information Systems, Vellore Institute of Technology
(5) Department of Mathematics, Dilla University
title Mitigating class imbalance in churn prediction with ensemble methods and SMOTE
topic Churn prediction
Ensemble models
Imbalanced data
Machine learning
Predictive analytics
Sampling techniques
url https://doi.org/10.1038/s41598-025-01031-0
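
The abstract states that a SMOTE sampling technique was used to balance the churn dataset before the classifiers were re-evaluated. The record does not give the authors' implementation, so the following is only a minimal sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), written with the Python standard library. The function names `smote` and `balance`, the default `k=5`, and the toy data are illustrative assumptions, not details taken from the article.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples built by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (minority class only).
    k and seed are illustrative defaults, not values from the article.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have the same size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): six non-churners (0) and two churners (1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to report accuracy or F1-score.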