Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection

Bibliographic Details
Main Authors: Theng-Jia Law, Choo-Yee Ting, Hu Ng, Hui-Ngo Goh, Albert Quek
Format: Article
Language: English
Published: MMU Press 2024-06-01
Series: Journal of Informatics and Web Engineering
Subjects: graduate on time; in-university; class imbalance; artificial intelligence; machine learning
Online Access: https://journals.mmupress.com/index.php/jiwe/article/view/1076
_version_ 1846137785727057920
author Theng-Jia Law
Choo-Yee Ting
Hu Ng
Hui-Ngo Goh
Albert Quek
author_sort Theng-Jia Law
collection DOAJ
description In education, detecting whether students will graduate on time is difficult because of high data complexity. Researchers have applied various Machine Learning approaches to identify on-time graduation, but the task remains challenging because of class imbalance in the dataset. This study aimed to (i) compare various class imbalance treatment methods at different sampling ratios, (ii) propose an ensemble class imbalance treatment method to mitigate class imbalance, and (iii) develop and evaluate predictive models that identify the likelihood of students graduating on time during their university studies. The dataset was collected from 4007 graduates of a university from the years 2021 and 2022 and contains 41 variables. After feature selection, various class imbalance treatment methods were compared at sampling ratios ranging from 50% to 90%. In addition, Ensemble-SMOTE is proposed: it aggregates the datasets generated by Synthetic Minority Oversampling Technique (SMOTE) variants to mitigate class imbalance effectively. The datasets generated by the class imbalance treatment methods were used as input to the predictive models for detecting on-time graduation. The predictive models were evaluated on accuracy, precision, recall, F0.5-score, F1-score, F2-score, Area Under the Curve, and Area Under the Precision-Recall Curve. Based on the findings, Logistic Regression with Ensemble-SMOTE outperformed the other predictive models and class imbalance treatment methods, achieving the highest average accuracy (87.24%), recall (92.50%), F1-score (91.30%), and F2-score (92.02%) from the 6th to the 10th trimester. To assess the effectiveness of the class imbalance treatment methods, the Shapiro-Wilk test was first applied to check normality, and the Friedman test was then performed to determine whether the models differ significantly. Ensemble-SMOTE was ranked as the top performer, achieving the lowest average rank across the performance metrics. Further research could examine more sophisticated approaches to mitigating class imbalance when the dataset is highly imbalanced.
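The description relies on two computations that are easy to misread: the F-beta family of scores (F0.5, F1, F2) and the average rank that underlies a Friedman-style comparison. The following is a minimal sketch of both, not the authors' code. The function names, the method labels, and all numbers in the demo are hypothetical and are not taken from the article; rank 1 is assumed to go to the highest score, which matches the statement that the lowest average rank is best.

```python
from statistics import mean


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily (F2), beta < 1 weights
    precision more heavily (F0.5), and beta == 1 gives the F1-score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_ranks(scores: dict[str, list[float]]) -> dict[str, float]:
    """Rank methods within each block, then average the ranks per method.

    A block is one position in the score lists (for example one metric
    or one trimester). Rank 1 is the highest score in the block; tied
    methods share the mean of the positions they occupy.
    """
    methods = list(scores)
    n_blocks = len(next(iter(scores.values())))
    ranks: dict[str, list[float]] = {m: [] for m in methods}
    for i in range(n_blocks):
        ordered = sorted(methods, key=lambda m: scores[m][i], reverse=True)
        pos = 0
        while pos < len(ordered):
            end = pos
            # Extend the run while the next method has the same score.
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][i] == scores[ordered[pos]][i]):
                end += 1
            shared = (pos + 1 + end + 1) / 2  # mean of 1-based positions
            for m in ordered[pos:end + 1]:
                ranks[m].append(shared)
            pos = end + 1
    return {m: mean(r) for m, r in ranks.items()}


if __name__ == "__main__":
    # Hypothetical scores for three placeholder methods over three blocks.
    toy = {
        "method_a": [0.90, 0.88, 0.91],
        "method_b": [0.85, 0.88, 0.80],
        "method_c": [0.80, 0.70, 0.85],
    }
    print(average_ranks(toy))
    print(f_beta(0.5, 1.0, 0.5), f_beta(0.5, 1.0, 1.0), f_beta(0.5, 1.0, 2.0))
```

When recall exceeds precision, F2 is larger than F1, which in turn is larger than F0.5. The Friedman test statistic is built from per-method rank sums of this kind, so the method with the lowest average rank is the one that placed highest most consistently.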
format Article
id doaj-art-2c2fdbd6d7424cd6a2323d4164faabe8
institution Kabale University
issn 2821-370X
language English
publishDate 2024-06-01
publisher MMU Press
record_format Article
series Journal of Informatics and Web Engineering
spelling doaj-art-2c2fdbd6d7424cd6a2323d4164faabe8
2024-12-08T04:29:38Z
eng
MMU Press
Journal of Informatics and Web Engineering
2821-370X
2024-06-01
3 2 229 250
10.33093/jiwe.2024.3.2.17
1076
Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection
Theng-Jia Law (https://orcid.org/0009-0001-2361-3614), Multimedia University, Malaysia
Choo-Yee Ting, Multimedia University, Malaysia
Hu Ng, Multimedia University, Malaysia
Hui-Ngo Goh, Multimedia University, Malaysia
Albert Quek, Multimedia University, Malaysia
https://journals.mmupress.com/index.php/jiwe/article/view/1076
graduate on time
in-university
class imbalance
artificial intelligence
machine learning
title Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection
topic graduate on time
in-university
class imbalance
artificial intelligence
machine learning
url https://journals.mmupress.com/index.php/jiwe/article/view/1076