Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection
In education, detecting students graduating on time is difficult due to high data complexity. Researchers have employed various approaches in identifying on-time graduation with Machine Learning, but it remains a challenging task due to the class imbalance in the dataset. This study has aimed to (i)...
| Main Authors: | Theng-Jia Law, Choo-Yee Ting, Hu Ng, Hui-Ngo Goh, Albert Quek |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MMU Press, 2024-06-01 |
| Series: | Journal of Informatics and Web Engineering |
| Subjects: | graduate on time; in-university; class imbalance; artificial intelligence; machine learning |
| Online Access: | https://journals.mmupress.com/index.php/jiwe/article/view/1076 |
| _version_ | 1846137785727057920 |
|---|---|
| author | Theng-Jia Law; Choo-Yee Ting; Hu Ng; Hui-Ngo Goh; Albert Quek |
| author_sort | Theng-Jia Law |
| collection | DOAJ |
| description | In education, detecting whether students will graduate on time is difficult because of high data complexity. Researchers have applied various Machine Learning approaches to identify on-time graduation, but the task remains challenging because of class imbalance in the dataset. This study aimed to (i) compare various class imbalance treatment methods at different sampling ratios, (ii) propose an ensemble class imbalance treatment method to mitigate class imbalance, and (iii) develop and evaluate predictive models that identify the likelihood of students graduating on time during their university studies. The dataset was collected from 4007 graduates of a university from 2021 and 2022 and contains 41 variables. After feature selection, various class imbalance treatment methods were compared at sampling ratios ranging from 50% to 90%. Moreover, Ensemble-SMOTE is proposed to aggregate the datasets generated by Synthetic Minority Oversampling Technique (SMOTE) variants in order to mitigate class imbalance effectively. The datasets generated by the class imbalance treatment methods were used as input to the predictive models for detecting on-time graduation. The predictive models were evaluated on accuracy, precision, recall, F0.5-score, F1-score, F2-score, Area Under the Curve, and Area Under the Precision-Recall Curve. Based on the findings, Logistic Regression with Ensemble-SMOTE outperformed the other predictive models and class imbalance treatment methods, achieving the highest average accuracy (87.24%), recall (92.50%), F1-score (91.30%), and F2-score (92.02%) from the 6th to the 10th trimester. To assess the effectiveness of the class imbalance treatment methods, the Shapiro-Wilk test was first applied to check normality, and the Friedman test was then performed to determine whether the models differed significantly. Ensemble-SMOTE ranked as the top performer, achieving the lowest average rank across the performance metrics. Further research could examine more sophisticated approaches to mitigating class imbalance when the dataset is highly imbalanced. |
| format | Article |
| id | doaj-art-2c2fdbd6d7424cd6a2323d4164faabe8 |
| institution | Kabale University |
| issn | 2821-370X |
| language | English |
| publishDate | 2024-06-01 |
| publisher | MMU Press |
| record_format | Article |
| series | Journal of Informatics and Web Engineering |
| spelling | doaj-art-2c2fdbd6d7424cd6a2323d4164faabe8; 2024-12-08T04:29:38Z; eng; MMU Press; Journal of Informatics and Web Engineering; ISSN 2821-370X; 2024-06-01; vol. 3, no. 2, pp. 229-250; DOI 10.33093/jiwe.2024.3.2.17; article 1076; Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection; Theng-Jia Law (https://orcid.org/0009-0001-2361-3614), Choo-Yee Ting, Hu Ng, Hui-Ngo Goh, Albert Quek (all Multimedia University, Malaysia); https://journals.mmupress.com/index.php/jiwe/article/view/1076; keywords: graduate on time, in-university, class imbalance, artificial intelligence, machine learning |
| title | Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection |
| topic | graduate on time; in-university; class imbalance; artificial intelligence; machine learning |
| url | https://journals.mmupress.com/index.php/jiwe/article/view/1076 |
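
The description lists F0.5-score, F1-score, and F2-score among the evaluation metrics. As a reading aid, the sketch below shows how these three scores relate through the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The function name `f_beta` and the precision and recall values are hypothetical and chosen only for illustration; they are not taken from the article, and this is not the authors' evaluation code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily (e.g. F0.5),
    beta > 1 weights recall more heavily (e.g. F2),
    beta = 1 gives the usual F1-score.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; the score is defined as 0 here.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical values for illustration only (not results from the article).
example_precision = 0.80
example_recall = 0.90

scores = {
    "F0.5": f_beta(example_precision, example_recall, 0.5),
    "F1": f_beta(example_precision, example_recall, 1.0),
    "F2": f_beta(example_precision, example_recall, 2.0),
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because recall exceeds precision in this made-up example, the scores increase from F0.5 to F2; the ordering reverses when precision is the larger of the two.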