An Early Warning Method Based on Blending of Deep Generative Model and Oversampling Model for Online Learning
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10974956/ |
| Summary: | Early warning for learning performance requires identifying as many at-risk students as possible, as early as possible within a semester. However, educational data often suffer from class imbalance, which makes it difficult to achieve both high precision (accurate identification) and high recall (comprehensive coverage) in at-risk student detection. Deep generative models and oversampling models are effective methods for addressing data imbalance and can improve classification performance. This paper proposes a method that combines the advantages of deep generative models and oversampling models to build a blending model for imbalanced educational data, which can effectively improve the precision, recall, F1-score and AUC of online learning early warning. First, we compare baseline models to select the best classifier, then choose the highest-precision deep generative model and the highest-recall oversampling model to construct blending models, which are shown to improve early warning prediction metrics. Finally, interpretable models are used to analyze differences in at-risk student prediction between the blending model, the deep generative model, and the oversampling model. The proposed models are validated on both extremely imbalanced datasets and new-semester datasets. Results show that: (1) Compared to the baseline model, the base learners built with the deep generative model and with the oversampling model both improve the evaluation metrics; the deep generative base learners achieve higher precision than the oversampling base learners, while the oversampling base learners achieve higher recall than the deep generative base learners. (2) The blending model composed of a deep generative base learner and an oversampling base learner further improves the F1-score and AUC by building on their individual strengths; the proposed blending model can also provide effective early warning three units earlier than the baseline models. (3) Compared to its base learners, the blending model G-B-Blending changes the key variables for prediction, and the at-risk students identified by the blending model come from the union of the at-risk students identified by GAN+GB and B-SMOTE+GB individually. (4) The blending model proposed in this paper achieves better prediction results than the baseline on both extremely imbalanced datasets and new-semester datasets; it identifies more at-risk students, more accurately, at earlier units, saving teachers time and energy for teaching interventions. This research provides significant insights into handling imbalanced educational datasets by blending deep generative and oversampling models. |
|---|---|
| ISSN: | 2169-3536 |
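
The precision/recall trade-off described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the student IDs, the three sets of flagged students, and the helper `precision_recall_f1` are all hypothetical, chosen only to show how a flagged set drawn from the union of a precision-oriented and a recall-oriented learner could score a higher F1 than either learner alone.

```python
# Illustrative sketch only. All student IDs and predictions below are
# made up for the example; they are not data or results from the paper.

def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for a set of flagged students."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical ground truth: 4 at-risk students in a class of 20 (imbalanced).
actual_at_risk = {3, 7, 11, 15}

# Hypothetical precision-oriented learner: few flags, all of them correct.
high_precision_flags = {3, 7}
# Hypothetical recall-oriented learner: catches everyone, with false alarms.
high_recall_flags = {3, 7, 11, 15, 2, 9, 18, 19}
# Hypothetical blended output, drawn from the union of the two sets above.
blended_flags = {3, 7, 11, 15, 2}

# The blended flags are a subset of the union of the base learners' flags.
assert blended_flags <= (high_precision_flags | high_recall_flags)

results = {
    "high_precision": precision_recall_f1(high_precision_flags, actual_at_risk),
    "high_recall": precision_recall_f1(high_recall_flags, actual_at_risk),
    "blended": precision_recall_f1(blended_flags, actual_at_risk),
}

for name, (p, r, f1) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In the paper the blended prediction comes from a trained blending model rather than a hand-picked set; the sketch only shows how the metrics named in the summary respond to such a combination.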