An oversampling FCM-KSMOTE algorithm for imbalanced data classification
In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose a novel oversampling method named FCM-KSMOTE. The algorithm initially performs a density-based fuzzy clustering on the data, then iterates to partition regions and perform oversamplin...
| Main Authors: | Hongfang Zhou, Jiahao Tong, Yuhan Liu, Kangyun Zheng, Chenhui Cao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier 2024-12-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
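| Subjects: | FCM-KSMOTE; Imbalanced data classification; Density-based fuzzy clustering; Partition regions; Oversampling |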
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157824003379 |
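
As a rough illustration of the "oversampling inside each cluster" step mentioned in the abstract, the sketch below applies plain SMOTE-style interpolation within pre-assigned clusters. It is not the authors' FCM-KSMOTE implementation: the function name `smote_within_clusters`, its parameters, and the assumption that cluster labels are already available (standing in for the density-based fuzzy clustering) are hypothetical, and the region partitioning, cluster merging, and noise detection steps are omitted.

```python
import math
import random


def smote_within_clusters(points, cluster_ids, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation,
    restricted to neighbours that share the base point's cluster.

    points:      list of minority-class feature vectors (lists of floats).
    cluster_ids: one cluster label per point. These are assumed to come from
                 an earlier clustering step (hypothetical input, not computed here).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)

    # Group point indices by cluster label.
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)

    # Interpolation needs two distinct points, so skip singleton clusters.
    usable = [members for members in clusters.values() if len(members) >= 2]
    if not usable:
        raise ValueError("need at least one cluster with two or more points")

    synthetic = []
    for _ in range(n_new):
        members = rng.choice(usable)
        base = rng.choice(members)
        # The k nearest neighbours of the base point inside its own cluster.
        neighbours = sorted(
            (i for i in members if i != base),
            key=lambda i: math.dist(points[base], points[i]),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(points[base], points[other])]
        )
    return synthetic
```

Because every synthetic point lies on a segment between two points of the same cluster, no sample is generated in the gap between clusters, which is the general motivation for clustering before oversampling.
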
| _version_ | 1846100637059645440 |
|---|---|
| author | Hongfang Zhou Jiahao Tong Yuhan Liu Kangyun Zheng Chenhui Cao |
| author_facet | Hongfang Zhou Jiahao Tong Yuhan Liu Kangyun Zheng Chenhui Cao |
| author_sort | Hongfang Zhou |
| collection | DOAJ |
| description | In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose a novel oversampling method named FCM-KSMOTE. The algorithm first performs density-based fuzzy clustering on the data, then iteratively partitions regions and performs oversampling inside each cluster. Next, it merges the clusters and conducts noise detection to obtain a balanced dataset. We conducted experiments on 19 public datasets and 3 synthetic datasets, using six evaluation metrics: Recall, Accuracy, G-mean, Specificity, AUC, and F1-Score. The experimental results demonstrate that our method can significantly improve the recognition rate of the minority class while maintaining high accuracy for the majority class. In particular, with the RF classifier, our method ranks first on all evaluation metrics, with a Recall difference of up to 0.2 compared with the worst-performing method, demonstrating a substantial performance advantage. |
| format | Article |
| id | doaj-art-c1791d6a953c4f828bbfa714c1127ea0 |
| institution | Kabale University |
| issn | 1319-1578 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-c1791d6a953c4f828bbfa714c1127ea0 2024-12-30T04:15:29Z eng Elsevier Journal of King Saud University: Computer and Information Sciences 1319-1578 2024-12-01 3610102248 An oversampling FCM-KSMOTE algorithm for imbalanced data classification Hongfang Zhou [0], Jiahao Tong [1], Yuhan Liu [2], Kangyun Zheng [3], Chenhui Cao [4] School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China; Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an 710048, China; Corresponding author at: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China. / School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China / School of Finance, Hebei University of Economics and Business, Shijiazhuang 050061, China / School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China / School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose a novel oversampling method named FCM-KSMOTE. The algorithm first performs density-based fuzzy clustering on the data, then iteratively partitions regions and performs oversampling inside each cluster. Next, it merges the clusters and conducts noise detection to obtain a balanced dataset. We conducted experiments on 19 public datasets and 3 synthetic datasets, using six evaluation metrics: Recall, Accuracy, G-mean, Specificity, AUC, and F1-Score. The experimental results demonstrate that our method can significantly improve the recognition rate of the minority class while maintaining high accuracy for the majority class. In particular, with the RF classifier, our method ranks first on all evaluation metrics, with a Recall difference of up to 0.2 compared with the worst-performing method, demonstrating a substantial performance advantage. http://www.sciencedirect.com/science/article/pii/S1319157824003379 FCM-KSMOTE; Imbalanced data classification; Density-based fuzzy clustering; Partition regions; Oversampling |
| spellingShingle | Hongfang Zhou Jiahao Tong Yuhan Liu Kangyun Zheng Chenhui Cao An oversampling FCM-KSMOTE algorithm for imbalanced data classification Journal of King Saud University: Computer and Information Sciences FCM-KSMOTE Imbalanced data classification Density-based fuzzy clustering Partition regions Oversampling |
| title | An oversampling FCM-KSMOTE algorithm for imbalanced data classification |
| title_full | An oversampling FCM-KSMOTE algorithm for imbalanced data classification |
| title_fullStr | An oversampling FCM-KSMOTE algorithm for imbalanced data classification |
| title_full_unstemmed | An oversampling FCM-KSMOTE algorithm for imbalanced data classification |
| title_short | An oversampling FCM-KSMOTE algorithm for imbalanced data classification |
| title_sort | oversampling fcm ksmote algorithm for imbalanced data classification |
| topic | FCM-KSMOTE Imbalanced data classification Density-based fuzzy clustering Partition regions Oversampling |
| url | http://www.sciencedirect.com/science/article/pii/S1319157824003379 |
| work_keys_str_mv | AT hongfangzhou anoversamplingfcmksmotealgorithmforimbalanceddataclassification AT jiahaotong anoversamplingfcmksmotealgorithmforimbalanceddataclassification AT yuhanliu anoversamplingfcmksmotealgorithmforimbalanceddataclassification AT kangyunzheng anoversamplingfcmksmotealgorithmforimbalanceddataclassification AT chenhuicao anoversamplingfcmksmotealgorithmforimbalanceddataclassification AT hongfangzhou oversamplingfcmksmotealgorithmforimbalanceddataclassification AT jiahaotong oversamplingfcmksmotealgorithmforimbalanceddataclassification AT yuhanliu oversamplingfcmksmotealgorithmforimbalanceddataclassification AT kangyunzheng oversamplingfcmksmotealgorithmforimbalanceddataclassification AT chenhuicao oversamplingfcmksmotealgorithmforimbalanceddataclassification |
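
The description above lists Recall, Accuracy, G-mean, Specificity, AUC, and F1-Score as evaluation metrics. The record does not give the formulas the authors used, so the sketch below assumes the standard confusion-matrix definitions for binary classification, with the minority class treated as the positive class. The function name `imbalance_metrics` and its arguments are hypothetical. AUC is left out because it requires ranking scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return common imbalanced-classification metrics for binary labels.

    The minority class is assumed to be the `positive` label.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive samples).
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # minority-class hit rate
    specificity = ratio(tn, tn + fp)   # majority-class hit rate
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f1": ratio(2 * precision * recall, precision + recall),
        # G-mean balances performance on both classes.
        "g_mean": math.sqrt(recall * specificity),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in imbalance_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

G-mean drops to zero whenever either class is entirely misclassified, which is why it is often reported alongside Accuracy on imbalanced data.
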