An oversampling FCM-KSMOTE algorithm for imbalanced data classification

Bibliographic Details
Main Authors: Hongfang Zhou, Jiahao Tong, Yuhan Liu, Kangyun Zheng, Chenhui Cao
Format: Article
Language: English
Published: Elsevier, 2024-12-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157824003379

Abstract

In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose a novel oversampling method named FCM-KSMOTE. The algorithm first performs density-based fuzzy clustering on the data, then iteratively partitions regions and performs oversampling inside each cluster. Next, it merges the clusters and applies noise detection to obtain a balanced dataset. We conducted experiments on 19 public datasets and 3 synthetic datasets, using six evaluation metrics: Recall, Accuracy, G-mean, Specificity, AUC and F1-Score. The experimental results demonstrate that our method significantly improves the recognition rate of the minority class while maintaining high accuracy on the majority class. In particular, with the random forest (RF) classifier, our method ranks first on all evaluation metrics, with a Recall up to 0.2 higher than that of the worst-performing compared method, demonstrating a substantial performance advantage.
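The cluster, oversample, merge and filter pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' FCM-KSMOTE: it uses plain fuzzy c-means in place of the paper's density-based variant, hardens memberships by taking each point's highest-membership cluster, gives each cluster a share of the synthetic samples proportional to its minority count, interpolates SMOTE-style between a minority sample and one of its k nearest minority neighbours in the same cluster, and substitutes an ENN-style k-nearest-neighbour rule for the paper's noise detection. The iterative region partitioning is not reproduced. The function names `fcm`, `knn` and `cluster_smote`, the binary 0/1 labels, and the parameters `c`, `m`, `k` and `seed` are hypothetical.

```python
import math
import random


def fcm(X, c, m=2.0, iters=30, seed=0):
    """Plain fuzzy c-means (assumed stand-in): returns centres and memberships U[i][j]."""
    rng = random.Random(seed)
    centers = rng.sample(X, c)

    def memberships(ctrs):
        U = []
        for x in X:
            d = [math.dist(x, ctr) for ctr in ctrs]
            if min(d) == 0.0:  # point coincides with a centre: crisp membership
                j0 = d.index(0.0)
                U.append([1.0 if j == j0 else 0.0 for j in range(c)])
            else:  # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
                U.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                          for j in range(c)])
        return U

    for _ in range(iters):
        U = memberships(centers)
        # centre update: mean of all points weighted by u_ij^m
        new_centers = []
        for j in range(c):
            w = [u[j] ** m for u in U]
            total = sum(w)
            new_centers.append(tuple(sum(wi * x[dim] for wi, x in zip(w, X)) / total
                                     for dim in range(len(X[0]))))
        centers = new_centers
    return centers, memberships(centers)


def knn(point, pool, k):
    """Indices of the k points in `pool` nearest to `point` (Euclidean)."""
    return sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))[:k]


def cluster_smote(X, y, minority=1, c=2, k=3, seed=0):
    """Hypothetical sketch: cluster, oversample per cluster, merge, filter noise."""
    rng = random.Random(seed)
    _, U = fcm(X, c, seed=seed)
    labels = [max(range(c), key=lambda j: u[j]) for u in U]  # harden memberships
    n_min = sum(1 for t in y if t == minority)
    need = len(y) - 2 * n_min  # majority count minus minority count (binary case)
    synthetic = []
    for j in range(c):
        pts = [x for x, t, lab in zip(X, y, labels) if t == minority and lab == j]
        if len(pts) < 2:
            continue  # interpolation needs a second minority point in the same cluster
        quota = round(need * len(pts) / n_min)  # cluster's proportional share
        for _ in range(quota):
            i = rng.randrange(len(pts))
            others = pts[:i] + pts[i + 1:]
            b = others[rng.choice(knn(pts[i], others, k))]
            gap = rng.random()  # position along the segment from pts[i] to b
            synthetic.append(tuple(a + gap * (bb - a) for a, bb in zip(pts[i], b)))
    # ENN-style filter: keep a synthetic point only if most of its k nearest
    # original samples belong to the minority class
    kept = [s for s in synthetic
            if sum(y[i] == minority for i in knn(s, X, k)) > k / 2]
    # merge: original data plus the surviving synthetic minority samples
    return X + kept, y + [minority] * len(kept)
```

Each synthetic point lies on the segment between two minority samples of the same cluster; restricting interpolation to a cluster is intended to reduce the risk of generating samples across majority regions that separate distinct minority sub-groups, and the final filter discards synthetic points whose neighbourhood is dominated by the majority class.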
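For reference, the six evaluation metrics named in the abstract can be computed for a binary task as below, assuming the minority class is labelled 1 and treated as the positive class. Recall is TP/(TP+FN), Specificity is TN/(TN+FP), G-mean is the square root of their product, and AUC is computed here through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. The function name `imbalance_metrics` is hypothetical, and the paper's exact evaluation protocol is not given in this record.

```python
import math


def imbalance_metrics(y_true, y_pred, scores):
    """Six metrics for a binary task; label 1 is the minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0        # true-positive rate (minority)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate (majority)
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    g_mean = math.sqrt(recall * specificity)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC via pairwise ranking: share of (positive, negative) pairs ordered correctly
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0
    return {"Recall": recall, "Accuracy": accuracy, "G-mean": g_mean,
            "Specificity": specificity, "AUC": auc, "F1-Score": f1}
```

For example, with `y_true = [1, 1, 0, 0, 0, 0, 0, 0]`, `y_pred = [1, 0, 0, 0, 0, 0, 1, 0]` and `scores = [0.9, 0.4, 0.2, 0.1, 0.3, 0.2, 0.6, 0.1]`, the function returns Recall 0.5, Specificity 5/6, Accuracy 0.75, F1-Score 0.5 and AUC 11/12. G-mean is useful on imbalanced data because a classifier that predicts only the majority class gets a Recall of 0 and therefore a G-mean of 0, even when its Accuracy is high.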

Publication Details
ISSN: 1319-1578
Volume/Issue: Vol. 36, No. 10, Article 102248
Subjects: FCM-KSMOTE; Imbalanced data classification; Density-based fuzzy clustering; Partition regions; Oversampling
Author affiliations:
Hongfang Zhou (corresponding author): School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China; Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an 710048, China
Jiahao Tong: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Yuhan Liu: School of Finance, Hebei University of Economics and Business, Shijiazhuang 050061, China
Kangyun Zheng: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Chenhui Cao: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China