MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification
The learning of multi-class imbalance problems presents greater challenges and has fewer research results compared to binary imbalance problems. Resampling techniques are widely employed to address data imbalance problems. However, the majority of existing resampling methods are designed specificall...
Main Authors: | Jiao Wang, Norhashidah Awang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling |
Online Access: | https://ieeexplore.ieee.org/document/10811922/ |
_version_ | 1841563337756770304 |
---|---|
author | Jiao Wang; Norhashidah Awang |
author_facet | Jiao Wang; Norhashidah Awang |
author_sort | Jiao Wang |
collection | DOAJ |
description | The learning of multi-class imbalance problems presents greater challenges and has fewer research results compared to binary imbalance problems. Resampling techniques are widely employed to address data imbalance problems. However, the majority of existing resampling methods are designed specifically for binary imbalance datasets and demonstrate significant limitations when applied to multi-class imbalance datasets. Therefore, this study introduces the MKC-SMOTE algorithm, a novel and effective method specifically tailored for multi-class imbalanced datasets. During the pre-processing phase, the algorithm takes into account the distribution of all classes and employs the k-nearest neighbors (kNN) algorithm to identify appropriate original samples for synthesizing minority class samples. It then utilizes an enhanced SMOTE algorithm for interpolation. In the post-processing phase, potentially misleading synthesized samples are eliminated by the undersampling technique. Consequently, the MKC-SMOTE algorithm generates high-quality minority class samples by strategically exploring the distributional regions of the classes. Extensive experiments were conducted on 21 real-world datasets, comparing the MKC-SMOTE algorithm with six imbalance problem handling methods and two classifiers. The results demonstrate that the MKC-SMOTE algorithm significantly enhances the classification performance of multi-class imbalanced datasets and outperforms several popular and state-of-the-art oversampling methods. |
format | Article |
id | doaj-art-01b1b318ed4a48a1a894caef7b65ece4 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-01b1b318ed4a48a1a894caef7b65ece4; 2025-01-03T00:00:39Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 196929; 196938; 10.1109/ACCESS.2024.3521120; 10811922; MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification; Jiao Wang 0, https://orcid.org/0000-0001-6022-1807; Norhashidah Awang 1, https://orcid.org/0000-0002-2280-7193; School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia; School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia; https://ieeexplore.ieee.org/document/10811922/; Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling |
spellingShingle | Jiao Wang Norhashidah Awang MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification IEEE Access Multi-class imbalanced dataset classification SMOTE algorithm synthetic minority oversampling |
title | MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification |
title_full | MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification |
title_fullStr | MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification |
title_full_unstemmed | MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification |
title_short | MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification |
title_sort | mkc smote a novel synthetic oversampling method for multi class imbalanced data classification |
topic | Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling |
url | https://ieeexplore.ieee.org/document/10811922/ |
work_keys_str_mv | AT jiaowang mkcsmoteanovelsyntheticoversamplingmethodformulticlassimbalanceddataclassification AT norhashidahawang mkcsmoteanovelsyntheticoversamplingmethodformulticlassimbalanceddataclassification |
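
The description field in this record outlines three stages: kNN-based selection of original samples, SMOTE-style interpolation, and removal of potentially misleading synthetic samples. The Python sketch below illustrates that general pipeline only. It is not the authors' MKC-SMOTE implementation: the record does not give the selection or removal rules, so the two filters here (a seed must have at least one minority neighbor; a synthetic point is kept only if at least half of its nearest original samples are minority) are assumptions, as are the function names `knn_indices` and `smote_like_oversample` and the parameters `k`, `n_new`, and `seed`.

```python
import numpy as np


def knn_indices(X, query, k):
    """Indices of the k rows of X closest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    return np.argsort(dists)[:k]


def smote_like_oversample(X, y, minority_label, n_new, k=3, seed=0):
    """Generic SMOTE-style oversampling with two kNN filters.

    Illustrative only: both filter rules are assumptions, not the
    rules used by MKC-SMOTE.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    synthetic = []
    attempts = 0
    while len(synthetic) < n_new and attempts < 50 * n_new:
        attempts += 1
        base = X_min[rng.integers(len(X_min))]

        # Pre-filter (assumed rule): look at the seed's neighbors across ALL
        # classes, skipping the closest hit (normally the seed itself), and
        # reject seeds with no minority neighbor at all.
        nn_all = knn_indices(X, base, k + 1)[1:]
        if not np.any(y[nn_all] == minority_label):
            continue

        # Standard SMOTE step: interpolate toward a random minority neighbor.
        nn_min = knn_indices(X_min, base, k + 1)[1:]
        neighbor = X_min[rng.choice(nn_min)]
        candidate = base + rng.random() * (neighbor - base)

        # Post-filter (assumed rule): keep the synthetic point only if at
        # least half of its k nearest ORIGINAL samples are minority.
        nn_cand = knn_indices(X, candidate, k)
        if np.mean(y[nn_cand] == minority_label) >= 0.5:
            synthetic.append(candidate)

    return np.array(synthetic).reshape(-1, X.shape[1])


if __name__ == "__main__":
    # Toy three-class data: two large classes and one small class.
    rng = np.random.default_rng(42)
    X = np.vstack([
        rng.normal([0.0, 0.0], 0.2, size=(30, 2)),
        rng.normal([5.0, 5.0], 0.2, size=(30, 2)),
        rng.normal([0.0, 5.0], 0.2, size=(6, 2)),
    ])
    y = np.array([0] * 30 + [1] * 30 + [2] * 6)
    X_new = smote_like_oversample(X, y, minority_label=2, n_new=10)
    print(X_new.shape)
```

Because each synthetic point is a convex combination of two minority samples, it always lies on the segment between them; the filters only decide which seeds and which results are kept.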