MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification

Bibliographic Details
Main Authors: Jiao Wang, Norhashidah Awang
Format: Article
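Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling
Online Access: https://ieeexplore.ieee.org/document/10811922/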
collection DOAJ
description Learning from multi-class imbalanced data is more challenging and less studied than learning from binary imbalanced data. Resampling techniques are widely used to address data imbalance. However, most existing resampling methods are designed specifically for binary imbalanced datasets and show significant limitations when applied to multi-class imbalanced datasets. This study therefore introduces the MKC-SMOTE algorithm, a novel method tailored to multi-class imbalanced datasets. In the pre-processing phase, the algorithm takes the distribution of all classes into account and uses the k-nearest neighbors (kNN) algorithm to identify suitable original samples from which to synthesize minority class samples. It then applies an enhanced SMOTE algorithm for interpolation. In the post-processing phase, potentially misleading synthesized samples are removed by an undersampling technique. As a result, the MKC-SMOTE algorithm generates high-quality minority class samples by strategically exploring the distributional regions of the classes. Extensive experiments were conducted on 21 real-world datasets, comparing the MKC-SMOTE algorithm with six imbalance problem handling methods and two classifiers. The results show that the MKC-SMOTE algorithm significantly improves classification performance on multi-class imbalanced datasets and outperforms several popular and state-of-the-art oversampling methods.
format Article
id doaj-art-01b1b318ed4a48a1a894caef7b65ece4
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-01b1b318ed4a48a1a894caef7b65ece4; 2025-01-03T00:00:39Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 196929-196938; DOI 10.1109/ACCESS.2024.3521120; document 10811922
Jiao Wang (https://orcid.org/0000-0001-6022-1807), School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia
Norhashidah Awang (https://orcid.org/0000-0002-2280-7193), School of Mathematical Sciences, Universiti Sains Malaysia, Penang, Malaysia
topic Multi-class imbalanced dataset
classification
SMOTE algorithm
synthetic minority
oversampling
url https://ieeexplore.ieee.org/document/10811922/
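
The description field of this record outlines three stages: kNN-based selection of original samples, SMOTE-style interpolation, and removal of misleading synthetic samples. The sketch below illustrates that general three-stage pattern in plain Python. It is not the MKC-SMOTE algorithm itself: the record does not give the paper's selection rule, its enhanced interpolation, or its undersampling rule, so the function name `oversample_class`, the helper `knn_labels`, the "at least one same-class neighbour" test, the plain SMOTE interpolation, and the majority-vote cleaning step are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_labels(point, X, y, k, skip=None):
    """Return the labels of the k rows of X nearest to `point` (Euclidean)."""
    order = sorted(
        (i for i in range(len(X)) if i != skip),
        key=lambda i: math.dist(point, X[i]),
    )
    return [y[i] for i in order[:k]]


def oversample_class(X, y, target, n_new, k=3, seed=0):
    """Make up to n_new synthetic points for class `target`.

    Illustrative three-stage sketch (selection, interpolation, cleaning).
    All thresholds here are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    members = [i for i, label in enumerate(y) if label == target]

    # Stage 1 (assumed rule): keep seeds with at least one same-class
    # sample among their k nearest neighbours taken from ALL classes.
    seeds = [i for i in members if target in knn_labels(X[i], X, y, k, skip=i)]
    if not seeds or len(members) < 2:
        return []

    synthetic = []
    for _ in range(n_new):
        a = rng.choice(seeds)
        # Stage 2: plain SMOTE interpolation toward one of the k nearest
        # same-class samples (the paper uses an enhanced variant).
        mates = sorted(
            (j for j in members if j != a),
            key=lambda j: math.dist(X[a], X[j]),
        )[:k]
        b = rng.choice(mates)
        gap = rng.random()
        point = [xa + gap * (xb - xa) for xa, xb in zip(X[a], X[b])]

        # Stage 3 (assumed rule): discard the point unless `target` is the
        # most common label among its k nearest original samples.
        votes = Counter(knn_labels(point, X, y, k))
        if votes.most_common(1)[0][0] == target:
            synthetic.append(point)
    return synthetic
```

Because the cleaning stage can reject candidates, the function may return fewer than `n_new` points. In practice one would call it once per minority class and append the returned points, labelled with that class, to the training set before fitting a classifier.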