A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets
In several earlier studies, machine learning (ML) has been widely explored for fraud detection. However, fraud detection is still a challenging problem. This is due to the imbalanced nature of fraud data, which leads to underperformance by most models in detecting a few fraud cases. Undetected frau...
Main Authors: | Idongesit E. Eteng, Udeze L. Chinedu, Ayei E. Ibor |
---|---|
Format: | Article |
Language: | English |
Published: | Nigerian Society of Physical Sciences, 2025-02-01 |
Series: | Journal of Nigerian Society of Physical Sciences |
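Subjects: | Imbalanced dataset; Ensemble Approach; Fraud detection; Stacking algorithm; Synthetic Minority Oversampling Technique (SMOTE) |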
Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/2066 |
_version_ | 1841525192028848128 |
---|---|
author | Idongesit E. Eteng; Udeze L. Chinedu; Ayei E. Ibor |
collection | DOAJ |
description |
In several earlier studies, machine learning (ML) has been widely explored for fraud detection. However, fraud detection is still a challenging problem. This is due to the imbalanced nature of fraud data, which causes most models to underperform in detecting the few fraud cases. Undetected fraud cases also account for the loss of several million dollars annually. Thus, we propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. For the first two datasets, the model was initially trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. For the third dataset, which clearly showed an imbalance, the model was trained only on the balanced, resampled data. The results show that the proposed model is highly competitive with extant models, producing a ROC AUC of 99% and scoring above 98% in all other metrics. The approach is recommended for detecting fraud cases in similar case studies.
|
format | Article |
id | doaj-art-0bf1996d0b7f41e9b7150d646777e4b9 |
institution | Kabale University |
issn | 2714-2817 2714-4704 |
language | English |
publishDate | 2025-02-01 |
publisher | Nigerian Society of Physical Sciences |
record_format | Article |
series | Journal of Nigerian Society of Physical Sciences |
spelling | doaj-art-0bf1996d0b7f41e9b7150d646777e4b9; 2025-01-17T18:52:29Z; eng; Nigerian Society of Physical Sciences; Journal of Nigerian Society of Physical Sciences; 2714-2817; 2714-4704; 2025-02-01; 7; 1; 10.46481/jnsps.2025.2066; A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets; Idongesit E. Eteng [0], Udeze L. Chinedu [1], Ayei E. Ibor [2]; [0] Department of Computer Science, University of Calabar, Calabar, Nigeria; [1] Department of Computer Science and Creative Technologies, University of the West of England, Bristol, United Kingdom; [2] Department of Computer Science, University of Calabar, Calabar, Nigeria; In several earlier studies, machine learning (ML) has been widely explored for fraud detection. However, fraud detection is still a challenging problem. This is due to the imbalanced nature of fraud data, which causes most models to underperform in detecting the few fraud cases. Undetected fraud cases also account for the loss of several million dollars annually. Thus, we propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. For the first two datasets, the model was initially trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. For the third dataset, which clearly showed an imbalance, the model was trained only on the balanced, resampled data. The results show that the proposed model is highly competitive with extant models, producing a ROC AUC of 99% and scoring above 98% in all other metrics. The approach is recommended for detecting fraud cases in similar case studies.; https://journal.nsps.org.ng/index.php/jnsps/article/view/2066; Imbalanced dataset; Ensemble Approach; Fraud detection; Stacking algorithm; Synthetic Minority Oversampling Technique (SMOTE) |
title | A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets |
topic | Imbalanced dataset; Ensemble Approach; Fraud detection; Stacking algorithm; Synthetic Minority Oversampling Technique (SMOTE) |
url | https://journal.nsps.org.ng/index.php/jnsps/article/view/2066 |
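
The stacking approach summarized in the description can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic data from `make_classification`, all hyperparameters, and the helper name `smote_like_oversample` are assumptions made for illustration, and the helper is a simplified SMOTE-style interpolation written with NumPy and scikit-learn rather than the imbalanced-learn implementation the record refers to. It only shows the general shape of the idea: oversample the minority class in the training split, then stack the five base classifiers under a Logistic Regression meta-classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def smote_like_oversample(X, y, minority=1, k=5, seed=0):
    """Hypothetical helper: simplified SMOTE-style oversampling.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours. Points are added until
    both classes have the same number of samples.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Drop column 0, which is normally the query point itself.
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


def build_stack():
    """Five base classifiers stacked under a Logistic Regression meta-classifier."""
    base_estimators = [
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gnb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ]
    return StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Assumed toy data: roughly 5% positive ("fraud") samples.
X, y = make_classification(
    n_samples=1200, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Resample the training split only, so the test split keeps its
# original class distribution.
X_res, y_res = smote_like_oversample(X_train, y_train)

model = build_stack().fit(X_res, y_res)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC AUC on the untouched test split: {auc:.3f}")
```

Resampling after the train/test split is a deliberate choice in this sketch: synthetic minority points never reach the evaluation data, so the reported score reflects the original imbalance. The figure printed here comes from toy data and says nothing about the results reported in the article.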