A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets

Bibliographic Details
Main Authors: Idongesit E. Eteng, Udeze L. Chinedu, Ayei E. Ibor
Format: Article
Language: English
Published: Nigerian Society of Physical Sciences, 2025-02-01
Series: Journal of Nigerian Society of Physical Sciences
Online Access: https://journal.nsps.org.ng/index.php/jnsps/article/view/2066
Collection: DOAJ
Description: Machine learning (ML) has been widely explored for fraud detection in several earlier studies. However, fraud detection remains a challenging problem because fraud data are imbalanced, which causes most models to underperform in detecting the few fraud cases present. Undetected fraud also accounts for losses of several million dollars annually. We therefore propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions, based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. On the first two datasets, the model was first trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. For the third dataset, which was clearly imbalanced, only a balanced, resampled version was used for training. The results show that the proposed model is highly competitive with extant models, achieving a ROC AUC of 99% and scores above 98% on all other metrics. The approach is recommended for fraud detection in similar case studies.
ISSN: 2714-2817, 2714-4704
DOI: 10.46481/jnsps.2025.2066
Volume: 7, Issue: 1
Affiliations: Idongesit E. Eteng (Department of Computer Science, University of Calabar, Calabar, Nigeria); Udeze L. Chinedu (Department of Computer Science and Creative Technologies, University of the West of England, Bristol, United Kingdom); Ayei E. Ibor (Department of Computer Science, University of Calabar, Calabar, Nigeria)
Keywords: Imbalanced dataset; Ensemble Approach; Fraud detection; Stacking algorithm; Synthetic Minority Oversampling Technique (SMOTE)