A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets

Bibliographic Details
Main Authors: Idongesit E. Eteng, Udeze L. Chinedu, Ayei E. Ibor
Format: Article
Language: English
Published: Nigerian Society of Physical Sciences 2025-02-01
Series: Journal of Nigerian Society of Physical Sciences
Online Access: https://journal.nsps.org.ng/index.php/jnsps/article/view/2066
Description
Summary: Machine learning (ML) has been widely explored for fraud detection in earlier studies, yet fraud detection remains a challenging problem. The difficulty stems from the imbalanced nature of fraud data, which causes most models to underperform when detecting the few fraud cases present. Undetected fraud cases account for losses of several million dollars annually. We therefore propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Tree, Random Forest, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions, based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. For the first two datasets, the model was first trained and tested without resampling, and those results were then compared with the results obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler. For the third dataset, which was clearly imbalanced, the model was trained only on a balanced, resampled version. The results show that the proposed model is highly competitive with extant models, achieving a ROC AUC of 99% and scoring above 98% on all other metrics. The approach is recommended for detecting fraud in similar case studies.
ISSN: 2714-2817, 2714-4704
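
The stacking arrangement described in the summary can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn's `StackingClassifier`, a synthetic imbalanced dataset from `make_classification` in place of the paper's three datasets, default-style hyperparameters, and a hand-written random undersampling step standing in for the RandomUnderSampler technique named in the summary (SMOTE is not shown). The helper names `undersample_majority` and `build_stack` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def undersample_majority(X, y, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


def build_stack(seed=0):
    """Five base classifiers feeding a Logistic Regression meta-classifier."""
    base = [
        # SVM and k-NN are distance based, so they get feature scaling (an assumption).
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("gnb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ]
    # cv=5: the meta-classifier is fitted on out-of-fold base predictions.
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic stand-in data: roughly 5% positive ("fraud") class.
X, y = make_classification(
    n_samples=3000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

# Resample the training split only; the test split keeps its natural imbalance.
X_bal, y_bal = undersample_majority(X_train, y_train)

model = build_stack().fit(X_bal, y_bal)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC AUC on the held-out imbalanced split: {auc:.3f}")
```

Resampling is applied after the train/test split so that the evaluation reflects the original class distribution; whether the article follows the same order is not stated in this record.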