A stacked ensemble approach with resampling techniques for highly effective fraud detection in imbalanced datasets
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Nigerian Society of Physical Sciences, 2025-02-01 |
Series: | Journal of Nigerian Society of Physical Sciences |
Subjects: | |
Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/2066 |
Summary: | In several earlier studies, machine learning (ML) has been widely explored for fraud detection. However, fraud detection remains a challenging problem because fraud data are imbalanced, which leads most models to underperform in detecting the few fraud cases. Undetected fraud cases also account for losses of several million dollars annually. Thus, we propose an ensemble approach that stacks five classifiers (Support Vector Machine, Decision Trees, Random Forests, Gaussian Naïve Bayes, and k-Nearest Neighbour) and uses a Logistic Regression meta-classifier to make predictions based on a stacking algorithm and a novel pipeline. The effectiveness of the proposed model is examined on three datasets. On the first two datasets, the model was initially trained and tested without resampling, and the results were then compared with those obtained using the Synthetic Minority Oversampling Technique (SMOTE) and RandomUnderSampler techniques. On the third dataset, which was clearly imbalanced, the model was trained only on the balanced, resampled data. The results show that the proposed model is highly competitive with extant models, producing a ROC AUC of 99% and scoring above 98% in all other metrics. The approach is recommended for detecting fraud cases in similar case studies. |
---|---|
ISSN: | 2714-2817 2714-4704 |
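
The stacking arrangement described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the synthetic data, the hyperparameters, the scaling steps, and the `random_undersample` and `build_stack` helpers are all assumptions made for the example. The hand-written undersampler is a simple stand-in for the `RandomUnderSampler` named in the record (the imbalanced-learn package provides that class, along with `SMOTE`), and SMOTE itself is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class has as many rows as the smallest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


def build_stack(seed=0):
    """Hypothetical helper: five base classifiers, logistic regression on top.
    Scaling for SVM and k-NN is an assumption, not taken from the record."""
    base = [
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("gnb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ]
    # cv=5: the meta-classifier is trained on out-of-fold base predictions.
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Synthetic imbalanced data (about 5% positives) stands in for a fraud dataset.
X, y = make_classification(
    n_samples=4000, n_features=12, n_informative=6,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Resample the training split only; the test split keeps its original imbalance.
X_bal, y_bal = random_undersample(X_train, y_train)

model = build_stack().fit(X_bal, y_bal)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC AUC on the untouched test split: {auc:.3f}")
```

Resampling only the training split is a deliberate choice in this sketch: balancing the test data as well would make the reported metrics describe a class mix that a deployed detector would not see.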