Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values

Bibliographic Details
Main Authors: Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah
Format: Article
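Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Data imputation; ensemble learning; imbalanced data; insurance fraud; machine learning; missing data handling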
Online Access:https://ieeexplore.ieee.org/document/10695046/
author Ahmed A. Khalil
Zaiming Liu
Ahmed Fathalla
Ahmed Ali
Ahmad Salah
collection DOAJ
description Insurance fraud is a prevalent issue that insurance companies must face, particularly in automobile insurance. This type of fraud has significant cost implications for insurance firms and can have a long-term impact on pricing strategies and insurance rates. As a result, accurately predicting and detecting insurance fraud has become a crucial challenge for insurers. Fraud datasets are usually imbalanced, with far fewer fraudulent instances than legitimate ones, and they also contain missing values. Prior research has employed machine learning methods to address class imbalance, but limited effort has gone into handling the class imbalance present in insurance fraud datasets. Moreover, we could not find an overfitting analysis of the relevant predictive models. This paper addresses these two limitations using two car insurance datasets: a real-life Egyptian dataset and a standard dataset. We address the missing data and class imbalance problems with different methods, then train predictive models on the processed datasets to detect insurance fraud as a classification problem, and evaluate the classifiers on several metrics. To our knowledge, we also present the first overfitting analysis of insurance fraud classifiers. The results show that addressing class imbalance in the insurance fraud detection dataset has a significant positive effect on the performance of the predictive model, while addressing missing values has only a slight effect. Moreover, the proposed methods outperform all of the existing methods in terms of accuracy.
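The processing pipeline the description summarizes (fill in missing values, rebalance the classes, train a classifier, score it on several metrics, then compare training and test scores for signs of overfitting) can be sketched in standard-library Python. This is a hypothetical illustration, not the authors' code: the record does not name their methods, so mean imputation, random oversampling of the minority class, and a gradient-descent logistic regression are assumed stand-ins, and every function name below is made up for the example. The demo data are synthetic.

```python
import math
import random
from statistics import mean

# Hypothetical sketch: mean imputation, random oversampling and logistic
# regression are assumed stand-ins, not the methods used in the paper.

def fit_column_means(rows):
    """Mean of the observed (non-None) values in each column; 0.0 if a column is all missing."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(mean(observed) if observed else 0.0)
    return means

def impute(rows, means):
    """Replace each missing value (None) with the training mean of its column."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until classes 0 and 1 are equal in size."""
    by_class = {c: [i for i, t in enumerate(y) if t == c] for c in (0, 1)}
    minority = min(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(model, X):
    """Label 1 (fraud) when the predicted probability is at least 0.5."""
    w, b = model
    return [int(sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5) for x in X]

def scores(y_true, y_pred):
    """Accuracy, plus precision, recall and F1 for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": prec,
        "recall": rec,
        "f1": 2 * prec * rec / (prec + rec) if prec + rec else 0.0,
    }

def overfitting_gap(model, X_train, y_train, X_test, y_test, metric="f1"):
    """Training score minus test score; a large positive gap hints at overfitting."""
    train = scores(y_train, predict(model, X_train))[metric]
    test = scores(y_test, predict(model, X_test))[metric]
    return train - test

if __name__ == "__main__":
    rng = random.Random(0)

    def make_toy_data(n):
        """Synthetic data (not from the paper): ~10% fraud, ~20% of values missing."""
        X, y = [], []
        for _ in range(n):
            t = int(rng.random() < 0.1)
            row = [rng.gauss(1.5 * t, 1.0), rng.gauss(-1.0 * t, 1.0)]
            X.append([None if rng.random() < 0.2 else v for v in row])
            y.append(t)
        return X, y

    X_train, y_train = make_toy_data(400)
    X_test, y_test = make_toy_data(200)
    means = fit_column_means(X_train)  # fit imputation on training data only
    X_train, X_test = impute(X_train, means), impute(X_test, means)
    X_bal, y_bal = random_oversample(X_train, y_train, rng)  # rebalance training data only
    model = train_logistic(X_bal, y_bal)
    print("test scores:", scores(y_test, predict(model, X_test)))
    print("F1 gap (train - test):", overfitting_gap(model, X_train, y_train, X_test, y_test))
```

Two choices in this sketch matter more than the particular models: the imputation means are fitted on the training split only, and oversampling touches only the training data, so the test set keeps its real class ratio. With a rare fraud class, accuracy alone can look high while recall on fraud stays low, which is why the sketch also reports precision, recall and F1.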
format Article
id doaj-art-c95817c6099d4857ae9e95a4dd026f9b
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
DOI: 10.1109/ACCESS.2024.3468993
Volume: 12
Pages: 155451-155468
IEEE Xplore document: 10695046
Record updated: 2025-01-15T00:02:12Z
Author affiliations:
Ahmed A. Khalil (https://orcid.org/0000-0002-4563-5822), School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
Zaiming Liu, School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
Ahmed Fathalla (https://orcid.org/0000-0001-5432-5407), Department of Mathematics, Faculty of Science, Suez Canal University, Ismailia, Egypt
Ahmed Ali (https://orcid.org/0000-0003-2775-4104), Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
Ahmad Salah, College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri, Oman
title Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values
topic Data imputation
ensemble learning
imbalanced data
insurance fraud
machine learning
missing data handling
url https://ieeexplore.ieee.org/document/10695046/