Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values
Main Authors: | Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
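Subjects: | Data imputation; ensemble learning; imbalanced data; insurance fraud; machine learning; missing data handling |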
Online Access: | https://ieeexplore.ieee.org/document/10695046/ |
author | Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah |
---|---|
collection | DOAJ |
description | Insurance fraud is a prevalent problem for insurance companies, particularly in automobile insurance. It imposes significant costs on insurers and can have a long-term impact on pricing strategies and insurance rates. As a result, accurately predicting and detecting insurance fraud has become a crucial challenge for insurers. Fraud datasets are usually imbalanced, because fraudulent instances are far fewer than legitimate ones, and they often contain missing values. Prior research has applied machine learning methods to the class imbalance problem, but little work has specifically handled the class imbalance present in insurance fraud datasets. Moreover, we found no overfitting analysis of the relevant predictive models. This paper addresses these two limitations using two car insurance datasets: a real-life Egyptian dataset and a standard dataset. We address the missing-data and class imbalance problems with several different methods. Predictive models are then trained on the processed datasets, framing fraud detection as a classification problem, and the classifiers are evaluated with several metrics. To our knowledge, we also present the first overfitting analysis of insurance fraud classifiers. The results show that addressing class imbalance in the insurance fraud dataset has a significant positive effect on the performance of the predictive models, whereas handling missing values has only a slight effect. Moreover, the proposed methods outperform all existing methods in terms of accuracy. |
format | Article |
id | doaj-art-c95817c6099d4857ae9e95a4dd026f9b |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
series | IEEE Access |
volume | 12 |
pages | 155451-155468 |
doi | 10.1109/ACCESS.2024.3468993 |
orcid | Ahmed A. Khalil: https://orcid.org/0000-0002-4563-5822; Ahmed Fathalla: https://orcid.org/0000-0001-5432-5407; Ahmed Ali: https://orcid.org/0000-0003-2775-4104 |
affiliation | Ahmed A. Khalil, Zaiming Liu: School of Mathematics and Statistics, Central South University, Changsha, Hunan, China; Ahmed Fathalla: Department of Mathematics, Faculty of Science, Suez Canal University, Ismailia, Egypt; Ahmed Ali: Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia; Ahmad Salah: College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri, Oman |
title | Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values |
topic | Data imputation ensemble learning imbalanced data insurance fraud machine learning missing data handling |
url | https://ieeexplore.ieee.org/document/10695046/ |
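The abstract describes the workflow only at a high level (handle missing values, rebalance the classes, train classifiers, evaluate on several metrics, check for overfitting) and this record does not name the specific techniques used. As a rough illustration of that general workflow, here is a minimal standard-library Python sketch. The toy data generator, mean imputation, random oversampling, the 5-nearest-neighbour classifier, and the train-versus-test F1 gap used as an overfitting signal are all hypothetical choices made for illustration; they are not the authors' method, models, or datasets.

```python
import heapq
import random
from statistics import mean


def make_toy_data(n, fraud_rate, missing_rate, rng):
    """Hypothetical synthetic claims: 3 numeric features, fraud rows (label 1) shifted
    upward, and each value independently blanked out (None) with probability missing_rate."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < fraud_rate else 0
        x = [rng.gauss(1.5 if y else 0.0, 1.0) for _ in range(3)]
        x = [None if rng.random() < missing_rate else v for v in x]
        rows.append((x, y))
    return rows


def fit_mean_imputer(rows):
    """Learn each column's mean from its observed (non-None) training values."""
    means = []
    for j in range(len(rows[0][0])):
        observed = [x[j] for x, _ in rows if x[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    return means


def apply_imputer(rows, means):
    """Replace every missing value with the learned column mean."""
    return [([means[j] if v is None else v for j, v in enumerate(x)], y) for x, y in rows]


def random_oversample(rows, rng):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows nearest to x (squared Euclidean distance)."""
    nearest = heapq.nsmallest(k, train, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def metrics(y_true, y_pred):
    """Accuracy plus fraud-class (label 1) precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def run_pipeline(seed=0):
    rng = random.Random(seed)
    data = make_toy_data(400, fraud_rate=0.05, missing_rate=0.1, rng=rng)
    train, test = data[:300], data[300:]
    means = fit_mean_imputer(train)  # fit on the training split only to avoid leakage
    train, test = apply_imputer(train, means), apply_imputer(test, means)
    balanced = random_oversample(train, rng)  # rebalance the training split only
    scores = {}
    for name, split in (("train", train), ("test", test)):
        y_pred = [knn_predict(balanced, x) for x, _ in split]
        scores[name] = metrics([y for _, y in split], y_pred)
    # A large train-minus-test F1 gap is one simple signal of overfitting.
    scores["f1_gap"] = scores["train"]["f1"] - scores["test"]["f1"]
    return scores


if __name__ == "__main__":
    # An all-"legitimate" predictor looks accurate on imbalanced data but catches no fraud.
    print(metrics([0] * 95 + [1] * 5, [0] * 100))
    print(run_pipeline())
```

The first printed line shows why accuracy alone can mislead on imbalanced fraud data: on a 95/5 split, predicting "legitimate" for every claim scores 0.95 accuracy with zero recall and zero F1. Imputation means are learned from the training split and oversampling is applied only to training data, so the test split stays untouched. Because each training row is its own nearest neighbour, the train scores come out optimistic, which is exactly the kind of gap a train-versus-test comparison is meant to surface.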