Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values

Insurance fraud is a prevalent issue that insurance companies must face, particularly in the realm of automobile insurance. This type of fraud has significant cost implications for insurance firms and can have a long-term impact on pricing strategies and insurance rates. As a result, accurately pred...

Bibliographic Details
Main Authors:	Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Data imputation; ensemble learning; imbalanced data; insurance fraud; machine learning; missing data handling
Online Access:	https://ieeexplore.ieee.org/document/10695046/

_version_	1841536206765031424
author	Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah
author_facet	Ahmed A. Khalil, Zaiming Liu, Ahmed Fathalla, Ahmed Ali, Ahmad Salah
author_sort	Ahmed A. Khalil
collection	DOAJ
description	Insurance fraud is a prevalent issue that insurance companies must face, particularly in the realm of automobile insurance. This type of fraud has significant cost implications for insurance firms and can have a long-term impact on pricing strategies and insurance rates. As a result, accurately predicting and detecting insurance fraud has become a crucial challenge for insurers. Fraud datasets are usually imbalanced, as the number of fraudulent instances is much smaller than the number of legitimate instances, and they often contain missing values. Prior research has employed machine learning methods to address the class imbalance problem, but few efforts handle the class imbalance present in insurance fraud datasets. Moreover, we could not find an overfitting analysis for the relevant predictive models. This paper addresses these two limitations by employing two car insurance company datasets, namely, an Egyptian real-life dataset and a standard dataset. We propose addressing the missing data and class imbalance problems with different methods. The predictive models are then trained on the processed datasets to predict insurance fraud as a classification problem. The classifiers are evaluated on several evaluation metrics. Moreover, we present what is, to our knowledge, the first overfitting analysis for insurance fraud classifiers. The results show that addressing the class imbalance in the insurance fraud detection dataset has a significant positive effect on the performance of the predictive model, while addressing the problem of missing values has only a slight effect. Moreover, the proposed methods outperform all of the existing methods on the accuracy metric.
format	Article
id	doaj-art-c95817c6099d4857ae9e95a4dd026f9b
institution	Kabale University
issn	2169-3536
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-c95817c6099d4857ae9e95a4dd026f9b; 2025-01-15T00:02:12Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 155451; 155468; 10.1109/ACCESS.2024.3468993; 10695046; Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values; Ahmed A. Khalil (0), https://orcid.org/0000-0002-4563-5822; Zaiming Liu (1); Ahmed Fathalla (2), https://orcid.org/0000-0001-5432-5407; Ahmed Ali (3), https://orcid.org/0000-0003-2775-4104; Ahmad Salah (4); School of Mathematics and Statistics, Central South University, Changsha, Hunan, China; School of Mathematics and Statistics, Central South University, Changsha, Hunan, China; Department of Mathematics, Faculty of Science, Suez Canal University, Ismailia, Egypt; Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia; College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri, Oman; https://ieeexplore.ieee.org/document/10695046/; Data imputation; ensemble learning; imbalanced data; insurance fraud; machine learning; missing data handling
title	Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values
topic	Data imputation; ensemble learning; imbalanced data; insurance fraud; machine learning; missing data handling
url	https://ieeexplore.ieee.org/document/10695046/

Similar Items