Predictive performance of count regression models versus machine learning techniques: A comparative analysis using an automobile insurance claims frequency dataset.

Accurate forecasting of claim frequency in automobile insurance is essential for insurers to assess risks effectively and establish appropriate pricing policies. Traditional methods typically rely on a Poisson distribution for modeling claim counts; however, this approach can be inadequate due to fr...

Bibliographic Details
Main Author: Gadir Alomair
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0314975
author Gadir Alomair
collection DOAJ
description Accurate forecasting of claim frequency in automobile insurance is essential for insurers to assess risks effectively and establish appropriate pricing policies. Traditional methods typically rely on a Poisson distribution for modeling claim counts; however, this approach can be inadequate due to frequent zero-claim periods, leading to zero inflation in the data. Zero inflation occurs when more zeros are observed than expected under standard Poisson or negative binomial (NB) models. While machine learning (ML) techniques have been explored for predictive analytics in other contexts, their application to zero-inflated insurance data remains limited. This study investigates the utility of ML in improving forecast accuracy under conditions of zero-inflation, a data characteristic common in automobile insurance. The research involved a comparative evaluation of several models, including Poisson, NB, zero-inflated Poisson (ZIP), hurdle Poisson, zero-inflated negative binomial (ZINB), hurdle negative binomial, random forest (RF), support vector machine (SVM), and artificial neural network (ANN) on an insurance dataset. The performance of these models was assessed using mean absolute error. The results reveal that the SVM model outperforms others in predictive accuracy, particularly in handling zero-inflation, followed by the ZIP and ZINB models. In contrast, the traditional Poisson and NB models showed lower predictive capabilities. By addressing the challenge of zero-inflation in automobile claim data, this study offers insights into improving the accuracy of claim frequency predictions. Although this study is based on a single dataset, the findings provide valuable perspectives on enhancing prediction accuracy and improving risk management practices in the insurance industry.
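The zero-inflation check and the mean absolute error (MAE) criterion mentioned in the description can be illustrated with a minimal sketch. This is not the author's code or data: the claim counts below are hypothetical toy values, and the function names are chosen here only for illustration. The sketch compares the observed share of zero counts with the share a Poisson distribution with the same mean would imply, then computes MAE for two naive predictors.

```python
import math


def observed_zero_share(counts):
    """Fraction of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def poisson_zero_share(counts):
    """Zero probability implied by a Poisson distribution whose
    mean equals the sample mean: P(Y = 0) = exp(-mean)."""
    mean = sum(counts) / len(counts)
    return math.exp(-mean)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data (an assumption, not from the study):
# 20 policies, 17 of which have no claims.
claims = [0] * 17 + [1, 2, 5]

obs = observed_zero_share(claims)
exp_zero = poisson_zero_share(claims)
print(f"observed zero share: {obs:.3f}")
print(f"Poisson-implied zero share: {exp_zero:.3f}")
print("more zeros than Poisson implies:", obs > exp_zero)

# MAE of two naive predictors on the same toy data.
sample_mean = sum(claims) / len(claims)
mae_mean = mean_absolute_error(claims, [sample_mean] * len(claims))
mae_zero = mean_absolute_error(claims, [0] * len(claims))
print(f"MAE, constant-mean predictor: {mae_mean:.3f}")
print(f"MAE, all-zero predictor: {mae_zero:.3f}")
```

On heavily zero-dominated data such as this toy sample, an all-zero predictor can score a lower MAE than a constant-mean predictor, which is worth keeping in mind when MAE is the only comparison metric.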
format Article
id doaj-art-f2bec89aafa74e7688e583bbbd3dd9a2
institution Kabale University
issn 1932-6203
language English
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-f2bec89aafa74e7688e583bbbd3dd9a2; 2025-01-08T05:32:02Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; vol. 19, no. 12, e0314975; 10.1371/journal.pone.0314975; Gadir Alomair; https://doi.org/10.1371/journal.pone.0314975
title Predictive performance of count regression models versus machine learning techniques: A comparative analysis using an automobile insurance claims frequency dataset.
url https://doi.org/10.1371/journal.pone.0314975