Explainable unsupervised anomaly detection for healthcare insurance data

Abstract. Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to a...

Bibliographic Details
Main Authors:	Hannes De Meulemeester, Frank De Smet, Johan van Dorst, Elise Derroitte, Bart De Moor
Format:	Article
Language:	English
Published:	BMC 2025-01-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Health insurance; Anomaly detection; Unsupervised machine learning
Online Access:	https://doi.org/10.1186/s12911-024-02823-6

_version_	1841544591406268416
author	Hannes De Meulemeester, Frank De Smet, Johan van Dorst, Elise Derroitte, Bart De Moor
collection	DOAJ
description	Abstract. Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to acquire as it requires expert investigators and known care providers with atypical behavior. Methods: In this work we show how recent advances in machine learning can be used to set up a workflow that can aid investigators in discovering practitioners or groups of practitioners with unusual resource use in order to more efficiently combat waste and fraud. We combine three different techniques, which have not been used in the context of healthcare insurance anomaly detection: categorical embeddings to deal with high-cardinality categorical variables, state-of-the-art unsupervised anomaly detection techniques to detect anomalies, and Shapley additive explanations (SHAP) to explain the model output. Results: The method has been evaluated on providers with a known anomalous profile and with the help of experts of the largest health insurance fund in Belgium. The quantitative experiments show that categorical embeddings offer a significant improvement compared to standard methods and that the state-of-the-art unsupervised anomaly detection techniques generally show an improvement over traditional methods. In a practical setting, the proposed workflow with SHAP was able to detect a previously unknown, anomalous trend among general practitioners. Conclusions: The proposed workflow is able to detect known care providers with atypical behaviour and helps expert investigators in making informed decisions concerning possible fraud or overconsumption in the health insurance field.
format	Article
id	doaj-art-6cb8ec1b32a54b08ab227b7125670df8
institution	Kabale University
issn	1472-6947
language	English
publishDate	2025-01-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj-art-6cb8ec1b32a54b08ab227b7125670df8; 2025-01-12T12:26:22Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-01-01; 251111; 10.1186/s12911-024-02823-6; Explainable unsupervised anomaly detection for healthcare insurance data; Hannes De Meulemeester (0), Frank De Smet (1), Johan van Dorst (2), Elise Derroitte (3), Bart De Moor (4); (0) Department of Electrical Engineering, ESAT-STADIUS, KU Leuven; (1) Christian Health Insurance Fund; (2) Christian Health Insurance Fund; (3) Christian Health Insurance Fund; (4) Department of Electrical Engineering, ESAT-STADIUS, KU Leuven; https://doi.org/10.1186/s12911-024-02823-6; Health insurance; Anomaly detection; Unsupervised machine learning
title	Explainable unsupervised anomaly detection for healthcare insurance data
topic	Health insurance; Anomaly detection; Unsupervised machine learning
url	https://doi.org/10.1186/s12911-024-02823-6

Similar Items