Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features

Objective Machine learning (ML) has enabled healthcare discoveries by facilitating efficient modeling, such as for cancer screening. Unlike clinical trials, real-world data used in ML are often gathered for multiple purposes, leading to bias and missing information for a specific classification task...

Bibliographic Details
Main Authors:	Ruihua Guo, Ross Smith, Qifan Chen, Angus Ritchie, Simon Poon
Format:	Article
Language:	English
Published:	SAGE Publishing 2025-01-01
Series:	Digital Health
Online Access:	https://doi.org/10.1177/20552076251314097

author	Ruihua Guo, Ross Smith, Qifan Chen, Angus Ritchie, Simon Poon
collection	DOAJ
description	Objective: Machine learning (ML) has enabled healthcare discoveries by facilitating efficient modeling, such as for cancer screening. Unlike clinical trial data, real-world data used in ML are often gathered for multiple purposes, leading to bias and missing information for a specific classification task. This challenge is especially pronounced in healthcare because of stringent ethical considerations and resource constraints. This study proposed an integrated approach to enhance the quality of health evidence from a classification task: predicting Medicare's Diagnosis-Related Groups for ischemic heart disease (IHD) patients. Methods: Eligible participants were identified from the Medical Information Mart for Intensive Care IV (MIMIC IV), a publicly available hospital database. Six ML models were selected for model triangulation. Sequential triangulation was employed via Local Process Mining (LPM) and Qualitative Comparative Analysis (QCA). Results: A total of 1545 IHD hospitalizations from 916 patients were identified from MIMIC IV. Eight health process features, aligned with clinical knowledge, were identified through LPM. The correlation coefficients for process features, ranging from 0.24 to 0.42, were higher than those for non-process features, which ranged from 0.02 to 0.36. A total of 56 unique combinations were identified from the QCA, with 28 configurations having raw coverage lower than 1.0%. The overall model performance (i.e. weighted F1 and area under the curve scores) increased after adopting this integrated approach. The proportion of cases misclassified by any of the six models decreased by 47% after incorporating process features (from 5.29% to 2.91%) and further decreased to 0.0% after applying the QCA solutions. Conclusion: The integrated approach demonstrates its ability to enhance the quality of a classification task through its clinical relevance, improved model performance, and reduced case-level error rates. However, more scalable QCA methods are needed for larger datasets. Developing health process feature engineering for broader applications is a possible future direction.
format	Article
id	doaj-art-3ff18b7e0c304317bee0fe5e958f5942
institution	Kabale University
issn	2055-2076
language	English
publishDate	2025-01-01
publisher	SAGE Publishing
record_format	Article
series	Digital Health
spelling	doaj-art-3ff18b7e0c304317bee0fe5e958f5942; 2025-01-17T17:03:33Z; eng; SAGE Publishing; Digital Health; 2055-2076; 2025-01-01; 11; 10.1177/20552076251314097; Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features; Ruihua Guo (Population Health Group, , Canberra, ACT, Australia); Ross Smith (School of Computer Science, , Sydney, NSW, Australia); Qifan Chen (School of Computer Science, , Sydney, NSW, Australia); Angus Ritchie (Digital Health and Innovation, Sydney Local Health District, Camperdown, NSW, Australia); Simon Poon (School of Computer Science, , Sydney, NSW, Australia); https://doi.org/10.1177/20552076251314097
title	Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features
url	https://doi.org/10.1177/20552076251314097


Similar Items