Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features

Objective Machine learning (ML) has enabled healthcare discoveries by facilitating efficient modeling, such as for cancer screening. Unlike clinical trials, real-world data used in ML are often gathered for multiple purposes, leading to bias and missing information for a specific classification task...

Bibliographic Details
Main Authors: Ruihua Guo, Ross Smith, Qifan Chen, Angus Ritchie, Simon Poon
Format: Article
Language: English
Published: SAGE Publishing 2025-01-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251314097
collection DOAJ
description Objective Machine learning (ML) has enabled healthcare discoveries by facilitating efficient modeling, such as for cancer screening. Unlike clinical trials, real-world data used in ML are often gathered for multiple purposes, leading to bias and missing information for a specific classification task. This challenge is especially pronounced in healthcare because of stringent ethical considerations and resource constraints. This study proposed an integrated approach to enhance the quality of health evidence from a classification task for predicting Medicare's Diagnosis-Related Groups of ischemic heart disease (IHD) patients. Methods Eligible participants were identified from the Medical Information Mart for Intensive Care IV (MIMIC IV), a publicly available hospital database. Six ML models were selected for model triangulation. Sequential triangulation was employed via Local Process Mining (LPM) and Qualitative Comparative Analysis (QCA). Results A total of 1545 IHD hospitalizations from 916 patients were identified from MIMIC IV. Eight health process features that aligned with clinical knowledge were identified through LPM. The correlation coefficients for process features, ranging from 0.24 to 0.42, were higher than those for non-process features, which ranged from 0.02 to 0.36. A total of 56 unique combinations were identified from the QCA, with 28 configurations having raw coverage lower than 1.0%. The overall model performance (i.e. weighted F1 and area under the curve scores) increased after adopting this integrated approach. The proportion of cases misclassified by any of the six models decreased by 47% after incorporating process features (from 5.29% to 2.91%) and further decreased to 0.0% after applying the QCA solutions. Conclusion The integrated approach demonstrates its ability to enhance the quality of a classification task through its clinical relevance, improved model performance, and reduced case-level error rates. However, more scalable QCA methods are needed for larger datasets. Developing health process feature engineering for broader applications can be a future direction.
id doaj-art-3ff18b7e0c304317bee0fe5e958f5942
institution Kabale University
issn 2055-2076
volume 11
affiliations Ruihua Guo: Population Health Group, , Canberra, ACT, Australia; Ross Smith: School of Computer Science, , Sydney, NSW, Australia; Qifan Chen: School of Computer Science, , Sydney, NSW, Australia; Angus Ritchie: Digital Health and Innovation, Sydney Local Health District, Camperdown, NSW, Australia; Simon Poon: School of Computer Science, , Sydney, NSW, Australia