Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction

Background: Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited. Objectives: This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against...

Bibliographic Details
Main Authors:	Lise M. Bjerre, Cayden Peixoto, Rawan Alkurd, Robert Talarico, Rami Abielmona
Format:	Article
Language:	English
Published:	Elsevier 2024-12-01
Series:	Global Epidemiology
Subjects:	Artificial intelligence; Machine learning; COVID-19; Logistic regression; Predictive modeling; Gradient boosting trees
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590113324000348

_version_	1846127246713028608
author	Lise M. Bjerre, Cayden Peixoto, Rawan Alkurd, Robert Talarico, Rami Abielmona
author_sort	Lise M. Bjerre
collection	DOAJ
description	Background: Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited. Objectives: This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against conventional multivariate logistic regression models using linked health administrative data. Methods: Using Ontario's population health databases, we created a cohort of residents of the city of Ottawa, Ontario, who underwent a PCR test for COVID-19 between March 10, 2020, and May 13, 2021. Using demographic, socio-economic and health data (including COVID-19 PCR test results and available symptom data), we developed predictive models for COVID-19 case identification using the following approaches: classical multivariate logistic regression (LR); deep neural network (DNN); random forest (RF); and gradient boosting trees (GBT). Model performance was compared using swarm plots of the area under the curve (AUC) across 10-fold cross-validation. Results: The cohort consisted of n = 351,248 Ottawa residents tested for COVID-19 during the study period, among whom a total of n = 883,879 unique COVID-19 tests were performed (2.6% positive test results). Inclusion of COVID-19 symptoms data in the analysis improved model performance and variable predictive value across all tested models (p < 0.0001), with the 10-fold cross-validation AUC increasing to near or over 0.7 in all models when symptoms data were included. In pairwise comparisons, the GBT method had the highest predictive ability (AUC = 0.796 ± 0.017), significantly outperforming multivariate logistic regression and the other AI/ML approaches. Conclusions: Conventional multivariate regression-based models perform better than some machine learning algorithms and worse than others in providing good predictive accuracy in a moderately sized dataset with a reasonable number of features. However, whenever possible, the AI/ML GBT approach should be considered.
format	Article
id	doaj-art-90cd181a34b448cf91b0d9d86b49ac8e
institution	Kabale University
issn	2590-1133
language	English
publishDate	2024-12-01
publisher	Elsevier
record_format	Article
series	Global Epidemiology
spelling	doaj-art-90cd181a34b448cf91b0d9d86b49ac8e | 2024-12-12T05:22:31Z | eng | Elsevier | Global Epidemiology | 2590-1133 | 2024-12-01 | 8 | 100168 | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction | Lise M. Bjerre (Institut du Savoir Montfort, 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada; University of Ottawa, Faculty of Medicine, Department of Family Medicine, 201-600 Peter-Morand Crescent, Ottawa ON, K1G 5Z3, Canada; Institute for Clinical and Evaluative Sciences (ICES), 1053 Carling Avenue, Box 684, Administrative Services Building, 1st Floor, Ottawa, Ontario K1Y 4E9, Canada; Corresponding author at: 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada) | Cayden Peixoto (Institut du Savoir Montfort, 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada) | Rawan Alkurd (Larus Technologies Corporation, 170 Laurier Ave West, Suite 310 Ottawa, Ontario K1P 5V5, Canada) | Robert Talarico (Institute for Clinical and Evaluative Sciences (ICES), 1053 Carling Avenue, Box 684, Administrative Services Building, 1st Floor, Ottawa, Ontario K1Y 4E9, Canada; Ottawa Hospital Research Institute, 501 Smyth Box 511, Ottawa ON, K1H 8L6, Canada) | Rami Abielmona (Larus Technologies Corporation, 170 Laurier Ave West, Suite 310 Ottawa, Ontario K1P 5V5, Canada; University of Ottawa, Faculty of Engineering, 800 King Edward Ave, Ottawa, ON K1N 6N5, Canada) | http://www.sciencedirect.com/science/article/pii/S2590113324000348 | Artificial intelligence; Machine learning; COVID-19; Logistic regression; Predictive modeling; Gradient boosting trees
title	Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction
topic	Artificial intelligence; Machine learning; COVID-19; Logistic regression; Predictive modeling; Gradient boosting trees
url	http://www.sciencedirect.com/science/article/pii/S2590113324000348
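The evaluation protocol named in the description (a cross-validated AUC reported as a mean with a spread, compared between models with and without symptom data) can be illustrated with a small sketch. Everything below is an assumption made for illustration only: the data are synthetic, the "model" is a simple per-group positivity-rate table rather than the LR, DNN, RF or GBT models of the study, the folds are stratified, and the spread is taken to be the standard deviation across folds, which the record does not specify.

```python
import random
import statistics


def auc(labels, scores):
    """AUC via the Mann-Whitney identity: the share of (positive, negative)
    pairs in which the positive case scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Deal each class round-robin into k folds so every fold holds both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_rate_table(rows, labels, key):
    """Hypothetical stand-in model: score = training positivity rate of the
    row's group, falling back to the overall rate for unseen groups."""
    counts = {}
    for row, y in zip(rows, labels):
        n, n_pos = counts.get(key(row), (0, 0))
        counts[key(row)] = (n + 1, n_pos + y)
    overall = sum(labels) / len(labels)
    table = {g: n_pos / n for g, (n, n_pos) in counts.items()}
    return lambda row: table.get(key(row), overall)


def cross_validated_auc(rows, labels, key, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, return (mean, SD) of fold AUCs."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(rows)) if i not in held_out]
        score = fit_rate_table([rows[i] for i in train],
                               [labels[i] for i in train], key)
        aucs.append(auc([labels[i] for i in test_idx],
                        [score(rows[i]) for i in test_idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic cohort (not study data): an age group and a symptom flag per test.
rng = random.Random(42)
rows, labels = [], []
for _ in range(2000):
    age_group = rng.randrange(4)
    symptomatic = rng.random() < 0.3
    p = 0.03 + 0.02 * age_group + (0.35 if symptomatic else 0.0)
    rows.append((age_group, symptomatic))
    labels.append(1 if rng.random() < p else 0)

mean_a, sd_a = cross_validated_auc(rows, labels, key=lambda r: r[0])
mean_b, sd_b = cross_validated_auc(rows, labels, key=lambda r: r)
print(f"without symptoms: AUC = {mean_a:.3f} ± {sd_a:.3f}")
print(f"with symptoms:    AUC = {mean_b:.3f} ± {sd_b:.3f}")
```

Reusing the same seed for both runs keeps the folds identical, so the two summaries are compared on the same held-out tests. The printed values depend on the synthetic data and bear no relation to the AUC figures reported in the record.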

Similar Items