Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction
Background: Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited. Objectives: This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against...
| Main Authors: | Lise M. Bjerre, Cayden Peixoto, Rawan Alkurd, Robert Talarico, Rami Abielmona |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier 2024-12-01 |
| Series: | Global Epidemiology |
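| Subjects: | Artificial intelligence; Machine learning; COVID-19; Logistic regression; Predictive modeling; Gradient boosting trees |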
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590113324000348 |
| _version_ | 1846127246713028608 |
|---|---|
| author | Lise M. Bjerre Cayden Peixoto Rawan Alkurd Robert Talarico Rami Abielmona |
| author_facet | Lise M. Bjerre Cayden Peixoto Rawan Alkurd Robert Talarico Rami Abielmona |
| author_sort | Lise M. Bjerre |
| collection | DOAJ |
| description | Background: Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited. Objectives: This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against conventional multivariate logistic regression models using linked health administrative data. Methods: Using Ontario's population health databases, we created a cohort of residents of the city of Ottawa, Ontario, who underwent a PCR test for COVID-19 between March 10, 2020, and May 13, 2021. Using demographic, socio-economic and health data (including COVID-19 PCR test results and available symptom data), we developed predictive models for the purpose of COVID-19 case identification using the following approaches: classical multivariate logistic regression (LR); deep neural network (DNN); random forest (RF); and gradient boosting trees (GBT). Model performance was compared using swarm plots of the area under the curve (AUC) from 10-fold cross-validation. Results: The cohort consisted of n = 351,248 Ottawa residents tested for COVID-19 during the study period. Among them, a total of n = 883,879 unique COVID-19 tests were performed (2.6% positive test results). Inclusion of COVID-19 symptoms data in the analysis improved model performance and variable predictive value across all tested models (p < 0.0001), with the 10-fold cross-validation AUC increasing to near or over 0.7 in all models when symptoms data were included. In various pairwise comparisons, the GBT method had the highest predictive ability (AUC = 0.796 ± 0.017), significantly outperforming multivariate logistic regression and the other AI/ML approaches. Conclusions: Conventional multivariate regression-based models perform better than some machine learning algorithms and worse than others in providing good predictive accuracy in a moderately sized dataset with a reasonable number of features. However, whenever possible, the AI/ML GBT approach should be considered. |
| format | Article |
| id | doaj-art-90cd181a34b448cf91b0d9d86b49ac8e |
| institution | Kabale University |
| issn | 2590-1133 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Global Epidemiology |
| spelling | doaj-art-90cd181a34b448cf91b0d9d86b49ac8e // 2024-12-12T05:22:31Z // eng // Elsevier // Global Epidemiology // 2590-1133 // 2024-12-01 // 8 // 100168 // Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction // Lise M. Bjerre [0], Cayden Peixoto [1], Rawan Alkurd [2], Robert Talarico [3], Rami Abielmona [4] // [0] Institut du Savoir Montfort, 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada; University of Ottawa, Faculty of Medicine, Department of Family Medicine, 201-600 Peter-Morand Crescent, Ottawa ON, K1G 5Z3, Canada; Institute for Clinical and Evaluative Sciences (ICES), 1053 Carling Avenue, Box 684, Administrative Services Building, 1st Floor, Ottawa, Ontario K1Y 4E9, Canada; Corresponding author at: 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada. // [1] Institut du Savoir Montfort, 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada // [2] Larus Technologies Corporation, 170 Laurier Ave West, Suite 310 Ottawa, Ontario K1P 5V5, Canada // [3] Institute for Clinical and Evaluative Sciences (ICES), 1053 Carling Avenue, Box 684, Administrative Services Building, 1st Floor, Ottawa, Ontario K1Y 4E9, Canada; Ottawa Hospital Research Institute, 501 Smyth Box 511, Ottawa ON, K1H 8L6, Canada // [4] Larus Technologies Corporation, 170 Laurier Ave West, Suite 310 Ottawa, Ontario K1P 5V5, Canada; University of Ottawa, Faculty of Engineering, 800 King Edward Ave, Ottawa, ON K1N 6N5, Canada // Background: Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited. Objectives: This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against conventional multivariate logistic regression models using linked health administrative data. Methods: Using Ontario's population health databases, we created a cohort of residents of the city of Ottawa, Ontario, who underwent a PCR test for COVID-19 between March 10, 2020, and May 13, 2021. Using demographic, socio-economic and health data (including COVID-19 PCR test results and available symptom data), we developed predictive models for the purpose of COVID-19 case identification using the following approaches: classical multivariate logistic regression (LR); deep neural network (DNN); random forest (RF); and gradient boosting trees (GBT). Model performance was compared using swarm plots of the area under the curve (AUC) from 10-fold cross-validation. Results: The cohort consisted of n = 351,248 Ottawa residents tested for COVID-19 during the study period. Among them, a total of n = 883,879 unique COVID-19 tests were performed (2.6% positive test results). Inclusion of COVID-19 symptoms data in the analysis improved model performance and variable predictive value across all tested models (p < 0.0001), with the 10-fold cross-validation AUC increasing to near or over 0.7 in all models when symptoms data were included. In various pairwise comparisons, the GBT method had the highest predictive ability (AUC = 0.796 ± 0.017), significantly outperforming multivariate logistic regression and the other AI/ML approaches. Conclusions: Conventional multivariate regression-based models perform better than some machine learning algorithms and worse than others in providing good predictive accuracy in a moderately sized dataset with a reasonable number of features. However, whenever possible, the AI/ML GBT approach should be considered. // http://www.sciencedirect.com/science/article/pii/S2590113324000348 // Artificial intelligence; Machine learning; COVID-19; Logistic regression; Predictive modeling; Gradient boosting trees |
| spellingShingle | Lise M. Bjerre Cayden Peixoto Rawan Alkurd Robert Talarico Rami Abielmona Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction Global Epidemiology Artificial intelligence Machine learning COVID-19 Logistic regression Predictive modeling Gradient boosting trees |
| title | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction |
| title_full | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction |
| title_fullStr | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction |
| title_full_unstemmed | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction |
| title_short | Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction |
| title_sort | comparing ai ml approaches and classical regression for predictive modeling using large population health databases applications to covid 19 case prediction |
| topic | Artificial intelligence Machine learning COVID-19 Logistic regression Predictive modeling Gradient boosting trees |
| url | http://www.sciencedirect.com/science/article/pii/S2590113324000348 |
| work_keys_str_mv | AT lisembjerre comparingaimlapproachesandclassicalregressionforpredictivemodelingusinglargepopulationhealthdatabasesapplicationstocovid19caseprediction AT caydenpeixoto comparingaimlapproachesandclassicalregressionforpredictivemodelingusinglargepopulationhealthdatabasesapplicationstocovid19caseprediction AT rawanalkurd comparingaimlapproachesandclassicalregressionforpredictivemodelingusinglargepopulationhealthdatabasesapplicationstocovid19caseprediction AT roberttalarico comparingaimlapproachesandclassicalregressionforpredictivemodelingusinglargepopulationhealthdatabasesapplicationstocovid19caseprediction AT ramiabielmona comparingaimlapproachesandclassicalregressionforpredictivemodelingusinglargepopulationhealthdatabasesapplicationstocovid19caseprediction |
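
The description above reports model performance as an AUC with a spread (for example, AUC = 0.796 ± 0.017) from 10-fold cross-validation. As a rough illustration of how such a summary can be computed, the sketch below calculates a per-fold AUC and then the mean and standard deviation across folds. It is not the authors' code: the data are synthetic, the function names (`auc`, `stratified_folds`, `summarize_cv_auc`) are hypothetical, the scores are assumed to be out-of-fold predictions from some already-fitted model, and reading "±" as the standard deviation across folds is an assumption the record does not confirm.

```python
import random
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing positives and negatives out
    separately so every fold contains both classes."""
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[j::k] + neg[j::k] for j in range(k)]


def summarize_cv_auc(labels, scores, k=10, seed=0):
    """Return (mean AUC, SD of AUC, per-fold AUCs) over k held-out folds.
    Assumes `scores` are out-of-fold predictions."""
    fold_aucs = [
        auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in stratified_folds(labels, k, seed)
    ]
    return mean(fold_aucs), stdev(fold_aucs), fold_aucs


# Synthetic stand-in data (NOT the study cohort): 20% positives, with
# positive cases scoring higher on average than negatives.
rng = random.Random(42)
labels = [1 if i % 5 == 0 else 0 for i in range(1000)]
scores = [rng.gauss(1.0 if l == 1 else 0.0, 1.0) for l in labels]

mean_auc, sd_auc, fold_aucs = summarize_cv_auc(labels, scores, k=10)
print(f"AUC = {mean_auc:.3f} ± {sd_auc:.3f} over {len(fold_aucs)} folds")
```

Repeating this summary for each candidate model on the same folds yields per-fold AUC values that can be plotted or compared pairwise, which is the kind of comparison the description refers to.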