STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men
A novel automatic framework is proposed for global sexually transmissible infections (STIs) and HIV risk prediction. Four machine learning methods, namely, Gradient Boosting Machine (GBM), Random Forest (RF), XG Boost, and Ensemble learning GBM-RF-XG Boost are applied and evaluated on the Demographi...
Main Authors: | Xiaopeng Ji, Zhaohui Tang, Sonya R. Osborne, Thi Phuoc Van Nguyen, Amy B. Mullens, Judith A. Dean, Yan Li |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2025-01-01 |
Series: | Frontiers in Public Health |
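Subjects: | human immunodeficiency virus; sexually transmissible infections; artificial intelligence; machine learning; risk prediction |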
Online Access: | https://www.frontiersin.org/articles/10.3389/fpubh.2024.1511689/full |
_version_ | 1841561012889714688 |
---|---|
author | Xiaopeng Ji Zhaohui Tang Sonya R. Osborne Thi Phuoc Van Nguyen Amy B. Mullens Judith A. Dean Yan Li |
author_sort | Xiaopeng Ji |
collection | DOAJ |
description | A novel automatic framework is proposed for global sexually transmissible infection (STI) and HIV risk prediction. Four machine learning methods, namely Gradient Boosting Machine (GBM), Random Forest (RF), XG Boost, and an ensemble of GBM, RF, and XG Boost, are applied and evaluated on the Demographic and Health Surveys Program (DHSP), with thirteen features ultimately selected as the most predictive. Classification and generalization experiments are conducted to measure the accuracy, F1-score, precision, and area under the curve (AUC) of these four algorithms. Two imbalanced-data solutions are also applied to reduce bias and improve classification performance. The experimental results demonstrate that the Random Forest algorithm yields the best results for HIV prediction, with the highest accuracy and AUC both reaching 0.99. STI prediction performs best when the Synthetic Minority Oversampling Technique (SMOTE) is applied (Accuracy = 0.99, AUC = 0.99), outperforming the state-of-the-art baselines. Two possible factors that may affect classification and generalization performance are further analyzed. This automatic classification model helps to improve convenience and reduce the cost of HIV testing. |
format | Article |
id | doaj-art-0a325e6b46444228af3d874ad6cb4a5c |
institution | Kabale University |
issn | 2296-2565 |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Public Health |
spelling | doaj-art-0a325e6b46444228af3d874ad6cb4a5c; 2025-01-03T06:46:52Z; eng; Frontiers Media S.A.; Frontiers in Public Health; 2296-2565; 2025-01-01; 12; 10.3389/fpubh.2024.1511689; 1511689; STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men; Xiaopeng Ji [0]; Zhaohui Tang [1]; Sonya R. Osborne [2]; Thi Phuoc Van Nguyen [3]; Amy B. Mullens [4]; Judith A. Dean [5]; Yan Li [6]; [0] School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia; [1] School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia; [2] School of Nursing and Midwifery, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia; [3] School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia; [4] School of Psychology and Wellbeing, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia; [5] School of Public Health, Faculty of Medicine, The University of Queensland, Herston, QLD, Australia; [6] School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia; https://www.frontiersin.org/articles/10.3389/fpubh.2024.1511689/full; human immunodeficiency virus; sexually transmissible infections; artificial intelligence; machine learning; risk prediction |
title | STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men |
topic | human immunodeficiency virus sexually transmissible infections artificial intelligence machine learning risk prediction |
url | https://www.frontiersin.org/articles/10.3389/fpubh.2024.1511689/full |
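
The description in this record reports accuracy, precision, F1-score, and AUC, and notes that imbalanced data can bias classification. The following is a minimal sketch of how those four metrics are conventionally computed for a binary classifier. It is not the authors' code: the function names, the toy labels, and the scores are hypothetical and chosen only to show why accuracy alone can look good when positives are rare.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Convention used here: precision is 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    return 2 * p * r / (p + r) if (p + r) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.6, 0.7, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold chosen for illustration
always_negative = [0] * len(y_true)              # trivial baseline

print("baseline accuracy:", accuracy(y_true, always_negative))
print("baseline F1:      ", f1_score(y_true, always_negative))
print("model accuracy:   ", accuracy(y_true, y_pred))
print("model precision:  ", precision(y_true, y_pred))
print("model F1:         ", f1_score(y_true, y_pred))
print("model AUC:        ", roc_auc(y_true, scores))
```

In this toy set the always-negative baseline is right on 8 of 10 cases yet has an F1-score of zero, which is one reason a study on imbalanced data would report F1, precision, and AUC alongside accuracy.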