STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men

A novel automatic framework is proposed for global sexually transmissible infections (STIs) and HIV risk prediction. Four machine learning methods, namely, Gradient Boosting Machine (GBM), Random Forest (RF), XG Boost, and Ensemble learning GBM-RF-XG Boost are applied and evaluated on the Demographi...

Bibliographic Details
Main Authors: Xiaopeng Ji, Zhaohui Tang, Sonya R. Osborne, Thi Phuoc Van Nguyen, Amy B. Mullens, Judith A. Dean, Yan Li
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Public Health
Online Access: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1511689/full
author Xiaopeng Ji
Zhaohui Tang
Sonya R. Osborne
Thi Phuoc Van Nguyen
Amy B. Mullens
Judith A. Dean
Yan Li
author_sort Xiaopeng Ji
collection DOAJ
description A novel automatic framework is proposed for global sexually transmissible infection (STI) and HIV risk prediction. Four machine learning methods, namely Gradient Boosting Machine (GBM), Random Forest (RF), XGBoost, and a GBM-RF-XGBoost ensemble, are applied to and evaluated on data from the Demographic and Health Surveys Program (DHSP), with thirteen features ultimately selected as the most predictive. Classification and generalization experiments are conducted to measure the accuracy, F1-score, precision, and area under the curve (AUC) of these four algorithms. Two solutions for imbalanced data are also applied to reduce bias and improve classification performance. The experimental results show that the Random Forest algorithm yields the best results for HIV prediction, with a highest accuracy of 0.99 and a highest AUC of 0.99. STI prediction performs best when the Synthetic Minority Oversampling Technique (SMOTE) is applied (accuracy = 0.99, AUC = 0.99), outperforming the state-of-the-art baselines. Two possible factors that may affect classification and generalization performance are further analyzed. This automatic classification model helps to improve convenience and reduce the cost of HIV testing.
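The description names SMOTE as one of the imbalanced-data solutions but gives no implementation details. The following is a minimal sketch of the core SMOTE idea only (a synthetic minority point is placed on the segment between a minority sample and one of its nearest minority neighbours). It is not the authors' code: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; needs at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up data: 8 majority rows and 3 minority rows with two features.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice a maintained implementation would normally be preferred over a hand-written one, and oversampling would be applied to the training split only so that synthetic rows do not leak into evaluation.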
format Article
id doaj-art-0a325e6b46444228af3d874ad6cb4a5c
institution Kabale University
issn 2296-2565
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Public Health
spelling doaj-art-0a325e6b46444228af3d874ad6cb4a5c
2025-01-03T06:46:52Z
eng
Frontiers Media S.A.
Frontiers in Public Health
2296-2565
2025-01-01
12
10.3389/fpubh.2024.1511689
1511689
0 Xiaopeng Ji: School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
1 Zhaohui Tang: School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
2 Sonya R. Osborne: School of Nursing and Midwifery, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia
3 Thi Phuoc Van Nguyen: School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
4 Amy B. Mullens: School of Psychology and Wellbeing, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, QLD, Australia
5 Judith A. Dean: School of Public Health, Faculty of Medicine, The University of Queensland, Herston, QLD, Australia
6 Yan Li: School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
title STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men
topic human immunodeficiency virus
sexually transmissible infections
artificial intelligence
machine learning
risk prediction
url https://www.frontiersin.org/articles/10.3389/fpubh.2024.1511689/full