Construction of machine learning-based models for screening the high-risk patients with gastric precancerous lesions

Abstract Background The individualized prediction and discrimination of precancerous lesions of gastric cancer (PLGC) is critical for the early prevention of gastric cancer (GC). However, accurate non-invasive methods for distinguishing between PLGC and GC are currently lacking. This study therefore...

Bibliographic Details
Main Authors: Shuxian Yu, Haiyang Jiang, Jing Xia, Jie Gu, Mengting Chen, Yan Wang, Xiaohong Zhao, Zehua Liao, Puhua Zeng, Tian Xie, Xinbing Sui
Format: Article
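Language: English
Published: BMC 2025-01-01
Series: Chinese Medicine
Subjects: Precancerous lesions of gastric cancer; Gastric cancer; Machine learning; Deep learning; Traditional Chinese medicine
Online Access: https://doi.org/10.1186/s13020-025-01059-4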
author Shuxian Yu
Haiyang Jiang
Jing Xia
Jie Gu
Mengting Chen
Yan Wang
Xiaohong Zhao
Zehua Liao
Puhua Zeng
Tian Xie
Xinbing Sui
collection DOAJ
description Abstract Background: The individualized prediction and discrimination of precancerous lesions of gastric cancer (PLGC) are critical for the early prevention of gastric cancer (GC). However, accurate non-invasive methods for distinguishing between PLGC and GC are currently lacking. This study therefore aimed to develop a risk prediction model using machine learning and deep learning techniques to aid the early diagnosis of GC. Methods: A total of 2229 subjects were recruited from nine tertiary hospitals between October 2022 and November 2023. We designed a comprehensive questionnaire, identified statistically significant factors, and created a web-based nomogram. A risk prediction model was then developed using machine learning techniques. In addition, a tongue image-based risk prediction model was established using deep learning algorithms. Results: Based on logistic regression analysis, a dynamic web-based nomogram (freely accessible at https://yz6677.shinyapps.io/GC67/) was developed. The prediction model was then built using ten different machine learning algorithms, of which the Random Forest (RF) model achieved the highest accuracy, 85.65%. According to the predictive results, the top 10 key risk factors were age, traditional Chinese medicine (TCM) constitution type, tongue coating color, tongue color, irregular meals, pickled food, greasy fur, over-hot eating habit, anxiety, and sleep onset latency. These factors are all significant risk indicators for the progression of PLGC to GC. Subsequently, the Swin Transformer architecture was used to develop a tongue image-based model for predicting the risk of progression of PLGC. The verification set showed an accuracy of 73.33% and an area under the curve (AUC) greater than 0.8 across all models.
Conclusions: Our study developed machine learning and deep learning-based models for predicting the risk of progression of PLGC to GC, which will help identify high-risk patients among those with PLGC and improve the early diagnosis of GC.
format Article
id doaj-art-85d58e914da34d99875ee56c35546a9f
institution Kabale University
issn 1749-8546
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series Chinese Medicine
spelling doaj-art-85d58e914da34d99875ee56c35546a9f
2025-01-12T12:39:44Z
eng
BMC
Chinese Medicine
1749-8546
2025-01-01
20
1
1
15
10.1186/s13020-025-01059-4
Construction of machine learning-based models for screening the high-risk patients with gastric precancerous lesions
Shuxian Yu (School of Pharmacy, Hangzhou Normal University)
Haiyang Jiang (School of Pharmacy, Hangzhou Normal University)
Jing Xia (School of Pharmacy, Hangzhou Normal University)
Jie Gu (School of Pharmacy, Hangzhou Normal University)
Mengting Chen (School of Pharmacy, Hangzhou Normal University)
Yan Wang (School of Pharmacy, Hangzhou Normal University)
Xiaohong Zhao (School of Pharmacy, Hangzhou Normal University)
Zehua Liao (School of Pharmacy, Hangzhou Normal University)
Puhua Zeng (The Affiliated Hospital of Hunan Academy of Traditional Chinese Medicine)
Tian Xie (School of Pharmacy, Hangzhou Normal University)
Xinbing Sui (School of Pharmacy, Hangzhou Normal University)
https://doi.org/10.1186/s13020-025-01059-4
Precancerous lesions of gastric cancer
Gastric cancer
Machine learning
Deep learning
Traditional Chinese medicine
title Construction of machine learning-based models for screening the high-risk patients with gastric precancerous lesions
topic Precancerous lesions of gastric cancer
Gastric cancer
Machine learning
Deep learning
Traditional Chinese medicine
url https://doi.org/10.1186/s13020-025-01059-4
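
The abstract in this record reports model performance as accuracy and area under the curve (AUC). For readers unfamiliar with these two metrics, the following is a minimal sketch of how they are commonly computed for a binary classifier. It is not the authors' code: the labels, scores, 0.5 threshold, and function names are all hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = high risk, 0 = low risk.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can show a moderate accuracy alongside a higher AUC.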