Machine learning model for differentiating malignant from benign thyroid nodules based on the thyroid function data

Bibliographic Details
Main Authors: Quan Zhou, Lihua Zhang, Nan Xiang, Lele Zhang, Fuqiang Ma, Fengchang Yu, Shenhui Lv, Zhilin Lu, He-Rong Mao
Format: Article
Language: English
Published: BMJ Publishing Group 2025-05-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/15/5/e093466.full
Description
Summary:
Objectives: To develop and validate a machine learning (ML) model that differentiates malignant from benign thyroid nodules (TNs) using routine data, and to provide diagnostic assistance for medical professionals.
Setting: A qualified panel of 1649 patients with TNs from one hospital was stratified by gender, age, free triiodothyronine (FT3), free thyroxine (FT4) and thyroid peroxidase antibody (TPOAB).
Participants: Thyroid function (TF) data of 1649 patients with TNs (273 males and 1376 females) were collected in a single centre from January 2018 to June 2022.
Measures: Seven popular ML models (Random Forest, Decision Tree, Logistic Regression (LR), K-Nearest Neighbours, Gaussian Naive Bayes, Multilayer Perceptron and Gradient Boosting) were developed to predict malignant and benign TNs. Performance indicators included area under the curve (AUC), accuracy, recall, precision and F1 score.
Results: A total of 1649 patients were enrolled in this study, with a median age of 45.15±13.41 years and a male to female ratio of 1:5.055. In the multivariate LR analysis, statistically significant differences existed between the TNs group and the thyroid cancer group in gender, age, FT3, FT4 and TPOAB. Among the seven tested ML models, the Gradient Boosting model performed best in terms of precision, AUC, accuracy, recall and F1 score, with an AUC of 0.82, accuracy of 79.4% and precision of 0.814 after experimental verification. FT4, TPOAB and FT3 were validated as the top three features in the Gradient Boosting model.
Conclusions: This study developed a predictive model for benign and malignant TNs based on the Gradient Boosting Decision Tree algorithm. For the first time, it validated the clinical predictive value of TF parameters (FT4, FT3) and TPOAB as key biomarkers.
ISSN: 2044-6055
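
Illustrative note: the performance indicators named in the summary (AUC, accuracy, recall, precision and F1 score) can be computed from a classifier's outputs roughly as sketched below. This is a minimal sketch and not the authors' code. The label encoding (1 = malignant, 0 = benign), the 0.5 decision threshold, the function names and the toy labels and scores are all assumptions made for illustration, and none of the values come from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels; 1 = malignant, 0 = benign (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score computed from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example only: made-up labels and predicted probabilities, not study data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall and F1 score depend on the chosen decision threshold, whereas AUC is computed from the ranking of the predicted scores and is therefore threshold-independent.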