Quantitative evaluation of GPT-4’s performance on US and Chinese osteoarthritis treatment guideline interpretation and orthopaedic case consultation

Bibliographic Details
Main Authors:	Xiang Gao, Xu Li, Juntan Li, Tianxu Dou, Yuyang Gao, Wannan Zhu
Format:	Article
Language:	English
Published:	BMJ Publishing Group 2024-12-01
Series:	BMJ Open
Online Access:	https://bmjopen.bmj.com/content/14/12/e082344.full

collection	DOAJ
description	Objectives: To evaluate GPT-4’s performance in interpreting osteoarthritis (OA) treatment guidelines from the USA and China, and to assess its ability to diagnose and manage orthopaedic cases. Setting: The study was conducted using publicly available OA treatment guidelines and simulated orthopaedic case scenarios. Participants: No human participants were involved. The evaluation focused on GPT-4’s responses to clinical guidelines and case questions, assessed by two orthopaedic specialists. Outcomes: Primary outcomes included the accuracy and completeness of GPT-4’s responses to guideline-based queries and case scenarios. Metrics included the correct match rate, completeness score and stratification of case responses into predefined tiers of correctness. Results: In interpreting the American Academy of Orthopaedic Surgeons and Chinese OA guidelines, GPT-4 achieved a correct match rate of 46.4% and complete agreement with all score-2 recommendations. The accuracy score for guideline interpretation was 4.3±1.6 (95% CI 3.9 to 4.7), and the completeness score was 2.8±0.6 (95% CI 2.5 to 3.1). For case-based questions, GPT-4 demonstrated high performance, with over 88% of responses rated as comprehensive. Conclusions: GPT-4 demonstrates promising capabilities as an auxiliary tool in orthopaedic clinical practice and patient education, with high levels of accuracy and completeness in guideline interpretation and clinical case analysis. However, further validation is necessary to establish its utility in real-world clinical settings.
id	doaj-art-9fa2c9ecd9c74b46bd5832b9bc6a5c9c
institution	Kabale University
issn	2044-6055
spelling	doaj-art-9fa2c9ecd9c74b46bd5832b9bc6a5c9c; 2024-12-30T11:20:08Z; eng; BMJ Publishing Group; BMJ Open; ISSN 2044-6055; 2024-12-01; volume 14, issue 12; DOI 10.1136/bmjopen-2023-082344
	Xiang Gao (Public Health Education, UNC Greensboro, Greensboro, North Carolina, USA)
	Xu Li (Department of Anesthesiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
	Juntan Li (Jinzhou Medical University, Jinzhou, Liaoning, China)
	Tianxu Dou (Department of Orthopedics, The First Hospital of China Medical University, Shenyang, China)
	Yuyang Gao (Department of Orthopedics, The First Hospital of China Medical University, Shenyang, China)
	Wannan Zhu (Jinzhou Medical University, Jinzhou, Liaoning, China)
	https://bmjopen.bmj.com/content/14/12/e082344.full
title	Quantitative evaluation of GPT-4’s performance on US and Chinese osteoarthritis treatment guideline interpretation and orthopaedic case consultation


Similar Items