Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis

Bibliographic Details
Main Authors: Yong Zhang, Xiao Lu, Yan Luo, Ying Zhu, Wenwu Ling
Format: Article
Language:English
Published: JMIR Publications 2025-01-01
Series:JMIR Medical Informatics
Online Access:https://medinform.jmir.org/2025/1/e63924
_version_ 1841526624611205120
author Yong Zhang
Xiao Lu
Yan Luo
Ying Zhu
Wenwu Ling
author_facet Yong Zhang
Xiao Lu
Yan Luo
Ying Zhu
Wenwu Ling
author_sort Yong Zhang
collection DOAJ
description Abstract. Background: Artificial intelligence chatbots are increasingly used for medical inquiries, including in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic. Objective: This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers. Methods: We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored by accuracy rate, whereas subjective questions were rated by 5 experienced doctors on a Likert scale. The data were analyzed in Excel. Results: Of the 554 questions, single-choice questions made up the largest share (354/554, 64%), followed by short-answer questions (69/554, 12%) and noun explanations (63/554, 11%). Accuracy rates for objective questions ranged from 8.33% to 80%, with true-or-false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P…). Conclusions: Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperformed ChatGPT. Users and developers should understand each model's performance characteristics and select the appropriate model for different question types and languages to optimize chatbot use.
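The Methods above describe two scoring schemes: accuracy rates for objective questions and rater-based acceptability for subjective ones. The minimal Python sketch below shows how such rates might be computed. The function names, the median aggregation across the 5 raters, and the acceptability threshold (rating >= 4 on a 1-5 Likert scale) are illustrative assumptions, not details taken from the paper, which does not specify how the ratings were combined.

    # Sketch of the scoring described in the Methods (assumptions noted above).
    from statistics import median

    def accuracy_rate(is_correct: list[bool]) -> float:
        """Share of objective questions answered correctly, as a percentage."""
        return 100 * sum(is_correct) / len(is_correct)

    def acceptability_rate(ratings: list[list[int]], threshold: int = 4) -> float:
        """Share of subjective answers whose median rating across the 5 raters
        meets the assumed acceptability threshold."""
        acceptable = [median(r) >= threshold for r in ratings]
        return 100 * sum(acceptable) / len(acceptable)

    # Toy example: 4 objective answers, 3 subjective answers rated by 5 doctors.
    print(accuracy_rate([True, True, False, True]))   # 75.0
    print(acceptability_rate([[5, 4, 4, 3, 5],
                              [2, 3, 3, 2, 4],
                              [4, 4, 5, 4, 4]]))      # ~66.67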
format Article
id doaj-art-0fed945cd25a4501beccb8918906ccfb
institution Kabale University
issn 2291-9694
language English
publishDate 2025-01-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj-art-0fed945cd25a4501beccb8918906ccfb (indexed 2025-01-16T15:29:32Z)
Journal: JMIR Medical Informatics, JMIR Publications, ISSN 2291-9694, 2025-01-01, vol 13, e63924
DOI: 10.2196/63924
Title: Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
Authors (ORCID): Yong Zhang (0000-0002-5941-342X); Xiao Lu (0009-0003-3791-8325); Yan Luo (0009-0007-0805-8839); Ying Zhu (0009-0009-4932-7614); Wenwu Ling (0009-0002-6770-8444)
URL: https://medinform.jmir.org/2025/1/e63924
spellingShingle Yong Zhang
Xiao Lu
Yan Luo
Ying Zhu
Wenwu Ling
Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
JMIR Medical Informatics
title Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
title_full Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
title_fullStr Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
title_full_unstemmed Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
title_short Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis
title_sort performance of artificial intelligence chatbots on ultrasound examinations cross sectional comparative analysis
url https://medinform.jmir.org/2025/1/e63924
work_keys_str_mv AT yongzhang performanceofartificialintelligencechatbotsonultrasoundexaminationscrosssectionalcomparativeanalysis
AT xiaolu performanceofartificialintelligencechatbotsonultrasoundexaminationscrosssectionalcomparativeanalysis
AT yanluo performanceofartificialintelligencechatbotsonultrasoundexaminationscrosssectionalcomparativeanalysis
AT yingzhu performanceofartificialintelligencechatbotsonultrasoundexaminationscrosssectionalcomparativeanalysis
AT wenwuling performanceofartificialintelligencechatbotsonultrasoundexaminationscrosssectionalcomparativeanalysis