EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model

Bibliographic Details
Main Authors: Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Yue Wu, Mingpu Xu, Le Gao, Yinwen Li, Xianwen Shang, Danli Shi, Mingguang He
Format: Article
Language: English
Published: JMIR Publications 2024-12-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2024/1/e60063
collection DOAJ
description Background: Large language models (LLMs) have the potential to enhance clinical workflows and improve medical education, but they face challenges related to specialized knowledge in ophthalmology.
Objective: This study aims to refine a general-purpose LLM into an ophthalmology-specialized assistant for patient inquiries and medical education.
Methods: We transformed Llama2 into an ophthalmology-specialized LLM, termed EyeGPT, through 3 strategies: prompt engineering for role-playing, fine-tuning on publicly available data sets filtered for eye-specific terminology (83,919 samples), and retrieval-augmented generation drawing on a medical database and 14 ophthalmology textbooks. Four board-certified ophthalmologists evaluated the EyeGPT variants on 120 questions spanning diverse categories, in both simple and complex question-answering scenarios. The best EyeGPT model was then compared with an unassisted human physician group and an EyeGPT+human group. We proposed 4 assessment metrics: accuracy, understandability, trustworthiness, and empathy; the proportion of hallucinations was also reported.
Results: The best fine-tuned model significantly outperformed the original Llama2 model in providing informed advice (mean 9.30, SD 4.42 vs mean 13.79, SD 5.70; P<.001) and in mitigating hallucinations (97/120, 80.8% vs 53/120, 44.2%; P<.001). Incorporating information retrieval from reliable sources, particularly the ophthalmology textbooks, further improved responses relative to the best fine-tuned model alone (mean 13.08, SD 5.43 vs mean 15.14, SD 4.64; P=.001) and further reduced hallucinations (71/120, 59.2% vs 57/120, 47.5%; P=.02). Subgroup analysis showed that EyeGPT was robust across common diseases, with consistent performance across different users and domains. Among the variants, the model integrating fine-tuning and textbook retrieval ranked highest, followed closely by fine-tuning combined with the manual database, standalone fine-tuning, and pure role-playing. EyeGPT demonstrated competitive understandability and empathy compared with human ophthalmologists, and ophthalmologists assisted by EyeGPT performed notably better than those working unassisted.
Conclusions: We developed EyeGPT by refining a general-domain LLM and comprehensively compared strategies for building an ophthalmology-specific assistant. Our results highlight EyeGPT's potential to assist ophthalmologists and patients in medical settings.
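The Methods above name 3 strategies (role-play prompting, fine-tuning on eye-term-filtered public data sets, and retrieval-augmented generation over textbooks) without implementation details. The Python sketch below is a minimal illustration of how such a pipeline could be wired together, assuming an invented prompt, filter vocabulary, and keyword-overlap retriever; none of it is the authors' code, and the Llama2 fine-tuning step itself is omitted.

```python
# Purely illustrative sketch; every name, prompt, and passage below is a
# hypothetical stand-in, not taken from the EyeGPT paper.

# Assumed filter vocabulary standing in for the paper's eye-specific terminology list.
EYE_TERMS = {"retina", "cornea", "glaucoma", "cataract", "macula", "iris", "intraocular"}

# Assumed role-play system prompt; the actual EyeGPT prompt is not given in the abstract.
ROLE_PLAY_PROMPT = (
    "You are EyeGPT, an ophthalmology assistant. Answer patient inquiries "
    "accurately, understandably, trustworthily, and with empathy."
)

# Stand-ins for passages that would be indexed from the 14 ophthalmology textbooks.
PASSAGES = [
    "Glaucoma damages the optic nerve, often from elevated intraocular pressure.",
    "A cataract is a clouding of the crystalline lens that gradually blurs vision.",
    "Diabetic retinopathy arises from damage to the retinal microvasculature.",
]


def filter_eye_samples(samples: list[str]) -> list[str]:
    """Keep only training samples that mention eye-specific terminology,
    mirroring the abstract's description of the fine-tuning data filter."""
    return [s for s in samples if EYE_TERMS & set(s.lower().split())]


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; a real system would use dense embeddings."""
    q = set(question.lower().split())
    return sorted(passages, key=lambda p: -len(q & set(p.lower().split())))[:k]


def build_prompt(question: str) -> str:
    """Assemble role-play instructions + retrieved context + the patient question."""
    context = "\n".join(retrieve(question, PASSAGES))
    return (
        f"{ROLE_PLAY_PROMPT}\n\nReference material:\n{context}\n\n"
        f"Patient: {question}\nEyeGPT:"
    )


if __name__ == "__main__":
    # Data filtering demo: the eye-related sample is kept, the other dropped.
    print(filter_eye_samples(["The cornea refracts light.", "Hypertension guidelines."]))
    # Prompt-assembly demo.
    print(build_prompt("What causes glaucoma and how is it treated?"))
```

In a real deployment, the keyword retriever would be replaced by embedding-based search over the textbook corpus, and the output of build_prompt would be passed to the fine-tuned Llama2 model for generation.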
id doaj-art-770a46c7544c4aac89eb1e17bcf105a9
institution Kabale University
issn 1438-8871
doi 10.2196/60063
volume 26
article e60063
orcid Xiaolan Chen https://orcid.org/0000-0003-1581-5045
orcid Ziwei Zhao https://orcid.org/0009-0008-4551-348X
orcid Weiyi Zhang https://orcid.org/0009-0008-2780-9121
orcid Pusheng Xu https://orcid.org/0000-0002-3195-4822
orcid Yue Wu https://orcid.org/0009-0004-8283-3854
orcid Mingpu Xu https://orcid.org/0000-0002-0052-0837
orcid Le Gao https://orcid.org/0009-0006-7494-1315
orcid Yinwen Li https://orcid.org/0000-0003-4254-0972
orcid Xianwen Shang https://orcid.org/0000-0002-2362-3222
orcid Danli Shi https://orcid.org/0000-0001-6094-137X
orcid Mingguang He https://orcid.org/0000-0002-6912-2810