Potential of multimodal large language models for data mining of medical images and free-text reports


Bibliographic Details
Main Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-12-01
Series: Meta-Radiology
Online Access: http://www.sciencedirect.com/science/article/pii/S2950162824000572
author Yutong Zhang
Yi Pan
Tianyang Zhong
Peixin Dong
Kangni Xie
Yuxiao Liu
Hanqi Jiang
Zihao Wu
Zhengliang Liu
Wei Zhao
Wei Zhang
Shijie Zhao
Tuo Zhang
Xi Jiang
Dinggang Shen
Tianming Liu
Xin Zhang
author_sort Yutong Zhang
collection DOAJ
description Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these data pose significant challenges to the generalizability of current data-mining methods for clinical decision-making. Recently, multimodal large language models (MLLMs), especially the Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models, have revolutionized numerous domains, significantly impacting the medical field. In this study, we conducted a detailed evaluation of the Gemini-series models (Gemini-1.0-Pro-Vision, Gemini-1.5-Pro, and Gemini-1.5-Flash) and the GPT-series models (GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo) across 14 medical datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy) and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. We also validated the performance of the Claude-3-Opus, Yi-Large, Yi-Large-Turbo, and LLaMA 3 models to gain a comprehensive understanding of MLLMs in the medical field. Our experimental results demonstrated that the Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, the GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and the GPT series contain models that demonstrated commendable generation efficiency. While both model series hold promise for reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validation remain imperative before clinical deployment.
format Article
id doaj-art-6c08c8dfff4f4288804b21c5d93f454a
institution Kabale University
issn 2950-1628
language English
publishDate 2024-12-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Meta-Radiology
spelling doaj-art-6c08c8dfff4f4288804b21c5d93f454a
2025-01-04T04:57:33Z
eng
KeAi Communications Co., Ltd.
Meta-Radiology
2950-1628
2024-12-01
volume 2, issue 4, article 100103
Potential of multimodal large language models for data mining of medical images and free-text reports
Yutong Zhang: Institute of Medical Research, Northwestern Polytechnical University, Xi'an, 710072, China
Yi Pan: School of Computing, The University of Georgia, Athens, 30602, USA; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
Tianyang Zhong: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
Peixin Dong: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
Kangni Xie: School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710072, China
Yuxiao Liu: School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China
Hanqi Jiang: School of Computing, The University of Georgia, Athens, 30602, USA
Zihao Wu: School of Computing, The University of Georgia, Athens, 30602, USA
Zhengliang Liu: School of Computing, The University of Georgia, Athens, 30602, USA
Wei Zhao: Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; Clinical Research Center for Medical Imaging in Hunan Province, Changsha, 410011, China; Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Wei Zhang: School of Computer and Cyber Sciences, Augusta University, Augusta, 30912, USA
Shijie Zhao: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
Tuo Zhang: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
Xi Jiang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
Dinggang Shen: School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, 200230, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
Tianming Liu: School of Computing, The University of Georgia, Athens, 30602, USA
Xin Zhang: Institute of Medical Research, Northwestern Polytechnical University, Xi'an, 710072, China (corresponding author)
title Potential of multimodal large language models for data mining of medical images and free-text reports
url http://www.sciencedirect.com/science/article/pii/S2950162824000572