Potential of multimodal large language models for data mining of medical images and free-text reports
Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these data have posed significant challenges to the generalizability of current data-mining methods for clinical decision-making. R...
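Main Authors: | Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang |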
---|---|
Format: | Article |
Language: | English |
Published: | KeAi Communications Co., Ltd., 2024-12-01 |
Series: | Meta-Radiology |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2950162824000572 |
_version_ | 1841560487098056704 |
---|---|
author | Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang |
author_facet | Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang |
author_sort | Yutong Zhang |
collection | DOAJ |
description | Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these data have posed significant challenges to the generalizability of current data-mining methods for clinical decision-making. Recently, multimodal large language models (MLLMs), especially Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models, have revolutionized numerous domains, significantly impacting the medical field. In this study, we conducted a detailed evaluation of the performance of the Gemini series models (including Gemini-1.0-Pro-Vision, Gemini-1.5-Pro, and Gemini-1.5-Flash) and GPT series models (including GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo) across 14 medical datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy) and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. We also validated the performance of the Claude-3-Opus, Yi-Large, Yi-Large-Turbo, and LLaMA 3 models to gain a comprehensive understanding of MLLMs in the medical field. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and GPT series contain models that have demonstrated commendable generation efficiency. While both model series hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment. |
format | Article |
id | doaj-art-6c08c8dfff4f4288804b21c5d93f454a |
institution | Kabale University |
issn | 2950-1628 |
language | English |
publishDate | 2024-12-01 |
publisher | KeAi Communications Co., Ltd. |
record_format | Article |
series | Meta-Radiology |
spelling | doaj-art-6c08c8dfff4f4288804b21c5d93f454a; 2025-01-04T04:57:33Z; eng; KeAi Communications Co., Ltd.; Meta-Radiology; 2950-1628; 2024-12-01; 2; 4; 100103
Potential of multimodal large language models for data mining of medical images and free-text reports
Yutong Zhang (0), Yi Pan (1), Tianyang Zhong (2), Peixin Dong (3), Kangni Xie (4), Yuxiao Liu (5), Hanqi Jiang (6), Zihao Wu (7), Zhengliang Liu (8), Wei Zhao (9), Wei Zhang (10), Shijie Zhao (11), Tuo Zhang (12), Xi Jiang (13), Dinggang Shen (14), Tianming Liu (15), Xin Zhang (16)
(0) Institute of Medical Research, Northwestern Polytechnical University, Xi'an, 710072, China
(1) School of Computing, The University of Georgia, Athens, 30602, USA; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
(2) School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
(3) School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
(4) School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710072, China
(5) School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China
(6) School of Computing, The University of Georgia, Athens, 30602, USA
(7) School of Computing, The University of Georgia, Athens, 30602, USA
(8) School of Computing, The University of Georgia, Athens, 30602, USA
(9) Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; Clinical Research Center for Medical Imaging in Hunan Province, Changsha, 410011, China; Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
(10) School of Computer and Cyber Sciences, Augusta University, Augusta, 30912, USA
(11) School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
(12) School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
(13) School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China
(14) School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, 200230, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
(15) School of Computing, The University of Georgia, Athens, 30602, USA
(16) Institute of Medical Research, Northwestern Polytechnical University, Xi'an, 710072, China; Corresponding author.
Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these data have posed significant challenges to the generalizability of current data-mining methods for clinical decision-making. Recently, multimodal large language models (MLLMs), especially Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models, have revolutionized numerous domains, significantly impacting the medical field. In this study, we conducted a detailed evaluation of the performance of the Gemini series models (including Gemini-1.0-Pro-Vision, Gemini-1.5-Pro, and Gemini-1.5-Flash) and GPT series models (including GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo) across 14 medical datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy) and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. We also validated the performance of the Claude-3-Opus, Yi-Large, Yi-Large-Turbo, and LLaMA 3 models to gain a comprehensive understanding of MLLMs in the medical field. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and GPT series contain models that have demonstrated commendable generation efficiency. While both model series hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.
http://www.sciencedirect.com/science/article/pii/S2950162824000572 |
spellingShingle | Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zihao Wu, Zhengliang Liu, Wei Zhao, Wei Zhang, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang; Potential of multimodal large language models for data mining of medical images and free-text reports; Meta-Radiology |
title | Potential of multimodal large language models for data mining of medical images and free-text reports |
title_full | Potential of multimodal large language models for data mining of medical images and free-text reports |
title_fullStr | Potential of multimodal large language models for data mining of medical images and free-text reports |
title_full_unstemmed | Potential of multimodal large language models for data mining of medical images and free-text reports |
title_short | Potential of multimodal large language models for data mining of medical images and free-text reports |
title_sort | potential of multimodal large language models for data mining of medical images and free text reports |
url | http://www.sciencedirect.com/science/article/pii/S2950162824000572 |
work_keys_str_mv | AT yutongzhang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT yipan potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT tianyangzhong potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT peixindong potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT kangnixie potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT yuxiaoliu potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT hanqijiang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT zihaowu potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT zhengliangliu potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT weizhao potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT weizhang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT shijiezhao potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT tuozhang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT xijiang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT dinggangshen potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT tianmingliu potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports AT xinzhang potentialofmultimodallargelanguagemodelsfordataminingofmedicalimagesandfreetextreports |