Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties
Background and objectives: Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards.
Methods: We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness.
Results: The mean score for ChatGPT-4o's performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest ratings, with a mean score of 4.48, while diagnostic specialties (radiology/pathology) performed considerably better than radiation oncology, which performed poorly.
Conclusions: This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o's recommendations performed best in the surgical specialties, while the model struggled to give valuable advice in the other tested specialties. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice.
Saved in:
Main Authors: | Tekoshin Ammo, Vincent G. J. Guillaume, Ulf Krister Hofmann, Norma M. Ulmer, Nina Buenting, Florian Laenger, Justus P. Beier, Tim Leypold |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Oncology |
Subjects: | sarcoma; multidisciplinary sarcoma tumor board; artificial intelligence; chat-GPT; large language models; cancer |
Online Access: | https://www.frontiersin.org/articles/10.3389/fonc.2024.1526288/full |
author | Tekoshin Ammo; Vincent G. J. Guillaume; Ulf Krister Hofmann; Norma M. Ulmer; Nina Buenting; Florian Laenger; Justus P. Beier; Tim Leypold |
collection | DOAJ |
description | Background and objectives: Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards. Methods: We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness. Results: The mean score for ChatGPT-4o's performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest ratings, with a mean score of 4.48, while diagnostic specialties (radiology/pathology) performed considerably better than radiation oncology, which performed poorly. Conclusions: This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o's recommendations performed best in the surgical specialties, while the model struggled to give valuable advice in the other tested specialties. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice. |
id | doaj-art-dcce96e2d74f41b1b0501788a71f6ccc |
institution | Kabale University |
issn | 2234-943X |
doi | 10.3389/fonc.2024.1526288 |
volume | 14 |
affiliations | Tekoshin Ammo, Vincent G. J. Guillaume, Justus P. Beier, Tim Leypold: Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany. Ulf Krister Hofmann: Department of Orthopedics, Trauma and Reconstructive Surgery, Division of Arthroplasty, University Hospital RWTH Aachen, Aachen, Germany. Norma M. Ulmer: Department of Radiation Oncology, University Hospital RWTH Aachen, Aachen, Germany. Nina Buenting: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany. Florian Laenger: Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany. |
title | Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties |
topic | sarcoma multidisciplinary sarcoma tumor board artificial intelligence chat-GPT large language models cancer |
url | https://www.frontiersin.org/articles/10.3389/fonc.2024.1526288/full |