Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties

Background and objectives: Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards. Methods: We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness. Results: The mean score for ChatGPT-4o's performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest ratings, with a mean score of 4.48, while the diagnostic specialties (radiology/pathology) performed considerably better than radiation oncology, which performed poorly. Conclusions: This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o's recommendations performed best in the surgical specialties, while the model struggled to give valuable advice in the other specialties tested. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice.


Bibliographic Details
Main Authors: Tekoshin Ammo, Vincent G. J. Guillaume, Ulf Krister Hofmann, Norma M. Ulmer, Nina Buenting, Florian Laenger, Justus P. Beier, Tim Leypold
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Oncology
Subjects: sarcoma, multidisciplinary sarcoma tumor board, artificial intelligence, chat-GPT, large language models, cancer
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2024.1526288/full
_version_ 1841525886759731200
author Tekoshin Ammo
Vincent G. J. Guillaume
Ulf Krister Hofmann
Norma M. Ulmer
Nina Buenting
Florian Laenger
Justus P. Beier
Tim Leypold
author_sort Tekoshin Ammo
collection DOAJ
description Background and objectives: Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards. Methods: We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness. Results: The mean score for ChatGPT-4o's performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest ratings, with a mean score of 4.48, while the diagnostic specialties (radiology/pathology) performed considerably better than radiation oncology, which performed poorly. Conclusions: This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o's recommendations performed best in the surgical specialties, while the model struggled to give valuable advice in the other specialties tested. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice.
format Article
id doaj-art-dcce96e2d74f41b1b0501788a71f6ccc
institution Kabale University
issn 2234-943X
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj-art-dcce96e2d74f41b1b0501788a71f6ccc
Record updated: 2025-01-17T06:50:55Z
Language: eng
Publisher: Frontiers Media S.A.
Journal: Frontiers in Oncology (ISSN 2234-943X), vol. 14, 2025-01-01
DOI: 10.3389/fonc.2024.1526288 (article 1526288)
Title: Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties
Authors and affiliations:
Tekoshin Ammo: Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany
Vincent G. J. Guillaume: Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital RWTH Aachen, Aachen, Germany
Ulf Krister Hofmann: Department of Orthopedics, Trauma and Reconstructive Surgery, Division of Arthroplasty, University Hospital RWTH Aachen, Aachen, Germany
Norma M. Ulmer: Department of Radiation Oncology, University Hospital RWTH Aachen, Aachen, Germany
Nina Buenting: Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
Florian Laenger: Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany
Justus P. Beier: Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital RWTH Aachen, Aachen, Germany
Tim Leypold: Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital RWTH Aachen, Aachen, Germany
title Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties
title_sort evaluating chatgpt 4o as a decision support tool in multidisciplinary sarcoma tumor boards heterogeneous performance across various specialties
topic sarcoma
multidisciplinary sarcoma tumor board
artificial intelligence
chat-GPT
large language models
cancer
url https://www.frontiersin.org/articles/10.3389/fonc.2024.1526288/full