Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties

Bibliographic Details
Main Authors: Tekoshin Ammo, Vincent G. J. Guillaume, Ulf Krister Hofmann, Norma M. Ulmer, Nina Buenting, Florian Laenger, Justus P. Beier, Tim Leypold
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Oncology
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2024.1526288/full
Description
Summary:
Background and objectives: Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards.
Methods: We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness.
Results: The mean score for ChatGPT-4o performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest score, with a mean score of 4.48, while diagnostic specialties (radiology/pathology) performed considerably better than the radiation oncology specialty, which performed poorly.
Conclusions: This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o recommendations regarding surgical specialties performed best, while ChatGPT-4o struggled to give valuable advice in the other tested specialties. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice.
ISSN: 2234-943X