Efficacy of large language models and their potential in Obstetrics and Gynecology education

Objective The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining recent advancements in LLM technology and their transformative potential in artificial intelligence. Methods This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as their potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals underwent an annual promotional examination, from which 116 of the 170 questions administered over 4 years (2020–2023) were analyzed; the 54 questions containing images were excluded. The scores achieved by GPT-3.5, GPT-4, and the 100 residents were compared. Results The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For first-year, second-year, and third-year residents, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. For the obstetrics-specific questions, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively. Conclusion GPT-4 demonstrated exceptional performance in obstetrics, in interpreting different types of data, and in problem solving, showcasing the potential utility of LLMs in these areas. However, acknowledging the constraints of LLMs is crucial, and their use should augment human expertise and discernment.

Bibliographic Details
Main Authors: Kyung Jin Eoh, Gu Yeun Kwon, Eun Jin Lee, JoonHo Lee, Inha Lee, Young Tae Kim, Eun Ji Nam
Format: Article
Language: English
Published: Korean Society of Obstetrics and Gynecology, 2024-11-01
Series: Obstetrics & Gynecology Science
Subjects: artificial intelligence; obstetrics; gynecology; medical education
Online Access: http://ogscience.org/upload/pdf/ogs-24211.pdf
author Kyung Jin Eoh
Gu Yeun Kwon
Eun Jin Lee
JoonHo Lee
Inha Lee
Young Tae Kim
Eun Ji Nam
collection DOAJ
description Objective The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining recent advancements in LLM technology and their transformative potential in artificial intelligence. Methods This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as their potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals underwent an annual promotional examination, from which 116 of the 170 questions administered over 4 years (2020–2023) were analyzed; the 54 questions containing images were excluded. The scores achieved by GPT-3.5, GPT-4, and the 100 residents were compared. Results The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For first-year, second-year, and third-year residents, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. For the obstetrics-specific questions, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively. Conclusion GPT-4 demonstrated exceptional performance in obstetrics, in interpreting different types of data, and in problem solving, showcasing the potential utility of LLMs in these areas. However, acknowledging the constraints of LLMs is crucial, and their use should augment human expertise and discernment.
format Article
id doaj-art-ddea31d2efdd4e6c98a549b024749288
institution Kabale University
issn 2287-8572
2287-8580
language English
publishDate 2024-11-01
publisher Korean Society of Obstetrics and Gynecology
record_format Article
series Obstetrics & Gynecology Science
spelling Obstetrics & Gynecology Science 2024-11-01;67(6):550-556. doi:10.5468/ogs.24211
Author affiliations: Kyung Jin Eoh (Department of Obstetrics and Gynecology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Korea); Gu Yeun Kwon, Eun Jin Lee, JoonHo Lee, Young Tae Kim, Eun Ji Nam (Department of Obstetrics and Gynecology, Institute of Women’s Medical Life Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea); Inha Lee (Department of Obstetrics and Gynecology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea)
title Efficacy of large language models and their potential in Obstetrics and Gynecology education
topic artificial intelligence
obstetrics
gynecology
medical education
url http://ogscience.org/upload/pdf/ogs-24211.pdf