Evaluating large language models as graders of medical short answer questions: a comparative analysis with expert human graders

The assessment of short-answer questions (SAQs) in medical education is resource-intensive, requiring significant expert time. Large Language Models (LLMs) offer potential for automating this process, but their efficacy in specialized medical education assessment remains understudied. To evaluate th...


Bibliographic Details
Main Authors: Olena Bolgova, Paul Ganguly, Muhammad Faisal Ikram, Volodymyr Mavrych
Format: Article
Language: English
Published: Taylor & Francis Group, 2025-12-01
Series: Medical Education Online
Subjects:
Online Access: https://www.tandfonline.com/doi/10.1080/10872981.2025.2550751
