ChatGPT-4 versus human-generated multiple-choice questions - A study from a medical college in Pakistan
Main Authors:
Format: Article
Language: English
Published: Shalamar Medical & Dental College, Lahore, Pakistan, 2024-12-01
Series: Journal of Shalamar Medical & Dental College
Subjects:
Online Access: https://journal.smdc.edu.pk/index.php/journal/article/view/253
Summary: Background: There has been growing interest in using artificial intelligence (AI)-generated multiple-choice questions (MCQs) to supplement traditional assessments. Although AI is claimed to generate higher-order questions, few studies have focused on undergraduate medical education assessment in Pakistan.
Objective: To compare the quality of human-developed versus ChatGPT-4-generated MCQs for the final-year MBBS written MCQ examination.
Methods: This observational study compared ChatGPT-4-generated and human-developed MCQs in four specialties: Pediatrics, Obstetrics and Gynecology (Ob/Gyn), Surgery, and Medicine. Based on the table of specifications, 204 MCQs were generated with ChatGPT-4 and 196 MCQs were retrieved from the medical college's question bank. Both sets of MCQs were anonymized, and MCQ quality was scored using a checklist based on National Board of Medical Examiners criteria. Data were analyzed using SPSS version 23; Mann-Whitney U and chi-square tests were applied (see the sketch after this summary).
Results: Of 400 MCQs, 396 were included in the final review; four did not conform to the table of specifications. Total scores did not differ significantly between human-developed and ChatGPT-4-generated MCQs (p=0.12). However, human-developed MCQs performed significantly better than ChatGPT-4-generated MCQs in Ob/Gyn (p=0.03). Human-developed MCQs also scored better on the checklist item "stem includes necessary details for answering the question" in Ob/Gyn and Pediatrics (p < 0.05), as well as on "Is the item appropriate for the cover-the-options rule?" (i.e., the stem should be answerable even with the options covered) in Surgery.
Conclusion: With well-structured, specific prompting, ChatGPT-4 has the potential to assist in developing MCQs for medical examinations. However, ChatGPT-4 has limitations where in-depth, contextual item generation is required.
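The abstract reports that the analysis was run in SPSS version 23. As an illustration only, the minimal Python sketch below shows how the two reported tests could be applied: a Mann-Whitney U test on per-item quality scores and a chi-square test on a binary checklist criterion. All scores and counts here are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch only - the study itself used SPSS version 23.
from scipy.stats import mannwhitneyu, chi2_contingency

# Hypothetical total quality scores, one value per MCQ
human_scores = [12, 11, 13, 10, 12, 14, 11, 13]
chatgpt_scores = [11, 10, 12, 11, 10, 13, 12, 10]

# Mann-Whitney U: non-parametric comparison of the two score distributions
u_stat, p_value = mannwhitneyu(human_scores, chatgpt_scores, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

# Chi-square: compare how often each group meets a binary checklist criterion,
# e.g. "stem includes necessary details for answering the question".
# Rows: human, ChatGPT-4; columns: criterion met, not met (hypothetical counts)
contingency = [[45, 5], [38, 12]]
chi2, p_chi, dof, expected = chi2_contingency(contingency)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_chi:.3f}")
```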
ISSN: 2789-3669, 2789-3677