The performance of GPT-3.5 and GPT-4 on genetic tests at PhD-level: GPT-4 as a promising tool for genomic medicine and education

Bibliographic Details
Main Authors: Teymoor Khosravi, Arian Rahimzadeh, Farzaneh Motallebi, Fatemeh Vaghefi, Zainab Mohammad Al Sudani, Morteza Oladnabi
Format: Article
Language: English
Published: Golestan University of Medical Sciences, 2024-12-01
Series: Journal of Clinical and Basic Research
Online Access: http://jcbr.goums.ac.ir/article-1-476-en.pdf
Description
Summary: Background: Natural language processing (NLP) has enabled AI models to understand and generate human language, and transformer-based models such as GPT-3 and GPT-4 mark significant advances. GPT-4, with a larger parameter count and multimodal capabilities, offers greater accuracy and contextual understanding than its predecessor, GPT-3.5. However, challenges such as factual inaccuracies remain. This study evaluates GPT-4's performance on genetics-related tasks and assesses its strengths and limitations compared with GPT-3.5.

Methods: We assessed GPT-4's performance on five key genetic tasks: (1) understanding basic genetic concepts, (2) interpreting family pedigrees, (3) analyzing genetic mutations, (4) solving population genetics problems, and (5) answering medical genetics PhD entrance exam questions. Both open-ended and multiple-choice questions (MCQs) were used, and some questions required the model to justify its answer so that its reasoning could be evaluated. GPT-4's multimodal capabilities were also tested using pedigree images for inheritance pattern analysis.

Results: GPT-4 achieved perfect accuracy in Task 1 (basic genetic concepts) and Task 3 (genetic mutation interpretation), correctly answering all 10 and all 16 questions, respectively. In Task 2 (pedigree analysis), GPT-4 answered 24 of 71 questions correctly (47 incorrect). In Task 4 (population genetics problems), it answered 30 of 34 questions correctly. In Task 5 (PhD entrance exam), it answered 58 of 80 questions correctly. Performance was notably higher on MCQs than on open-ended questions.

Conclusion: GPT-4 improves substantially over GPT-3.5, particularly in understanding genetic concepts and interpreting genetic mutations. Despite these advances, its performance on more complex tasks, such as pedigree analysis, shows that further refinement is needed. These findings highlight GPT-4's potential for advancing genetics education and research. Future studies should further explore GPT-4's capabilities and address its limitations in tasks that demand stronger reasoning and factual accuracy.
ISSN: 2538-3736
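The summary reports GPT-4's results as raw counts of correct answers per task. The minimal Python sketch below converts those counts into percentage accuracies so the tasks can be compared at a glance. The percentages are computed here from the reported counts and are not figures stated in the summary. The dictionary name `REPORTED`, the paraphrased task labels, and the helper functions `accuracy` and `summarize` are hypothetical, introduced only for illustration. The summary gives no per-task counts for GPT-3.5, so the GPT-3.5 comparison cannot be reproduced this way.

```python
# Per-task (correct, total) counts for GPT-4, copied from the study summary.
# Task labels are paraphrased; this layout is a hypothetical illustration only.
REPORTED = {
    "Task 1: basic genetic concepts": (10, 10),
    "Task 2: pedigree analysis": (24, 71),
    "Task 3: mutation interpretation": (16, 16),
    "Task 4: population genetics": (30, 34),
    "Task 5: PhD entrance exam": (58, 80),
}


def accuracy(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def summarize(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each task label to its accuracy, rounded to one decimal place."""
    return {task: round(accuracy(c, t), 1) for task, (c, t) in results.items()}


if __name__ == "__main__":
    # Consistency check from the summary: 24 correct + 47 incorrect = 71 pedigree questions.
    assert 24 + 47 == 71
    for task, pct in summarize(REPORTED).items():
        print(f"{task}: {pct}%")
```

Run as written, this gives 100.0% for Tasks 1 and 3, 33.8% for Task 2, 88.2% for Task 4, and 72.5% for Task 5. These numbers make the gap between pedigree analysis and the other tasks explicit.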