Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis

Abstract This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group includes “questions and answers (Q&As) for the public” determined by the Center...

Bibliographic Details
Main Authors: Meryem Sahin Ozdemir, Yusuf Emre Ozdemir
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-024-83575-1
collection DOAJ
description Abstract This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group includes “questions and answers (Q&As) for the public” determined by the Centers for Disease Control and Prevention (CDC). The second group includes strong recommendations of international guidelines. The third group includes frequently asked questions on social media platforms. The chatbots’ answers were evaluated by two infectious diseases specialists on a scoring scale from 1 to 4. Cohen’s kappa coefficient was calculated to assess inter-rater reliability. The reproducibility and correlation of answers generated by ChatGPT and Gemini were analyzed. ChatGPT and Gemini’s mean scores (3.55 ± 0.83 vs. 3.57 ± 0.89, p = 0.260) and completely correct response rates (71.0% vs. 78.4%, p = 0.111) were similar. In addition, in subgroup analyses of the CDC questions section (90.1% vs. 91.9%, p = 0.752), the guideline questions section (49.4% vs. 61.4%, p = 0.140), and the social media platform questions section (82.5% vs. 90%, p = 0.335), the completely correct answer rates were similar. There was a moderate positive correlation between the ChatGPT and Gemini chatbots’ answers (r = 0.633, p < 0.001). Reproducibility rates of answers to questions were 91.3% in ChatGPT and 92% in Gemini (p = 0.710). According to Cohen’s kappa test, there was substantial inter-rater agreement for both ChatGPT (κ = 0.720) and Gemini (κ = 0.704). ChatGPT and Gemini successfully answered CDC questions and social media platform questions, but the correct answer rates were insufficient for guideline questions.
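The inter-rater agreement statistic named in the abstract can be illustrated with a minimal sketch of unweighted Cohen's kappa. The abstract does not say whether a weighted or unweighted variant was used, so the unweighted form is an assumption here, and the two score lists are hypothetical values on the 1 to 4 scale, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per score.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[score] * count_b[score] for score in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1 = worst, 4 = completely correct); not study data.
specialist_1 = [4, 4, 3, 4, 2, 4, 1, 4, 3, 4]
specialist_2 = [4, 4, 3, 4, 3, 4, 1, 4, 4, 4]
kappa = cohens_kappa(specialist_1, specialist_2)
print(round(kappa, 3))
```

Values between 0.61 and 0.80 are conventionally read as substantial agreement, which is the band the reported κ values of 0.720 and 0.704 fall into.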
id doaj-art-1364cc00e84640e59175ec8b0457a140
institution Kabale University
issn 2045-2322
affiliation Meryem Sahin Ozdemir: Department of Infectious Diseases and Clinical Microbiology, Basaksehir Cam and Sakura City Hospital
Yusuf Emre Ozdemir: Department of Infectious Diseases and Clinical Microbiology, Bakirkoy Dr Sadi Konuk Training and Research Hospital
topic ChatGPT
Gemini
Viral hepatitis
Hepatitis B
Hepatitis C