Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis
Abstract This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group includes “questions and answers (Q&As) for the public” determined by the Center...
Main Authors: | Meryem Sahin Ozdemir, Yusuf Emre Ozdemir |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | ChatGPT; Gemini; Viral hepatitis; Hepatitis B; Hepatitis C |
Online Access: | https://doi.org/10.1038/s41598-024-83575-1 |
_version_ | 1841544718129823744 |
---|---|
author | Meryem Sahin Ozdemir Yusuf Emre Ozdemir |
author_facet | Meryem Sahin Ozdemir Yusuf Emre Ozdemir |
author_sort | Meryem Sahin Ozdemir |
collection | DOAJ |
description | This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group includes “questions and answers (Q&As) for the public” determined by the Centers for Disease Control and Prevention (CDC). The second group includes strong recommendations of international guidelines. The third group includes frequently asked questions on social media platforms. The chatbots' answers were evaluated by two infectious diseases specialists on a scoring scale from 1 to 4. Cohen’s kappa coefficient was calculated to assess inter-rater reliability. The reproducibility and correlation of answers generated by ChatGPT and Gemini were analyzed. ChatGPT and Gemini’s mean scores (3.55 ± 0.83 vs. 3.57 ± 0.89, p = 0.260) and completely correct response rates (71.0% vs. 78.4%, p = 0.111) were similar. In addition, in subgroup analyses of the CDC questions section (90.1% vs. 91.9%, p = 0.752), the guideline questions section (49.4% vs. 61.4%, p = 0.140), and the social media platform questions section (82.5% vs. 90%, p = 0.335), the completely correct answer rates were similar. There was a moderate positive correlation between the answers of ChatGPT and Gemini (r = 0.633, p < 0.001). Reproducibility rates of answers to questions were 91.3% in ChatGPT and 92% in Gemini (p = 0.710). According to Cohen’s kappa test, there was substantial inter-rater agreement for both ChatGPT (κ = 0.720) and Gemini (κ = 0.704). ChatGPT and Gemini successfully answered CDC questions and social media platform questions, but the correct answer rates were insufficient for guideline questions. |
format | Article |
id | doaj-art-1364cc00e84640e59175ec8b0457a140 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-1364cc00e84640e59175ec8b0457a140; 2025-01-12T12:19:19Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15118; 10.1038/s41598-024-83575-1; Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis; Meryem Sahin Ozdemir (Department of Infectious Diseases and Clinical Microbiology, Basaksehir Cam and Sakura City Hospital); Yusuf Emre Ozdemir (Department of Infectious Diseases and Clinical Microbiology, Bakirkoy Dr Sadi Konuk Training and Research Hospital); https://doi.org/10.1038/s41598-024-83575-1; ChatGPT; Gemini; Viral hepatitis; Hepatitis B; Hepatitis C |
spellingShingle | Meryem Sahin Ozdemir Yusuf Emre Ozdemir Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis Scientific Reports ChatGPT Gemini Viral hepatitis Hepatitis B Hepatitis C |
title | Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis |
title_sort | comparison of the performances between chatgpt and gemini in answering questions on viral hepatitis |
topic | ChatGPT Gemini Viral hepatitis Hepatitis B Hepatitis C |
url | https://doi.org/10.1038/s41598-024-83575-1 |
work_keys_str_mv | AT meryemsahinozdemir comparisonoftheperformancesbetweenchatgptandgeminiinansweringquestionsonviralhepatitis AT yusufemreozdemir comparisonoftheperformancesbetweenchatgptandgeminiinansweringquestionsonviralhepatitis |
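
The abstract reports inter-rater agreement as Cohen's kappa (κ = 0.720 for ChatGPT, κ = 0.704 for Gemini). As a minimal sketch of how such a coefficient is computed, the following assumes the unweighted form of kappa; the record does not say whether the authors used a weighted variant. The function name `cohens_kappa` and the score lists are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores on the 1-4 scale (illustrative only, not study data).
rater_1 = [4, 4, 3, 2, 4, 1, 4, 3]
rater_2 = [4, 4, 3, 3, 4, 1, 4, 4]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it can be much lower than the share of identical scores when both raters give most answers the same top score.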