Assessment of Dental Trauma Content Generated by Three Artificial Intelligence Tools Using the Traumatic Dental Injuries Questionnaire: A Cross-Sectional Study


Bibliographic Details
Main Authors: Ana Paula Portes Zeno, Breno Pereira Caetano, Marcela Baraúna Magno, Patrícia Andrade Risso, Lucianne Cople Maia
Format: Article
Language: English
Published: Association of Support to Oral Health Research (APESB) 2025-08-01
Series: Pesquisa Brasileira em Odontopediatria e Clínica Integrada
Subjects:
Online Access: https://revista.uepb.edu.br/PBOCI/article/view/4636
Description
Summary: Objective: To analyze the agreement between responses generated by AI tools regarding dental trauma, derived from a validated questionnaire for laypeople, based on the latest International Association of Dental Traumatology (IADT) guideline recommendations. Materials and Methods: Eleven questions were entered into the AI tools ChatGPT-3.5, Microsoft Bing-Copilot, and Google Gemini on the same date and in sequence. The answers were collected and coded as "correct" (1 point) or "incorrect" (0 points). Differences in correct responses among the AI tools were evaluated using Fisher's exact test (p<0.05), and agreement between each AI tool and the IADT recommendations was assessed using the Kappa test. Results: None of the AI tools answered all the questions correctly, and there were no significant differences among the three AI tools (p>0.05). Regarding agreement with the IADT recommendations, ChatGPT-3.5 demonstrated weak agreement (k = 0.46; p = 1.122), Google Gemini showed moderate agreement (k = 0.63; p = 0.036), and Microsoft Bing-Copilot exhibited strong agreement (k = 0.81; p = 0.006). The frequency of correct answers for ChatGPT-3.5, Google Gemini, and Microsoft Bing-Copilot was 73%, 82%, and 91%, respectively. Conclusion: Although the number of correct answers did not differ significantly, the level of agreement with the IADT recommendations varied among the AI tools, and their use by the public should be approached with caution.
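For readers unfamiliar with the Kappa statistic mentioned in the abstract, the sketch below shows how Cohen's unweighted kappa is computed for two raters. It is an illustration only: the rating vectors are hypothetical placeholders, not the study's data, and the study itself may have used statistical software rather than hand-rolled code.

```python
# Illustrative sketch: Cohen's unweighted kappa for two raters.
# The example vectors below are hypothetical, NOT the study's data.

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed agreement: fraction of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, assuming the raters are independent.
    p_e = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    if p_e == 1:
        return 1.0  # both raters used a single identical category
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: eleven answers coded 1 (correct) / 0 (incorrect)
# by an AI tool and against a guideline reference.
ai_tool   = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0]
reference = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1]
print(f"kappa = {cohen_kappa(ai_tool, reference):.2f}")
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and the thresholds for "weak", "moderate", and "strong" agreement cited in the abstract depend on the interpretation scale the authors adopted.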
ISSN: 1519-0501; 1983-4632