Large language models can extract metadata for annotation of human neuroimaging publications

Bibliographic Details
Main Authors: Matthew D. Turner, Abhishek Appaji, Nibras Ar Rakib, Pedram Golnari, Arcot K. Rajasekar, Anitha Rathnam K V, Satya S. Sahoo, Yue Wang, Lei Wang, Jessica A. Turner
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Neuroinformatics
Online Access: https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full
Description
Summary: We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated OpenAI's GPT-4o LLM, which performed comparably to several groups of specially trained and supervised human annotators. The LLM achieved performance similar to the humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and that in many cases these disagreements are not errors at all. Based on the specific types of annotations we tested, with carefully reviewed gold-standard values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized "micro-benchmarks," like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.
ISSN: 1662-5196
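
To make the workflow described in the summary concrete, the following is a minimal sketch of zero-shot metadata annotation with GPT-4o via the OpenAI Python SDK, scored by simple proportion agreement against gold-standard labels. The prompt wording and the metadata fields shown are hypothetical illustrations for this sketch, not the authors' actual prompts or annotation schema.

```python
# Minimal sketch of zero-shot metadata annotation with GPT-4o.
# Assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the
# environment. The prompt text and the fields 'modality' and
# 'clinical_population' are hypothetical, not from the paper.
from openai import OpenAI

client = OpenAI()

def annotate(abstract: str) -> str:
    """Ask the model to extract metadata fields from one abstract,
    zero-shot (no examples, no feedback)."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Extract the following metadata from this neuroimaging "
                "abstract as JSON with keys 'modality' and "
                "'clinical_population':\n\n" + abstract
            ),
        }],
    )
    return resp.choices[0].message.content

def agreement(llm_labels: list[str], gold_labels: list[str]) -> float:
    """Proportion agreement between LLM annotations and the
    gold-standard human annotations."""
    matches = sum(a == b for a, b in zip(llm_labels, gold_labels))
    return matches / len(gold_labels)
```

In this kind of setup, a score in the 0.91-0.97 range reported in the summary would correspond to the LLM's labels matching the reviewed gold-standard labels on 91-97% of items.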