Large language models can extract metadata for annotation of human neuroimaging publications

We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o...


Bibliographic Details
Main Authors: Matthew D. Turner, Abhishek Appaji, Nibras Ar Rakib, Pedram Golnari, Arcot K. Rajasekar, Anitha Rathnam K V, Satya S. Sahoo, Yue Wang, Lei Wang, Jessica A. Turner
Format: Article
Language:English
Published: Frontiers Media S.A. 2025-08-01
Series:Frontiers in Neuroinformatics
Subjects:
Online Access:https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full
_version_ 1849233400234246144
author Matthew D. Turner
Abhishek Appaji
Nibras Ar Rakib
Pedram Golnari
Arcot K. Rajasekar
Anitha Rathnam K V
Satya S. Sahoo
Yue Wang
Yue Wang
Lei Wang
Jessica A. Turner
author_facet Matthew D. Turner
Abhishek Appaji
Nibras Ar Rakib
Pedram Golnari
Arcot K. Rajasekar
Anitha Rathnam K V
Satya S. Sahoo
Yue Wang
Yue Wang
Lei Wang
Jessica A. Turner
author_sort Matthew D. Turner
collection DOAJ
description We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of humans, scoring between 0.91 and 0.97 on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. Based on the specific types of annotations we tested, with exceptionally reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems in real-world metadata annotation tasks.
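The zero-shot workflow summarized in the description above can be sketched as follows. This is an illustrative assumption, not the authors' actual prompts or pipeline: the field names and prompt wording here are hypothetical, and the commented-out API call shows only the general shape of a chat-completion request to a model such as GPT-4o.

```python
# Hypothetical sketch of zero-shot metadata annotation with an LLM.
# Field names and prompt text are illustrative, not the study's prompts.

def build_zero_shot_prompt(article_text: str, fields: list[str]) -> str:
    """Build a single zero-shot prompt (no examples, no feedback) asking
    the model to return the requested metadata fields as a JSON object."""
    field_list = ", ".join(fields)
    return (
        "Extract the following metadata fields from the neuroimaging "
        f"article below and answer only with a JSON object: {field_list}.\n\n"
        f"Article:\n{article_text}"
    )

# Example usage with hypothetical annotation fields:
prompt = build_zero_shot_prompt(
    "Participants (N=42) underwent 3T fMRI scanning...",
    ["sample_size", "imaging_modality", "field_strength"],
)

# The prompt would then be sent to a chat-completion endpoint, e.g.:
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}],
# )
```

Because the prompt is zero-shot, each article is annotated independently, with no example annotations or corrective feedback supplied to the model.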
format Article
id doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f2
institution Kabale University
issn 1662-5196
language English
publishDate 2025-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neuroinformatics
spelling doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f22025-08-20T05:32:46ZengFrontiers Media S.A.Frontiers in Neuroinformatics1662-51962025-08-011910.3389/fninf.2025.16090771609077Large language models can extract metadata for annotation of human neuroimaging publicationsMatthew D. Turner0Abhishek Appaji1Nibras Ar Rakib2Pedram Golnari3Arcot K. Rajasekar4Anitha Rathnam K V5Satya S. Sahoo6Yue Wang7Yue Wang8Lei Wang9Jessica A. Turner10Department of Psychiatry, The Ohio State University, Columbus, OH, United StatesDepartment of Medical Electronics Engineering, B.M.S. College of Engineering, Bengaluru, IndiaFaculty of Information, University of Toronto, Toronto, ON, CanadaDepartment of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United StatesSchool of Information and Library Science, University of North Carolina, Chapel Hill, NC, United StatesDepartment of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, IndiaDepartment of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United StatesSchool of Information and Library Science, University of North Carolina, Chapel Hill, NC, United StatesCarolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, United StatesDepartment of Psychiatry, The Ohio State University, Columbus, OH, United StatesDepartment of Psychiatry, The Ohio State University, Columbus, OH, United StatesWe show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good quality metadata extraction and annotation with very little work on the part of investigators for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI which performed comparably with several groups of specially trained and supervised human annotators. 
The LLM achieves performance similar to that of humans, scoring between 0.91 and 0.97 on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. Based on the specific types of annotations we tested, with exceptionally reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems in real-world metadata annotation tasks.https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/fulllarge language modelsmetadata annotationinformation extractionhuman neuroimagingontologiesdocument annotation
spellingShingle Matthew D. Turner
Abhishek Appaji
Nibras Ar Rakib
Pedram Golnari
Arcot K. Rajasekar
Anitha Rathnam K V
Satya S. Sahoo
Yue Wang
Yue Wang
Lei Wang
Jessica A. Turner
Large language models can extract metadata for annotation of human neuroimaging publications
Frontiers in Neuroinformatics
large language models
metadata annotation
information extraction
human neuroimaging
ontologies
document annotation
title Large language models can extract metadata for annotation of human neuroimaging publications
title_full Large language models can extract metadata for annotation of human neuroimaging publications
title_fullStr Large language models can extract metadata for annotation of human neuroimaging publications
title_full_unstemmed Large language models can extract metadata for annotation of human neuroimaging publications
title_short Large language models can extract metadata for annotation of human neuroimaging publications
title_sort large language models can extract metadata for annotation of human neuroimaging publications
topic large language models
metadata annotation
information extraction
human neuroimaging
ontologies
document annotation
url https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full
work_keys_str_mv AT matthewdturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT abhishekappaji largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT nibrasarrakib largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT pedramgolnari largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT arcotkrajasekar largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT anitharathnamkv largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT satyassahoo largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT leiwang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications
AT jessicaaturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications