Large language models can extract metadata for annotation of human neuroimaging publications
We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o...
| Main Authors: | Matthew D. Turner, Abhishek Appaji, Nibras Ar Rakib, Pedram Golnari, Arcot K. Rajasekar, Anitha Rathnam K V, Satya S. Sahoo, Yue Wang, Lei Wang, Jessica A. Turner |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Neuroinformatics |
| Subjects: | large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full |
| _version_ | 1849233400234246144 |
|---|---|
| author | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner |
| author_facet | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner |
| author_sort | Matthew D. Turner |
| collection | DOAJ |
| description | We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of the humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, with carefully reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks. |
| format | Article |
| id | doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f2 |
| institution | Kabale University |
| issn | 1662-5196 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Neuroinformatics |
| spelling | doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f2 2025-08-20T05:32:46Z eng Frontiers Media S.A. Frontiers in Neuroinformatics 1662-5196 2025-08-01 19 10.3389/fninf.2025.1609077 1609077 Large language models can extract metadata for annotation of human neuroimaging publications. Matthew D. Turner (Department of Psychiatry, The Ohio State University, Columbus, OH, United States); Abhishek Appaji (Department of Medical Electronics Engineering, B.M.S. College of Engineering, Bengaluru, India); Nibras Ar Rakib (Faculty of Information, University of Toronto, Toronto, ON, Canada); Pedram Golnari (Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States); Arcot K. Rajasekar (School of Information and Library Science, University of North Carolina, Chapel Hill, NC, United States); Anitha Rathnam K V (Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, India); Satya S. Sahoo (Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States); Yue Wang (School of Information and Library Science, University of North Carolina, Chapel Hill, NC, United States); Yue Wang (Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, United States); Lei Wang (Department of Psychiatry, The Ohio State University, Columbus, OH, United States); Jessica A. Turner (Department of Psychiatry, The Ohio State University, Columbus, OH, United States). We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of the humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, with carefully reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks. https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| spellingShingle | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner; Large language models can extract metadata for annotation of human neuroimaging publications; Frontiers in Neuroinformatics; large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| title | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_full | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_fullStr | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_full_unstemmed | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_short | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_sort | large language models can extract metadata for annotation of human neuroimaging publications |
| topic | large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| url | https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full |
| work_keys_str_mv | AT matthewdturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT abhishekappaji largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT nibrasarrakib largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT pedramgolnari largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT arcotkrajasekar largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT anitharathnamkv largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT satyassahoo largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT leiwang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT jessicaaturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications |