Large language models can extract metadata for annotation of human neuroimaging publications
We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o...
| Main Authors: | Matthew D. Turner, Abhishek Appaji, Nibras Ar Rakib, Pedram Golnari, Arcot K. Rajasekar, Anitha Rathnam K V, Satya S. Sahoo, Yue Wang, Lei Wang, Jessica A. Turner |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Neuroinformatics |
| Subjects: | large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full |
| _version_ | 1849233400234246144 |
|---|---|
| author | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner |
| author_facet | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner |
| author_sort | Matthew D. Turner |
| collection | DOAJ |
| description | We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of the humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, with carefully reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks. |
| format | Article |
| id | doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f2 |
| institution | Kabale University |
| issn | 1662-5196 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Neuroinformatics |
| spelling | doaj-art-65d5a80ec4d94d3bb1b32bede4bfd9f2 2025-08-20T05:32:46Z eng Frontiers Media S.A. Frontiers in Neuroinformatics 1662-5196 2025-08-01 19 10.3389/fninf.2025.1609077 1609077 Large language models can extract metadata for annotation of human neuroimaging publications. Matthew D. Turner (Department of Psychiatry, The Ohio State University, Columbus, OH, United States); Abhishek Appaji (Department of Medical Electronics Engineering, B.M.S. College of Engineering, Bengaluru, India); Nibras Ar Rakib (Faculty of Information, University of Toronto, Toronto, ON, Canada); Pedram Golnari (Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States); Arcot K. Rajasekar (School of Information and Library Science, University of North Carolina, Chapel Hill, NC, United States); Anitha Rathnam K V (Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, India); Satya S. Sahoo (Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States); Yue Wang (School of Information and Library Science, University of North Carolina, Chapel Hill, NC, United States); Yue Wang (Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, United States); Lei Wang (Department of Psychiatry, The Ohio State University, Columbus, OH, United States); Jessica A. Turner (Department of Psychiatry, The Ohio State University, Columbus, OH, United States). We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of the humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, with carefully reviewed gold-standard correct values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks. https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| spellingShingle | Matthew D. Turner; Abhishek Appaji; Nibras Ar Rakib; Pedram Golnari; Arcot K. Rajasekar; Anitha Rathnam K V; Satya S. Sahoo; Yue Wang; Yue Wang; Lei Wang; Jessica A. Turner; Large language models can extract metadata for annotation of human neuroimaging publications; Frontiers in Neuroinformatics; large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| title | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_full | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_fullStr | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_full_unstemmed | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_short | Large language models can extract metadata for annotation of human neuroimaging publications |
| title_sort | large language models can extract metadata for annotation of human neuroimaging publications |
| topic | large language models; metadata annotation; information extraction; human neuroimaging; ontologies; document annotation |
| url | https://www.frontiersin.org/articles/10.3389/fninf.2025.1609077/full |
| work_keys_str_mv | AT matthewdturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT abhishekappaji largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT nibrasarrakib largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT pedramgolnari largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT arcotkrajasekar largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT anitharathnamkv largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT satyassahoo largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT yuewang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT leiwang largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications AT jessicaaturner largelanguagemodelscanextractmetadataforannotationofhumanneuroimagingpublications |