Standardized patient profile review using large language models for case adjudication in observational research

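Abstract: Using administrative claims and electronic health records for observational studies is common but challenging due to data limitations. Researchers rely on phenotype algorithms, requiring labor-intensive chart reviews for validation. This study investigates whether case adjudication using the previously introduced Knowledge-Enhanced Electronic Profile Review (KEEPER) system with large language models (LLMs) is feasible and could serve as a viable alternative to manual chart review. The task involves adjudicating cases identified by a phenotype algorithm, with KEEPER extracting predefined findings such as symptoms, comorbidities, and treatments from structured data. LLMs then evaluate KEEPER outputs to determine whether a patient truly qualifies as a case. We tested four LLMs including GPT-4, hosted locally to ensure privacy. Using zero-shot prompting and iterative prompt optimization, we found LLM performance, across ten diseases, varied by prompt and model, with sensitivities from 78 to 98% and specificities from 48 to 98%, indicating promise for automating phenotype evaluation.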

Bibliographic Details
Main Authors: Martijn J. Schuemie, Anna Ostropolets, Aleh Zhuk, Uladzislau Korsik, Seung In Seo, Marc A. Suchard, George Hripcsak, Patrick B. Ryan
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01433-4
author Martijn J. Schuemie
Anna Ostropolets
Aleh Zhuk
Uladzislau Korsik
Seung In Seo
Marc A. Suchard
George Hripcsak
Patrick B. Ryan
collection DOAJ
description Abstract Using administrative claims and electronic health records for observational studies is common but challenging due to data limitations. Researchers rely on phenotype algorithms, requiring labor-intensive chart reviews for validation. This study investigates whether case adjudication using the previously introduced Knowledge-Enhanced Electronic Profile Review (KEEPER) system with large language models (LLMs) is feasible and could serve as a viable alternative to manual chart review. The task involves adjudicating cases identified by a phenotype algorithm, with KEEPER extracting predefined findings such as symptoms, comorbidities, and treatments from structured data. LLMs then evaluate KEEPER outputs to determine whether a patient truly qualifies as a case. We tested four LLMs including GPT-4, hosted locally to ensure privacy. Using zero-shot prompting and iterative prompt optimization, we found LLM performance, across ten diseases, varied by prompt and model, with sensitivities from 78 to 98% and specificities from 48 to 98%, indicating promise for automating phenotype evaluation.
format Article
id doaj-art-0b94fbd6be0943a0821308c85ad7715f
institution Kabale University
issn 2398-6352
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series npj Digital Medicine
spelling doaj-art-0b94fbd6be0943a0821308c85ad7715f; 2025-01-12T12:40:53Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-01-01; 8117; 10.1038/s41746-025-01433-4; Standardized patient profile review using large language models for case adjudication in observational research; Martijn J. Schuemie, Anna Ostropolets, Aleh Zhuk, Uladzislau Korsik, Seung In Seo, Marc A. Suchard, George Hripcsak, Patrick B. Ryan (all: Observational Health Data Science and Informatics); https://doi.org/10.1038/s41746-025-01433-4
title Standardized patient profile review using large language models for case adjudication in observational research
url https://doi.org/10.1038/s41746-025-01433-4