Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies

Bibliographic Details
Main Authors:	Kim Lauper, Denis Mongin, Nils Bürgisser, Samia Mehouachi, Clement P. Buclin, Delphine S. Courvoisier, Etienne Chalot
Format:	Article
Language:	English
Published:	BMJ Publishing Group 2024-12-01
Series:	RMD Open
Online Access:	https://rmdopen.bmj.com/content/10/4/e005003.full

Description
Summary:	Objectives We propose and test a framework to detect disease diagnosis using a recent large language model (LLM), Meta’s Llama-3-8B, on French-language electronic health record (EHR) documents. Specifically, it focuses on detecting gout (‘goutte’ in French), a ubiquitous French term that has multiple meanings beyond the disease. The study compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameter used. Methods The framework was developed using a training and testing set of 700 paragraphs assessing ‘gout’ from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. All paragraphs were manually reviewed and classified by two healthcare professionals into disease (true gout) and non-disease (gold standard). The LLM’s accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs assessing ‘Calcium Pyrophosphate Deposition Disease (CPPD)’. Results The LLM-based algorithm outperformed the regex method, achieving a 92.7% (88.7%–95.4%) positive predictive value, a 96.6% (94.6%–97.8%) negative predictive value and an accuracy of 95.4% (93.6%–96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%–97.6%). The LLM framework performed well over a wide range of parameter values. Conclusion LLMs accurately detected disease diagnoses from EHRs, even in non-English languages. They could facilitate creating large disease registers in any language, improving disease care assessment and patient recruitment for clinical trials.
ISSN:	2056-5933
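
The summary reports positive predictive value, negative predictive value and accuracy, each with an interval in parentheses. As a minimal sketch of how such figures can be derived from a confusion matrix against a manually reviewed gold standard, the Python below computes the three metrics with a Wilson score interval. The record does not say which interval method the authors used, so the Wilson interval is an assumption, as are the function names and the example counts, which are hypothetical and not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, not confirmed by the source)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def diagnostic_metrics(tp, fp, tn, fn):
    """Return PPV, NPV and accuracy, each as (estimate, (low, high)).

    tp/fp/tn/fn are counts of paragraphs classified by the algorithm,
    compared with the manual gold-standard label.
    """
    total = tp + fp + tn + fn
    pairs = {
        "ppv": (tp, tp + fp),            # true positives among predicted positives
        "npv": (tn, tn + fn),            # true negatives among predicted negatives
        "accuracy": (tp + tn, total),    # all correct predictions
    }
    return {
        name: (num / den, wilson_interval(num, den))
        for name, (num, den) in pairs.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, (estimate, (low, high)) in diagnostic_metrics(90, 10, 95, 5).items():
        print(f"{name}: {estimate:.3f} ({low:.3f} to {high:.3f})")
```

The Wilson interval is chosen here because it stays within 0 and 1 for proportions close to 1, which is the range of the values reported above; other interval methods would give slightly different bounds.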

Similar Items