Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | BMJ Publishing Group, 2024-12-01 |
Series: | RMD Open |
Online Access: | https://rmdopen.bmj.com/content/10/4/e005003.full |
Summary: | Objectives: We propose and test a framework to detect disease diagnoses using a recent large language model (LLM), Meta’s Llama-3-8B, on French-language electronic health record (EHR) documents. Specifically, the framework focuses on detecting gout (‘goutte’ in French), a ubiquitous French term that has multiple meanings beyond the disease. The study compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameters used. Methods: The framework was developed using a training and testing set of 700 paragraphs assessing ‘gout’ from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. All paragraphs were manually reviewed and classified by two healthcare professionals into disease (true gout) and non-disease; this classification served as the gold standard. The LLM’s accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs assessing ‘Calcium Pyrophosphate Deposition Disease (CPPD)’. Results: The LLM-based algorithm outperformed the regex method, achieving a positive predictive value of 92.7% (88.7%–95.4%), a negative predictive value of 96.6% (94.6%–97.8%) and an accuracy of 95.4% (93.6%–96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%–97.6%). The LLM framework performed well over a wide range of parameter values. Conclusion: LLMs accurately detected disease diagnoses from EHRs, even in non-English languages. They could facilitate creating large disease registers in any language, improving disease care assessment and patient recruitment for clinical trials. |
ISSN: | 2056-5933 |
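
The summary reports positive predictive value, negative predictive value and accuracy, each with an interval in parentheses. As a minimal sketch of how such figures can be derived from a classifier's confusion matrix, the Python below computes the three metrics with a Wilson score interval. This record does not state which interval method the authors used, so the Wilson interval is an assumption, and the counts in the usage lines are hypothetical and are not the study's data. The function names `wilson_interval` and `classification_metrics` are illustrative only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def classification_metrics(tp, fp, tn, fn):
    """Return PPV, NPV and accuracy, each as (estimate, (lower, upper))."""
    n = tp + fp + tn + fn
    return {
        # PPV: share of paragraphs flagged as disease that truly are disease.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        # NPV: share of paragraphs flagged as non-disease that truly are.
        "npv": (tn / (tn + fn), wilson_interval(tn, tn + fn)),
        # Accuracy: share of all paragraphs classified correctly.
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
    }


# Hypothetical counts for illustration only (not taken from the article).
metrics = classification_metrics(tp=90, fp=10, tn=95, fn=5)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

The Wilson interval is chosen here because it stays within 0 and 1 even when a proportion is close to either bound, which is common for high-performing classifiers; other interval methods would give slightly different bounds.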