Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study

Objectives: Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype. Study design: A retrospective observational study design of patients hospitalised in 2...

Bibliographic Details
Main Authors: Gerardo Heiss, Saumya Jain, Anna M Kucharska-Newton, Eric Whitsel, Carlton R Moore, Stephanie Haas, Harish Yadav, Wayne Rosamand
Format: Article
Language: English
Published: BMJ Publishing Group 2021-06-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/11/6/e047356.full
collection DOAJ
description Objectives: Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype.
Study design: A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype.
Setting: Four ARIC study hospitals, each representing an ARIC study region in the USA.
Participants: A stratified random sample of hospitalisations occurring during 2015, identified using a broad range of International Classification of Diseases, Ninth Revision, diagnostic codes indicative of an HF event, was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations were used as the validation dataset.
Intervention: Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype.
Primary and secondary outcome measures: NLP performance as measured by sensitivity, specificity, positive-predictive value (PPV) and agreement in ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard.
Results: Overall, performance of NLP ascertainment of Framingham HF phenotype in the validation dataset was good, with sensitivity, specificity, PPV and agreement of 78.8%, 81.7%, 84.4% and 80.0%, respectively.
Conclusions: Our results on the use of NLP to ascertain Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology, by decreasing the need for manual chart review, holds the potential to significantly improve the feasibility and efficiency of large-scale epidemiologic surveillance of HF prevalence and incidence.
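The four outcome measures named in the abstract can be illustrated with a minimal Python sketch. It assumes each hospitalisation has been cross-classified as NLP-positive or NLP-negative against the manual-review reference standard, and it treats "agreement" as simple overall percent agreement, which is an assumption (the record does not define the term). The class and function names and the counts are hypothetical; the counts are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """NLP result cross-classified against manual chart review (reference)."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    fn: int  # NLP negative, reference positive
    tn: int  # NLP negative, reference negative


def performance(c: ConfusionCounts) -> dict[str, float]:
    """Return the four measures as proportions between 0 and 1."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        # Assumption: agreement = overall percent agreement, not kappa.
        "agreement": (c.tp + c.tn) / total,
    }


# Hypothetical counts chosen for easy arithmetic; NOT taken from the study.
example = ConfusionCounts(tp=80, fp=20, fn=20, tn=80)
metrics = performance(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With real data, the counts would come from comparing the NLP output for each hospitalisation in the validation dataset with the abstractor's determination for the same hospitalisation.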
id doaj-art-f10f3ba45e7a4db081c84d08c0dd1d5b
institution Kabale University
issn 2044-6055
spelling doaj-art-f10f3ba45e7a4db081c84d08c0dd1d5b 2024-11-20T02:55:11Z
BMJ Open, volume 11, issue 6 (2021-06-01), ISSN 2044-6055
DOI: 10.1136/bmjopen-2020-047356
Author affiliations (matched to authors by their order in the record):
Gerardo Heiss: Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Saumya Jain: Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Anna M Kucharska-Newton: Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Eric Whitsel: Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Carlton R Moore: Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
Stephanie Haas: School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Harish Yadav: School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Wayne Rosamand: Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA