Care home resident identification: A comparison of address matching methods with Natural Language Processing.

Bibliographic Details
Main Authors: Víctor Suárez-Paniagua, Arlene Casey, Charis A Marwick, Jennifer K Burton, Helen Callaby, Isobel Guthrie, Bruce Guthrie, Beatrice Alex
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0309341

<h4>Background</h4>
Care home residents are a highly vulnerable group, but identifying them in routine data is challenging. This study aimed to develop and validate Natural Language Processing (NLP) methods to identify care home residents from primary care address records.
<h4>Methods</h4>
The proposed system applies sequential NLP filtering and text preprocessing, then calculates similarity scores between general practice (GP) addresses and registered care home addresses. Performance was evaluated in a diagnostic test study comparing NLP predictions with independent, gold-standard manual identification of care home addresses. The analysis used population data comprising 771,588 uniquely written addresses for 819,911 people in two NHS Scotland health board regions (Fife and Tayside). The source code is publicly available at https://github.com/vsuarezpaniagua/NLPcarehome.
<h4>Results</h4>
Overall, NLP-based identification of care home residents performed better in Fife than in Tayside, and better in people aged 65 and over than in the whole population. The best-performing methods were Correlation (sensitivity 90.2%, positive predictive value [PPV] 92.0%) for Fife and Cosine (sensitivity 90.4%, PPV 93.7%) for Tayside. For people aged ≥65 years, the best methods were Jensen-Shannon (sensitivity 91.5%, PPV 98.7%) for Fife and City Block (sensitivity 94.4%, PPV 98.3%) for Tayside. These results demonstrate the feasibility of applying NLP methods to real-world data and indicate that computing address similarities outperforms previous approaches.
<h4>Conclusions</h4>
Address-matching techniques using NLP methods can determine with reasonable accuracy whether individuals live in a care home, based on their GP-registered addresses. The system's performance exceeds previously reported results for methods such as postcode matching, Markov scores and Phonics scores.
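The Methods describe scoring the similarity between a GP address and registered care home addresses, and the Results compare four distance measures by sensitivity and PPV. The following is a minimal sketch of those ideas under stated assumptions, not the authors' pipeline (which is in the linked NLPcarehome repository). It assumes a hypothetical `normalise` preprocessing step and represents each address as a vector of character-bigram counts; the four measures follow their standard definitions (Jensen-Shannon here uses log base 2). The helper names, example addresses and the 0.3 decision threshold are invented for illustration.

```python
import math
import re
from collections import Counter

THRESHOLD = 0.3  # hypothetical cut-off on cosine distance, not taken from the paper


def normalise(address: str) -> str:
    """Hypothetical preprocessing: lowercase, turn punctuation into spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(text.split())


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams; space padding lets the first and last characters form n-grams too."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def to_vectors(a: Counter, b: Counter) -> tuple[list[float], list[float]]:
    """Align two n-gram count tables on the union of their n-grams."""
    vocab = sorted(set(a) | set(b))
    return [float(a[t]) for t in vocab], [float(b[t]) for t in vocab]


def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        # Convention for zero vectors: equal inputs count as identical, otherwise maximally distant.
        return 0.0 if u == v else 1.0
    return 1.0 - dot / norm


def correlation_distance(u: list[float], v: list[float]) -> float:
    # 1 - Pearson correlation, i.e. the cosine distance of the mean-centred vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([x - mu for x in u], [y - mv for y in v])


def jensen_shannon_distance(u: list[float], v: list[float]) -> float:
    # Square root of the Jensen-Shannon divergence (log base 2, so the result lies in [0, 1]).
    su, sv = sum(u), sum(v)
    p = [x / su for x in u]
    q = [y / sv for y in v]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Terms with a_i = 0 contribute nothing; b_i > 0 wherever a_i > 0 because b is the mixture.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def city_block_distance(u: list[float], v: list[float]) -> float:
    # Manhattan (L1) distance between raw count vectors; unbounded, so thresholds depend on address length.
    return sum(abs(x - y) for x, y in zip(u, v))


def best_match(gp_address: str, care_homes: list[str], distance=cosine_distance) -> tuple[float, str]:
    """Return (distance, address) for the registered care home address closest to a GP address."""
    query = char_ngrams(normalise(gp_address))
    scored = []
    for home in care_homes:
        u, v = to_vectors(query, char_ngrams(normalise(home)))
        scored.append((distance(u, v), home))
    return min(scored)


def sensitivity_ppv(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)


# Invented example addresses, for illustration only.
care_homes = [
    "Oakview Care Home, 12 High Street, Kirkcaldy",
    "Riverside House Nursing Home, 4 Mill Lane, Dundee",
]
dist, match = best_match("Oakview Care Hm, 12 High St, Kirkcaldy", care_homes)
print(match, round(dist, 3), dist <= THRESHOLD)
```

In this sketch a GP address is flagged as a care home address when its nearest registered care home address falls within the threshold. In practice the threshold, the n-gram size and the choice of measure would need to be tuned against a manually labelled reference set, which is the role the gold-standard identification plays in the diagnostic test design.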

ISSN: 1932-6203
Citation: PLoS ONE, vol. 19, no. 12, e0309341 (2024)