Care home resident identification: A comparison of address matching methods with Natural Language Processing.
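Main Authors: | Víctor Suárez-Paniagua, Arlene Casey, Charis A Marwick, Jennifer K Burton, Helen Callaby, Isobel Guthrie, Bruce Guthrie, Beatrice Alex |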
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2024-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0309341 |
_version_ | 1841555377300176896 |
---|---|
author | Víctor Suárez-Paniagua, Arlene Casey, Charis A Marwick, Jennifer K Burton, Helen Callaby, Isobel Guthrie, Bruce Guthrie, Beatrice Alex |
collection | DOAJ |
description | <h4>Background</h4>Care home residents are a highly vulnerable group, but identifying them in routine data is challenging. This study aimed to develop and validate Natural Language Processing (NLP) methods to identify care home residents from primary care address records.<h4>Methods</h4>The proposed system applies sequential NLP filtering and preprocessing to the address text and then calculates similarity scores between general practice (GP) addresses and registered care home addresses. Performance was evaluated in a diagnostic test study comparing NLP predictions with independent, gold-standard manual identification of care home addresses. The analysis used population data comprising 771,588 uniquely written addresses for 819,911 people in two NHS Scotland health board regions. The source code is publicly available at https://github.com/vsuarezpaniagua/NLPcarehome.<h4>Results</h4>Overall, NLP methods identified care home residents better in Fife than in Tayside, and better in people aged ≥65 years than in the whole population. The best-performing methods were Correlation (sensitivity 90.2%, positive predictive value (PPV) 92.0%) for Fife and Cosine (sensitivity 90.4%, PPV 93.7%) for Tayside. For people aged ≥65 years, the best methods were Jensen-Shannon (sensitivity 91.5%, PPV 98.7%) for Fife and City Block (sensitivity 94.4%, PPV 98.3%) for Tayside. These results show that NLP methods can feasibly be applied to real data and that computing address similarities outperforms previous approaches.<h4>Conclusions</h4>Address-matching techniques using NLP methods can determine with reasonable accuracy whether individuals live in a care home based on their GP-registered addresses. The system's performance exceeds previously reported results for methods such as Postcode matching, Markov score or Phonics score. |
format | Article |
id | doaj-art-3b57b5d569cc4797ab3ee5afa2c785b7 |
institution | Kabale University |
issn | 1932-6203 |
language | English |
publishDate | 2024-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj-art-3b57b5d569cc4797ab3ee5afa2c785b7 (indexed 2025-01-08T05:33:29Z); PLoS ONE, volume 19, issue 12, article e0309341 |
title | Care home resident identification: A comparison of address matching methods with Natural Language Processing. |
url | https://doi.org/10.1371/journal.pone.0309341 |
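The Methods describe scoring the similarity between each GP-registered address and the registered care home addresses, and the Results name four measures: Correlation, Cosine, Jensen-Shannon and City Block. The following is a minimal sketch of that kind of scoring under stated assumptions, not the authors' implementation (which is in the NLPcarehome repository linked in the abstract). It assumes a hypothetical `preprocess` normalisation step, represents each address as character trigram counts, converts each distance into a similarity where higher means more alike, and flags a GP address when its best score reaches a placeholder threshold of 0.8. The helper names, the trigram representation and the threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def preprocess(address: str) -> str:
    """Hypothetical normalisation: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(text.split())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the space-padded text."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def to_vectors(a: Counter, b: Counter) -> tuple[list[float], list[float]]:
    """Align two n-gram counters on the union of their vocabularies."""
    vocab = sorted(set(a) | set(b))
    return [float(a[g]) for g in vocab], [float(b[g]) for g in vocab]


def cosine_sim(x: list[float], y: list[float]) -> float:
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def correlation_sim(x: list[float], y: list[float]) -> float:
    """Pearson correlation, i.e. the cosine of the mean-centred vectors.
    Undefined for a constant vector; returned as 0.0 here."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_sim([p - mx for p in x], [q - my for q in y])


def _to_distribution(x: list[float]) -> list[float]:
    total = sum(x)
    return [p / total for p in x] if total else x


def cityblock_sim(x: list[float], y: list[float]) -> float:
    """1 - (L1 distance / 2) between n-gram frequency distributions, in [0, 1]."""
    p, q = _to_distribution(x), _to_distribution(y)
    return 1.0 - sum(abs(a - b) for a, b in zip(p, q)) / 2.0


def jensen_shannon_sim(x: list[float], y: list[float]) -> float:
    """1 - Jensen-Shannon distance (square root of the base-2 divergence), in [0, 1]."""
    p, q = _to_distribution(x), _to_distribution(y)
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(u: list[float], v: list[float]) -> float:
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2.0
    return 1.0 - math.sqrt(max(divergence, 0.0))


METRICS = {
    "correlation": correlation_sim,
    "cosine": cosine_sim,
    "cityblock": cityblock_sim,
    "jensenshannon": jensen_shannon_sim,
}


def best_care_home_match(gp_address, care_home_addresses, metric="cosine", threshold=0.8):
    """Return (closest care home address, score, flagged) for one GP address.
    The 0.8 threshold is a hypothetical placeholder, not a value from the study."""
    similarity = METRICS[metric]
    gp_grams = char_ngrams(preprocess(gp_address))
    best, best_score = None, float("-inf")
    for home in care_home_addresses:
        x, y = to_vectors(gp_grams, char_ngrams(preprocess(home)))
        score = similarity(x, y)
        if score > best_score:
            best, best_score = home, score
    return best, best_score, best_score >= threshold


if __name__ == "__main__":
    # Invented addresses for illustration only.
    homes = [
        "Rose Lodge Care Home, 12 High Street, Kirkcaldy KY1 1AA",
        "Oakview Nursing Home, 3 Mill Road, Dundee DD1 2BB",
    ]
    gp = "ROSE LODGE C/H 12 HIGH ST KIRKCALDY KY11AA"
    for name in METRICS:
        print(name, best_care_home_match(gp, homes, metric=name))
```

With these invented addresses every measure should select Rose Lodge as the closest care home despite the abbreviations; whether the GP address is actually flagged depends on the threshold, which in a real setting would have to be tuned against gold-standard labels.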
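The Results report each method's sensitivity and PPV against the independent, gold-standard manual identification. As a reminder of how those two figures are defined, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), where TP, FP and FN count true positives, false positives and false negatives. The sketch below computes both from paired boolean predictions and gold labels; the function name `diagnostic_metrics` and the example labels are hypothetical, and the study's own evaluation code may differ.

```python
import math
from typing import Sequence


def diagnostic_metrics(predicted: Sequence[bool], gold: Sequence[bool]) -> dict:
    """Sensitivity and PPV of binary predictions against gold-standard labels.

    sensitivity = TP / (TP + FN): share of true care home residents that were flagged.
    PPV         = TP / (TP + FP): share of flagged people who really are residents.
    Each ratio is NaN when its denominator is zero.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if g and not p)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
    }


# Invented labels for illustration only.
result = diagnostic_metrics(
    predicted=[True, True, False, True, False],
    gold=[True, False, True, True, False],
)
print(result)
```

Calling the function separately on each stratum of the data (for example per health board, or restricted to people aged ≥65 years) mirrors the breakdown used in the Results.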