Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation

Bibliographic Details
Main Authors: Yuxiang Liao, Haishan Xiang, Hantao Liu, Irena Spasic
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10759643/
_version_ 1841536214568534016
author Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
author_facet Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
author_sort Yuxiang Liao
collection DOAJ
description High lexico-syntactic variation across radiology reports, even when they convey the same diagnostic information, complicates evaluation and hence the training of deep learning models for Automatic Radiology Report Generation. This problem can be addressed by 1) developing an internal standard for the structured representation of radiology reports; 2) automatically converting radiology reports to the structured representation prior to training; 3) training a deep learning model to generate a structured radiology report from an image; and finally 4) converting the structured report into a narrative one. In this study, we focus specifically on steps 1) and 2). First, we propose a structured radiology report scheme based upon RadGraph, which serves to formally represent the clinical entities, their attributes, and the relations discussed in a radiology report. Using the new scheme, we manually annotated a total of 550 MIMIC-CXR reports for model training and evaluation and 50 CheXpert reports for evaluating the model’s generalization ability. We developed a joint entity and relation model and proposed a novel auxiliary component that enhances model performance by interpreting token-level information. Using the annotated data, we trained the model to automatically convert information from a narrative radiology report into the structured representation; it achieved a micro-F1 of 96.6% and 96.1% on named entity recognition, 94.0% and 89.8% on entity attribute recognition, and 89.5% and 86.6% on relation extraction on the MIMIC-CXR and CheXpert test sets, respectively. We then used this model to automatically annotate 227,835 MIMIC-CXR reports. We shared all data and software deliverables under the PhysioNet Credentialed Health Data License 1.5.0 to enable further research on Automatic Radiology Report Generation.
format Article
id doaj-art-eb20086908f649129c49f3d90f8fd15d
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-eb20086908f649129c49f3d90f8fd15d 2025-01-15T00:02:19Z
language eng
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2024-01-01
volume 12
pages 185103-185116
doi 10.1109/ACCESS.2024.3504378
document 10759643
title Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
author Yuxiang Liao (https://orcid.org/0000-0002-4095-7048), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
author Haishan Xiang (https://orcid.org/0009-0003-2922-6353), Baoan Center for Disease Control and Prevention in Shenzhen, Baoan District, Shenzhen, China
author Hantao Liu (https://orcid.org/0000-0003-4544-3481), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
author Irena Spasic (https://orcid.org/0000-0002-8132-3885), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
url https://ieeexplore.ieee.org/document/10759643/
spellingShingle Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
IEEE Access
Information extraction
natural language processing
named entity recognition
relation extraction
structured radiology report
title Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_full Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_fullStr Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_full_unstemmed Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_short Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_sort using information extraction to normalize the training data for automatic radiology report generation
topic Information extraction
natural language processing
named entity recognition
relation extraction
structured radiology report
url https://ieeexplore.ieee.org/document/10759643/
work_keys_str_mv AT yuxiangliao usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT haishanxiang usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT hantaoliu usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT irenaspasic usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration