Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation

Bibliographic Details
Main Authors: Yuxiang Liao, Haishan Xiang, Hantao Liu, Irena Spasic
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10759643/
_version_ 1841536214568534016
author Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
author_facet Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
author_sort Yuxiang Liao
collection DOAJ
description High lexico-syntactic variation across radiology reports, even when they convey the same diagnostic information, complicates evaluation and hence the training of deep learning models for Automatic Radiology Report Generation. This problem can be addressed by 1) developing an internal standard for the structured representation of radiology reports; 2) automatically converting radiology reports to the structured representation prior to training; 3) training a deep learning model to generate a structured radiology report from an image; and finally 4) converting the structured report into a narrative one. In this study, we focus specifically on steps 1) and 2). First, we propose a structured radiology report scheme based upon RadGraph, which serves to formally represent the clinical entities, their attributes, and the relations discussed in a radiology report. Using the new scheme, we manually annotated a total of 550 MIMIC-CXR reports for model training and evaluation and 50 CheXpert reports for evaluating the model’s generalization ability. We developed a joint entity and relation model and proposed a novel auxiliary component that enhances model performance by interpreting token-level information. Using the annotated data, we trained the model to automatically convert information from a narrative radiology report into the structured representation; it achieved a micro-F1 of 96.6% and 96.1% on named entity recognition, 94.0% and 89.8% on entity attribute recognition, and 89.5% and 86.6% on relation extraction on the MIMIC-CXR and CheXpert test sets, respectively. We then used this model to automatically annotate 227,835 MIMIC-CXR reports. We shared all data and software deliverables under the PhysioNet Credentialed Health Data License 1.5.0 to enable further research on Automatic Radiology Report Generation.
format Article
id doaj-art-eb20086908f649129c49f3d90f8fd15d
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-eb20086908f649129c49f3d90f8fd15d 2025-01-15T00:02:19Z
language eng
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2024-01-01
volume 12
pages 185103-185116
doi 10.1109/ACCESS.2024.3504378
document 10759643
title Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
author Yuxiang Liao (https://orcid.org/0000-0002-4095-7048), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
author Haishan Xiang (https://orcid.org/0009-0003-2922-6353), Baoan Center for Disease Control and Prevention in Shenzhen, Baoan District, Shenzhen, China
author Hantao Liu (https://orcid.org/0000-0003-4544-3481), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
author Irena Spasic (https://orcid.org/0000-0002-8132-3885), School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.
url https://ieeexplore.ieee.org/document/10759643/
spellingShingle Yuxiang Liao
Haishan Xiang
Hantao Liu
Irena Spasic
Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
IEEE Access
Information extraction
natural language processing
named entity recognition
relation extraction
structured radiology report
title Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_full Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_fullStr Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_full_unstemmed Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_short Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation
title_sort using information extraction to normalize the training data for automatic radiology report generation
topic Information extraction
natural language processing
named entity recognition
relation extraction
structured radiology report
url https://ieeexplore.ieee.org/document/10759643/
work_keys_str_mv AT yuxiangliao usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT haishanxiang usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT hantaoliu usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration
AT irenaspasic usinginformationextractiontonormalizethetrainingdataforautomaticradiologyreportgeneration