Presenting the framework of the whole slide image file Babel fish: An OCR-based file labeling tool

Bibliographic Details
Main Authors:	Nils Englert, Constantin Schwab, Maximilian Legnar, Cleo-Aron Weis
Format:	Article
Language:	English
Published:	Elsevier 2024-12-01
Series:	Journal of Pathology Informatics
Subjects:	DICOM, Digital pathology, Optical character recognition, Automatization
Online Access:	http://www.sciencedirect.com/science/article/pii/S2153353924000415

collection	DOAJ
description	Introduction: Metadata extraction from digitized slides or whole slide image files is a frequent, laborious, and tedious task. In this work, we present a tool that automatically extracts all relevant slide information, such as case number, year, slide number, block number, and staining, from the macro-images of the scanned slide. We named the tool Babel fish because it helps translate the relevant information printed on the slide. It encodes certain basic assumptions, for example about where specific information is located on the slide; these assumptions can be adapted to the respective setup. The extracted metadata can then be used to sort digital slides into databases or to link them with the associated case IDs from laboratory information systems.
	Material and methods: The tool is based on optical character recognition (OCR). For most information, easyOCR is used. For the block number, and for cases with insufficient results in the first OCR round, a second OCR pass with pytesseract is applied. Two datasets are used: a development set of 342 slides and a test set of 110 slides.
	Results: On the test set, the overall accuracy for retrieving all relevant information per slide is 0.982. Of note, the accuracy for most information parts is 1.000, whereas the accuracy for block number detection is 0.982.
	Conclusion: The Babel fish tool can be used to rename large numbers of whole slide image files in an image analysis pipeline. Furthermore, it could be an essential part of DICOM conversion pipelines, as it extracts relevant metadata such as case number, year, block ID, and staining.
id	doaj-art-ac2dea88bf1445c5a41fb8eb17f2fd6a
institution	Kabale University
issn	2153-3539
citation	Journal of Pathology Informatics, vol. 15, article 100402 (Elsevier, 2024-12-01); ISSN 2153-3539; record indexed 2024-12-15T06:15:22Z
affiliations	Nils Englert: Section Computational Pathology Heidelberg, Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany
	Constantin Schwab: Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany
	Maximilian Legnar: Section Computational Pathology Heidelberg, Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany
	Cleo-Aron Weis (corresponding author): Section Computational Pathology Heidelberg, Institute of Pathology Heidelberg, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany; Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
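
The abstract describes a two-stage OCR strategy: easyOCR for most fields, and a second pass with pytesseract for the block number and for fields the first pass could not read. The Python sketch below illustrates that control flow under stated assumptions; it is not the Babel fish source code. The label layout (for example "E12345-23 S2 B:A3 HE"), the regular expressions, the two-digit year convention, the file-naming scheme and all function names (parse_label, read_label, target_name, easyocr_text, tesseract_text) are hypothetical. The engine calls easyocr.Reader(...).readtext(..., detail=0) and pytesseract.image_to_string(...) are real library functions, imported lazily so that the stubbed demo runs without either package.

```python
import re
from functools import lru_cache
from typing import Callable, Dict, Optional

Fields = Dict[str, Optional[str]]

# HYPOTHETICAL label layout, e.g. "E12345-23 S2 B:A3 HE".
# Real slide labels differ between institutes; these patterns are illustrative only.
CASE_RE = re.compile(r"\bE(\d{3,6})-(\d{2})\b")  # case number + two-digit year
FIELD_RES = {
    "slide": re.compile(r"\bS(\d{1,2})\b"),
    "block": re.compile(r"\bB:([A-Z]\d{1,2})\b"),
    "stain": re.compile(r"\b(HE|PAS|Ki67)\b"),
}


def parse_label(text: str) -> Fields:
    """Pull the metadata fields out of raw OCR text; missing fields stay None."""
    fields: Fields = {"case": None, "year": None, "slide": None,
                      "block": None, "stain": None}
    m = CASE_RE.search(text)
    if m:
        # Assumption: two-digit years belong to the 2000s.
        fields["case"], fields["year"] = m.group(1), "20" + m.group(2)
    for key, pattern in FIELD_RES.items():
        m = pattern.search(text)
        if m:
            fields[key] = m.group(1)
    return fields


def read_label(image: object,
               primary: Callable[[object], str],
               secondary: Callable[[object], str]) -> Fields:
    """First OCR pass for most fields; second pass for the block number
    and for any field the first pass did not find."""
    first = parse_label(primary(image))
    # The block number always goes through the second engine, so it runs unconditionally.
    second = parse_label(secondary(image))
    merged = dict(first)
    merged["block"] = second["block"] or first["block"]
    for key, value in first.items():
        if value is None and key != "block":
            merged[key] = second[key]  # fallback for fields missed by pass one
    return merged


def target_name(fields: Fields, suffix: str = ".svs") -> str:
    """Build a new file name from the fields (HYPOTHETICAL naming scheme)."""
    keys = ("case", "year", "slide", "block", "stain")
    return "_".join(fields.get(k) or "NA" for k in keys) + suffix


@lru_cache(maxsize=1)
def _easyocr_reader():
    import easyocr  # optional dependency, loaded on first use
    return easyocr.Reader(["en"], gpu=False)


def easyocr_text(image: object) -> str:
    """Primary engine: easyOCR, joining the recognised text boxes."""
    return " ".join(_easyocr_reader().readtext(image, detail=0))


def tesseract_text(image: object) -> str:
    """Secondary engine: Tesseract via pytesseract (needs the tesseract binary)."""
    import pytesseract  # optional dependency
    return pytesseract.image_to_string(image)


if __name__ == "__main__":
    # Stub engines stand in for easyOCR / Tesseract so the sketch runs anywhere.
    fields = read_label(None, lambda img: "E12345-23 S2 HE",
                        lambda img: "E12345-23 S2 B:A3 HE")
    print(fields)  # {'case': '12345', 'year': '2023', 'slide': '2', 'block': 'A3', 'stain': 'HE'}
    print(target_name(fields))  # 12345_2023_2_A3_HE.svs
```

With both libraries (and the Tesseract binary) installed, a call would look like read_label(label_image, easyocr_text, tesseract_text). Passing the OCR engines in as plain functions keeps the merging logic testable without either engine being present.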
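
The abstract reports an overall per-slide accuracy of 0.982 on the 110-slide test set, the same value as the block-number accuracy. One plausible reading, assumed here, is that a slide counts as correct only if every field is read correctly. Under that reading, 0.982 on 110 slides corresponds to 108 correct slides (108/110 ≈ 0.9818; 107/110 and 109/110 would round to 0.973 and 0.991). The sketch below computes both metrics; the function names and the synthetic data in the demo are invented for illustration and are not the study's data.

```python
from typing import Dict, List, Optional

Fields = Dict[str, Optional[str]]


def _check(preds: List[Fields], truth: List[Fields]) -> None:
    """Reject empty or misaligned inputs."""
    if not truth or len(preds) != len(truth):
        raise ValueError("predictions and ground truth must be non-empty and aligned")


def field_accuracy(preds: List[Fields], truth: List[Fields], key: str) -> float:
    """Share of slides whose value for one field matches the ground truth."""
    _check(preds, truth)
    return sum(p.get(key) == t.get(key) for p, t in zip(preds, truth)) / len(truth)


def slide_accuracy(preds: List[Fields], truth: List[Fields]) -> float:
    """Share of slides on which every ground-truth field was read correctly
    (ASSUMED definition of the per-slide overall accuracy)."""
    _check(preds, truth)
    hits = sum(all(p.get(k) == v for k, v in t.items())
               for p, t in zip(preds, truth))
    return hits / len(truth)


if __name__ == "__main__":
    # Synthetic example: 110 slides, two block numbers misread.
    truth = [{"case": str(i), "block": "A1"} for i in range(110)]
    preds = [dict(t) for t in truth]
    preds[0]["block"] = "A7"
    preds[1]["block"] = None
    print(round(slide_accuracy(preds, truth), 3))           # 0.982
    print(round(field_accuracy(preds, truth, "block"), 3))  # 0.982
    print(field_accuracy(preds, truth, "case"))             # 1.0
```

Scoring the slide as a whole, rather than averaging field accuracies, is what makes a single misread field count against the slide, which matches the abstract's phrase "retrieving all relevant information per slide".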
