Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review

Bibliographic Details
Main Authors: Masoud Tafavvoghi, Lars Ailo Bongo, Nikita Shvetsov, Lill-Tove Rasmussen Busund, Kajsa Møllersen
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Journal of Pathology Informatics
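Subjects: Breast cancer; Computational pathology; Deep learning; Whole-slide images; Publicly available datasets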
Online Access: http://www.sciencedirect.com/science/article/pii/S2153353924000026
author Masoud Tafavvoghi
Lars Ailo Bongo
Nikita Shvetsov
Lill-Tove Rasmussen Busund
Kajsa Møllersen
collection DOAJ
description Advancements in digital pathology and computing resources have made a significant impact in the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer is a big challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. Moreover, we reported image metadata and characteristics for each dataset to assist researchers in selecting proper datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists of breast H&E patches and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the performance of other developed models may be susceptible to overestimation. The TCGA-BRCA was used in 52% of the selected studies. This dataset has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting of breast WSI datasets that can be an issue in developing accurate deep learning models, indicating the necessity of establishing explicit guidelines for documenting breast WSI dataset characteristics and metadata.
format Article
id doaj-art-1b42556ffece42f8b0e1ca0cb0bafd24
institution Kabale University
issn 2153-3539
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series Journal of Pathology Informatics
spelling doaj-art-1b42556ffece42f8b0e1ca0cb0bafd24; 2024-12-15T06:15:10Z; eng; Elsevier; Journal of Pathology Informatics; 2153-3539; 2024-12-01; 15; 100363
Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review
Masoud Tafavvoghi (0), Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway; corresponding author
Lars Ailo Bongo (1), Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
Nikita Shvetsov (2), Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
Lill-Tove Rasmussen Busund (3), Department of Medical Biology, UiT The Arctic University of Norway, Tromsø, Norway
Kajsa Møllersen (4), Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
http://www.sciencedirect.com/science/article/pii/S2153353924000026
Breast cancer; Computational pathology; Deep learning; Whole-slide images; Publicly available datasets
title Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review
topic Breast cancer
Computational pathology
Deep learning
Whole-slide images
Publicly available datasets
url http://www.sciencedirect.com/science/article/pii/S2153353924000026