PW-BALFC, a clinical dataset for detection and instance segmentation of bronchoalveolar lavage fluid cell

Abstract Bronchoalveolar lavage fluid (BALF) cytology provides an important basis for the diagnosis and treatment of lung diseases. Current cytological analysis of BALF relies on manual microscopic examination, which is time-consuming, laborious, and experience-dependent. Automated identification of...

Bibliographic Details
Main Authors: Xin Shi, Qing Huang, Teng Xu, Hongwen Mei, Tingwei Quan, Xiuli Wang, Yinghan Shi, Ye Hu, Zhimei Duan, Fei Xie, Sifan Li, Lixin Xie, Kaifei Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05452-4
collection DOAJ
description Abstract Bronchoalveolar lavage fluid (BALF) cytology provides an important basis for the diagnosis and treatment of lung diseases. Current cytological analysis of BALF relies on manual microscopic examination, which is time-consuming, laborious, and experience-dependent. Automated identification of BALF cells can increase the accuracy and speed of both screening for qualified samples and subsequent cytomorphology analysis. However, public clinical BALF cell datasets for detecting different cell types are lacking, as are pixel-level annotations for cytomorphology analysis. In this work, high-resolution cell images were collected from clinical bronchoalveolar lavage samples obtained at the Chinese PLA General Hospital from 2018 to 2024, and high-quality pixel-level instance annotations were produced for seven cell types. In total, the dataset contains 2,105 clinical images with 13,263 cells from seven distinct classes, annotated with both fine contours and bounding boxes. A YOLOv8 instance segmentation network was trained and tested on the dataset. The results indicate that the dataset and model are useful for research on automated cell identification in BALF.
id doaj-art-c6f51f24c59d4dfea5e4b7041cb981e1
institution Kabale University
issn 2052-4463
spelling doaj-art-c6f51f24c59d4dfea5e4b7041cb981e1 2025-08-20T04:01:43Z eng Nature Portfolio Scientific Data 2052-4463 2025-07-01 12118 10.1038/s41597-025-05452-4
PW-BALFC, a clinical dataset for detection and instance segmentation of bronchoalveolar lavage fluid cell
Xin Shi: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Qing Huang: School of Computer Science & Engineering Artificial Intelligence, Hubei Key Laboratory of Intelligent Robotics, Wuhan Institute of Technology
Teng Xu: School of Computer Science & Engineering Artificial Intelligence, Hubei Key Laboratory of Intelligent Robotics, Wuhan Institute of Technology
Hongwen Mei: School of Computer Science & Engineering Artificial Intelligence, Hubei Key Laboratory of Intelligent Robotics, Wuhan Institute of Technology
Tingwei Quan: Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Xiuli Wang: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Yinghan Shi: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Ye Hu: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Zhimei Duan: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Fei Xie: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Sifan Li: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Lixin Xie: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital
Kaifei Wang: College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital