Efficiency and safety of automated label cleaning on multimodal retinal images

Bibliographic Details
Main Authors:	Tian Lin, Meng Wang, Aidi Lin, Xiaoting Mai, Huiyu Liang, Yih-Chung Tham, Haoyu Chen
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-01-01
Series:	npj Digital Medicine
Online Access:	https://doi.org/10.1038/s41746-024-01424-x

Description
Summary:	Label noise is a common problem that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning with the open-source framework Cleanlab on multi-category datasets of fundus photography and optical coherence tomography, with intentionally introduced label noise ranging from 0 to 70%. After six cycles of automatic cleaning, label accuracy improved significantly (by 3.4–62.9%), as did dataset quality scores (DQS, by 5.1–74.4%). Most label errors (86.6–97.5%) were corrected accurately, while few were missed (0.5–2.8%) or misclassified (0.4–10.6%). The classification accuracy of RETFound improved significantly, by 0.3–52.9%, when it was trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. Furthermore, in external validation on the EyePACS and APTOS-2019 datasets, label accuracy rose by 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and strengthens model performance efficiently and safely.
ISSN:	2398-6352
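
The summary describes datasets with intentionally introduced label noise and reports results as label accuracy. A minimal Python sketch of that setup follows. It is an illustration only: the record does not say how the noise was generated, so the uniform flipping scheme and the names `inject_label_noise` and `label_accuracy` are assumptions, and the Cleanlab cleaning step itself is not shown.

```python
import random


def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with round(noise_rate * n) entries
    flipped to a different, uniformly chosen class (assumed scheme)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Pick distinct positions so exactly n_flip labels end up wrong.
    for i in rng.sample(range(len(noisy)), n_flip):
        other_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def label_accuracy(labels, reference):
    """Fraction of labels that agree with the reference labels."""
    matches = sum(a == b for a, b in zip(labels, reference))
    return matches / len(reference)


# Synthetic 5-class reference labels, hypothetical stand-ins for image labels.
true_labels = [i % 5 for i in range(1000)]
noisy_labels = inject_label_noise(true_labels, num_classes=5, noise_rate=0.3)
print(label_accuracy(noisy_labels, true_labels))
```

Because every selected label is flipped to a different class, the label accuracy of the noisy set equals one minus the noise rate, which gives a known baseline against which any later cleaning step can be compared.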

Similar Items