Efficiency and safety of automated label cleaning on multimodal retinal images

Bibliographic Details
Main Authors: Tian Lin, Meng Wang, Aidi Lin, Xiaoting Mai, Huiyu Liang, Yih-Chung Tham, Haoyu Chen
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-024-01424-x
Description
Summary: Label noise is a common and important problem that can degrade the performance of artificial intelligence models. This study assessed the effectiveness and potential risks of automated label cleaning with the open-source framework Cleanlab on multi-category datasets of fundus photography and optical coherence tomography, into which label noise ranging from 0 to 70% was intentionally introduced. After six cycles of automatic cleaning, label accuracy improved significantly (by 3.4–62.9%), as did dataset quality scores (DQS, by 5.1–74.4%). Most label errors (86.6–97.5%) were correctly fixed, with few missed (0.5–2.8%) or misclassified (0.4–10.6%). The classification accuracy of RETFound improved significantly, by 0.3–52.9%, when it was trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. Furthermore, external validation on the EyePACS and APTOS-2019 datasets showed label accuracy gains of 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and improves model performance efficiently and safely.
ISSN: 2398-6352
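The summary does not say how Cleanlab decides that a label is wrong. Cleanlab is built on confident learning, which compares each given label with out-of-sample predicted probabilities using a per-class confidence threshold. The stdlib-only Python sketch below illustrates that core idea together with the study's setup of injecting label noise and cleaning over several cycles. It is a simplified sketch under stated assumptions, not the authors' pipeline or Cleanlab's API: every function name is hypothetical, the `get_pred_probs` callback stands in for retraining a model each cycle, and the DQS is not reproduced because its definition is not given here. The real library offers richer functionality, for example `cleanlab.filter.find_label_issues(labels, pred_probs)`.

```python
# Hypothetical helpers illustrating confident-learning-style label cleaning.
# Not Cleanlab's implementation; simplified for illustration only.
import random
from typing import Callable, Dict, List, Sequence, Tuple

Probs = Sequence[Sequence[float]]


def inject_symmetric_noise(labels: Sequence[int], rate: float,
                           n_classes: int, seed: int = 0) -> List[int]:
    """Flip round(rate * n) randomly chosen labels to a different, uniformly chosen class."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(rate * len(noisy))):
        noisy[i] = rng.choice([c for c in range(n_classes) if c != noisy[i]])
    return noisy


def label_accuracy(labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of labels that match the ground truth."""
    return sum(a == b for a, b in zip(labels, true_labels)) / len(true_labels)


def per_class_thresholds(labels: Sequence[int], pred_probs: Probs) -> List[float]:
    """Threshold t_j = mean predicted probability of class j over examples labelled j."""
    n_classes = len(pred_probs[0])
    sums, counts = [0.0] * n_classes, [0] * n_classes
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    # A class with no labelled examples gets an unreachable threshold.
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]


def suggest_corrections(labels: Sequence[int], pred_probs: Probs) -> Dict[int, int]:
    """Map index -> suggested label for examples that look mislabelled.

    Among the classes whose probability reaches that class's threshold, take the
    most probable one; if it differs from the given label, flag the example.
    """
    thresholds = per_class_thresholds(labels, pred_probs)
    fixes: Dict[int, int] = {}
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        confident = [j for j, p in enumerate(probs) if p >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: probs[j])
            if best != y:
                fixes[i] = best
    return fixes


def clean_in_cycles(labels: Sequence[int],
                    get_pred_probs: Callable[[List[int]], Probs],
                    cycles: int = 6) -> Tuple[List[int], List[int]]:
    """Repeat detect-and-relabel for up to `cycles` rounds; stop when nothing changes.

    `get_pred_probs` is a hypothetical callback that should return out-of-sample
    probabilities (e.g. from cross-validation) for the current labels.
    """
    current, changes_per_cycle = list(labels), []
    for _ in range(cycles):
        fixes = suggest_corrections(current, get_pred_probs(current))
        changes_per_cycle.append(len(fixes))
        if not fixes:
            break
        for i, new_label in fixes.items():
            current[i] = new_label
    return current, changes_per_cycle


if __name__ == "__main__":
    TRUE = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    noisy = [0, 0, 1, 1, 1, 1, 2, 2, 2]  # example 2 deliberately mislabelled
    # Made-up out-of-sample probabilities, held fixed across cycles for the demo.
    PROBS = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.88, 0.07, 0.05],
             [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0],
             [0.05, 0.05, 0.9], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]
    cleaned, history = clean_in_cycles(noisy, lambda labels: PROBS)
    print(round(label_accuracy(noisy, TRUE), 3), label_accuracy(cleaned, TRUE), history)
    # prints: 0.889 1.0 [1, 0]
```

Stopping once a cycle makes no corrections is only one simple guard; the DQS-guided strategy described in the summary is a different, score-based way to limit over-cleaning.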