Efficiency and safety of automated label cleaning on multimodal retinal images

Bibliographic Details
Main Authors: Tian Lin, Meng Wang, Aidi Lin, Xiaoting Mai, Huiyu Liang, Yih-Chung Tham, Haoyu Chen
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-024-01424-x
Abstract: Label noise is a common and important problem that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning with an open-source framework, Cleanlab, on multi-category datasets of fundus photography and optical coherence tomography, with intentionally introduced label noise ranging from 0 to 70%. After six cycles of automatic cleaning, significant improvements were achieved in label accuracy (3.4–62.9%) and dataset quality score (DQS, 5.1–74.4%). The majority (86.6–97.5%) of label errors were accurately corrected, with few missed (0.5–2.8%) or misclassified (0.4–10.6%). The classification accuracy of RETFound improved significantly, by 0.3–52.9%, when it was trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. Furthermore, in external validation on the EyePACS and APTOS-2019 datasets, cleaning increased label accuracy by 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and strengthens model performance efficiently and safely.
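The bookkeeping behind the abstract's figures (a controlled noise rate, label accuracy, and the split of label errors into corrected, missed, and misclassified) can be illustrated with a minimal Python sketch. It is not the authors' code and does not reproduce the Cleanlab cleaning step. The function names, the uniform random flipping scheme, and the exact category definitions, including the `over_cleaned` bucket for correct labels that a cleaner changes, are assumptions made for illustration.

```python
import random
from collections import Counter


def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a different class.

    Assumption: flips are uniform over the other classes; the study's
    actual noise model may differ.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def label_accuracy(labels, truth):
    """Fraction of labels that agree with the reference labels."""
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


def audit_cleaning(truth, noisy, cleaned):
    """Count what a cleaning pass did to each label (hypothetical categories).

    corrected:     wrong label changed to the true label
    missed:        wrong label left unchanged
    misclassified: wrong label changed to another wrong label
    over_cleaned:  correct label changed to a wrong one
    """
    counts = Counter()
    for t, n, c in zip(truth, noisy, cleaned):
        if n != t:
            if c == t:
                counts["corrected"] += 1
            elif c == n:
                counts["missed"] += 1
            else:
                counts["misclassified"] += 1
        elif c != t:
            counts["over_cleaned"] += 1
    return counts


# Toy usage: five classes, 30% injected noise, then audit a hand-made "cleaned" list.
truth = [i % 5 for i in range(1000)]
noisy = inject_label_noise(truth, noise_rate=0.3, num_classes=5)
print(label_accuracy(noisy, truth))
print(audit_cleaning([0, 1, 2, 0, 1], [1, 1, 0, 2, 1], [0, 1, 0, 1, 2]))
```

In a real experiment the `cleaned` list would come from the cleaning tool's output after each cycle, and the audit would be repeated per cycle to watch for a growing `over_cleaned` count.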
ISSN: 2398-6352
Author affiliations:
Tian Lin: Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong
Meng Wang: Beth Israel Deaconess Medical Center, Harvard Medical School
Aidi Lin: Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong
Xiaoting Mai: Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong
Huiyu Liang: Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong
Yih-Chung Tham: Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore
Haoyu Chen: Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong