Efficiency and safety of automated label cleaning on multimodal retinal images
Abstract: Label noise is a common and important issue that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning using an open-source framework, Cleanlab, in multi-category datasets of fundus photography...
Main Authors: | Tian Lin, Meng Wang, Aidi Lin, Xiaoting Mai, Huiyu Liang, Yih-Chung Tham, Haoyu Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-024-01424-x |
_version_ | 1841559092504559616 |
---|---|
author | Tian Lin, Meng Wang, Aidi Lin, Xiaoting Mai, Huiyu Liang, Yih-Chung Tham, Haoyu Chen |
author_sort | Tian Lin |
collection | DOAJ |
description | Abstract: Label noise is a common and important issue that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning using an open-source framework, Cleanlab, in multi-category datasets of fundus photography and optical coherence tomography, with intentionally introduced label noise ranging from 0 to 70%. After six cycles of automatic cleaning, significant improvements were achieved in label accuracy (3.4–62.9%) and dataset quality score (DQS, 5.1–74.4%). The majority (86.6–97.5%) of label errors were accurately corrected, with few missed (0.5–2.8%) or misclassified (0.4–10.6%). The classification accuracy of RETFound improved significantly, by 0.3–52.9%, when trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. Furthermore, in external validation on the EyePACS and APTOS-2019 datasets, cleaning boosted label accuracy by 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and strengthens model performance efficiently and safely. |
format | Article |
id | doaj-art-bd025d528fc9460088b81cb832c43e65 |
institution | Kabale University |
issn | 2398-6352 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
spelling | doaj-art-bd025d528fc9460088b81cb832c43e65; 2025-01-05T12:47:25Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-01-01; 8119; 10.1038/s41746-024-01424-x; Efficiency and safety of automated label cleaning on multimodal retinal images; Authors: Tian Lin [0], Meng Wang [1], Aidi Lin [2], Xiaoting Mai [3], Huiyu Liang [4], Yih-Chung Tham [5], Haoyu Chen [6]; Affiliations: [0] Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong, [1] Beth Israel Deaconess Medical Center, Harvard Medical School, [2] Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong, [3] Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong, [4] Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong, [5] Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, [6] Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong; Abstract: Label noise is a common and important issue that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning using an open-source framework, Cleanlab, in multi-category datasets of fundus photography and optical coherence tomography, with intentionally introduced label noise ranging from 0 to 70%. After six cycles of automatic cleaning, significant improvements were achieved in label accuracy (3.4–62.9%) and dataset quality score (DQS, 5.1–74.4%). The majority (86.6–97.5%) of label errors were accurately corrected, with few missed (0.5–2.8%) or misclassified (0.4–10.6%). The classification accuracy of RETFound improved significantly, by 0.3–52.9%, when trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. Furthermore, in external validation on the EyePACS and APTOS-2019 datasets, cleaning boosted label accuracy by 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and strengthens model performance efficiently and safely.; https://doi.org/10.1038/s41746-024-01424-x |
spellingShingle | Tian Lin Meng Wang Aidi Lin Xiaoting Mai Huiyu Liang Yih-Chung Tham Haoyu Chen Efficiency and safety of automated label cleaning on multimodal retinal images npj Digital Medicine |
title | Efficiency and safety of automated label cleaning on multimodal retinal images |
title_sort | efficiency and safety of automated label cleaning on multimodal retinal images |
url | https://doi.org/10.1038/s41746-024-01424-x |
work_keys_str_mv | AT tianlin efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT mengwang efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT aidilin efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT xiaotingmai efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT huiyuliang efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT yihchungtham efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages AT haoyuchen efficiencyandsafetyofautomatedlabelcleaningonmultimodalretinalimages |
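
The abstract in this record describes injecting label noise at rates from 0 to 70% and comparing label accuracy before and after automated cleaning. The sketch below illustrates that kind of experiment in plain Python. It is not the authors' pipeline and it does not call Cleanlab: the function names (`inject_label_noise`, `simulated_pred_probs`, `relabel_confident`), the uniform noise model, and the per-class threshold rule are all assumptions made for illustration, loosely modeled on the confident-learning idea of flagging an example when the model is confident in a class other than its given label. The predicted probabilities are simulated from an idealized classifier, so the recovery it shows is far cleaner than anything a real model would produce.

```python
import random


def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fraction `noise_rate` of labels to a different, uniformly chosen class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return noisy


def label_accuracy(given, truth):
    """Fraction of given labels that match the reference labels."""
    return sum(g == t for g, t in zip(given, truth)) / len(truth)


def simulated_pred_probs(truth, num_classes, peak=0.9):
    """Hypothetical classifier output: `peak` on the true class, the rest spread evenly."""
    rest = (1.0 - peak) / (num_classes - 1)
    return [[peak if k == t else rest for k in range(num_classes)] for t in truth]


def relabel_confident(noisy, pred_probs, num_classes):
    """Simplified confident-learning-style relabeling (an assumption, not Cleanlab's code).

    The threshold for class k is the mean predicted probability of k over the
    examples currently labeled k. An example is relabeled only when its given
    label falls below that label's threshold while another class meets its own.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(noisy, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)]

    cleaned = []
    for y, p in zip(noisy, pred_probs):
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident and y not in confident:
            # Move the label to the most probable confident class.
            cleaned.append(max(confident, key=lambda k: p[k]))
        else:
            cleaned.append(y)
    return cleaned


if __name__ == "__main__":
    num_classes = 3
    truth = [i % num_classes for i in range(300)]
    probs = simulated_pred_probs(truth, num_classes)
    for rate in (0.0, 0.3, 0.7):
        noisy = inject_label_noise(truth, num_classes, rate)
        cleaned = relabel_confident(noisy, probs, num_classes)
        print(rate, label_accuracy(noisy, truth), label_accuracy(cleaned, truth))
```

In practice the predicted probabilities would come from a trained model evaluated out of sample, and a real cleaning pass can both miss errors and alter correct labels, which is the over-cleaning risk the abstract mentions.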