Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer
Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task relies mainly on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. T...
Main Authors: | Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Public Health |
Subjects: | medical image analysis, image classification, deep learning, Bi-LSTM, transformer |
Online Access: | https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full |
_version_ | 1841558803986776064 |
---|---|
author | Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao |
author_facet | Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao |
author_sort | Chong Liu |
collection | DOAJ |
description | Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task relies mainly on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. Therefore, many automated screening techniques have been developed to address this task. Methods: Among these techniques, deep learning models have demonstrated promising outcomes in various machine vision tasks. However, most deep learning approaches for medical image analysis are built upon convolutional operations, which might neglect the global dependencies between long-range pixels in medical images. Therefore, vision transformer models, which can capture the associations between distant pixels, have gradually been employed in medical image analysis. However, the quadratic computational complexity of the attention mechanism has hindered the deployment of vision transformers in clinical practice. With this in mind, this study introduces an integrated self-attention mechanism with both softmax and linear modules to guarantee efficiency and expressiveness simultaneously. Specifically, by adding a set of proxy tokens, the attention module operates on a reduced set of query and key tokens that is much smaller than the original set. Note that the proxy tokens can exploit the advantages of both softmax and linear attention. Results: To evaluate the presented approach, comparison experiments against state-of-the-art algorithms were conducted. Experimental results demonstrate that the proposed approach outperforms the state-of-the-art algorithms on publicly available datasets. Discussion: Accordingly, the proposed approach can be regarded as a potentially valuable instrument in clinical practice. |
format | Article |
id | doaj-art-e5f315e52f9e47ac931360eda139d597 |
institution | Kabale University |
issn | 2296-2565 |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Public Health |
spelling | doaj-art-e5f315e52f9e47ac931360eda139d597; 2025-01-06T05:13:15Z; eng; Frontiers Media S.A.; Frontiers in Public Health; 2296-2565; 2025-01-01; 12; 10.3389/fpubh.2024.1442114; 1442114; Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer; Chong Liu (0), Weiguang Wang (1), Jian Lian (2), Wanzhen Jiao (3); (0) School of Intelligence Engineering, Shandong Management University, Jinan, China; (1) Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China; (2) School of Intelligence Engineering, Shandong Management University, Jinan, China; (3) Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China; https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full; medical image analysis, image classification, deep learning, Bi-LSTM, transformer |
title | Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer |
topic | medical image analysis, image classification, deep learning, Bi-LSTM, transformer |
url | https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full |
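
The description in this record says that a small set of proxy tokens stands in for the full query and key sets, but it does not give the formulation. The sketch below is one possible reading, not the authors' implementation: it assumes the proxy tokens are obtained by average-pooling the queries, and that attention runs in two softmax steps (proxies attend to all keys, then queries attend to the proxies). The names `pool_tokens`, `proxy_attention`, and `score_count` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def attend(queries, keys, values):
    """Standard scaled dot-product softmax attention."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)


def pool_tokens(tokens, num_proxy):
    """Average-pool consecutive tokens into num_proxy proxy tokens.

    Assumption for illustration: len(tokens) is a multiple of num_proxy.
    """
    size = len(tokens) // num_proxy
    proxies = []
    for i in range(num_proxy):
        group = tokens[i * size:(i + 1) * size]
        proxies.append([sum(col) / len(group) for col in zip(*group)])
    return proxies


def proxy_attention(queries, keys, values, num_proxy):
    """Two-step attention routed through a small set of proxy tokens."""
    proxies = pool_tokens(queries, num_proxy)
    # Step 1: the proxies act as queries and summarise all keys and values.
    summary = attend(proxies, keys, values)
    # Step 2: each original query reads from the proxy summaries only.
    return attend(queries, proxies, summary)


def score_count(num_tokens, num_proxy):
    """Attention scores computed by full attention vs. the proxy variant."""
    return num_tokens * num_tokens, 2 * num_tokens * num_proxy
```

Under these assumptions, a hypothetical 14 x 14 patch grid (196 tokens) with 49 proxy tokens gives `score_count(196, 49) == (38416, 19208)`: the proxy variant grows linearly in the number of tokens for a fixed proxy count, whereas full softmax attention grows quadratically.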