Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer

Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task mainly relies on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. T...

Bibliographic Details
Main Authors: Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Public Health
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full
author Chong Liu
Weiguang Wang
Jian Lian
Wanzhen Jiao
collection DOAJ
description Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task mainly relies on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. Therefore, many automated screening techniques have been developed to address this task. Methods: Among these techniques, deep learning models have demonstrated promising outcomes in various machine vision tasks. However, most deep learning approaches for medical image analysis are built upon convolutional operations, which might neglect the long-range dependencies between distant pixels in medical images. Therefore, vision transformer models, which can capture associations between pixels globally, have gradually been adopted in medical image analysis. However, the quadratic computational complexity of the attention mechanism has hindered the deployment of vision transformers in clinical practice. With this in mind, this study introduces an integrated self-attention mechanism with both softmax and linear modules to provide efficiency and expressiveness simultaneously. Specifically, by adding a set of proxy tokens, the attention module uses only a portion of the query and key tokens, far fewer than the original query and key tokens. The proxy tokens are intended to exploit the advantages of both softmax and linear attention. Results: To evaluate the presented approach, comparison experiments between state-of-the-art algorithms and the proposed approach are conducted. Experimental results demonstrate that the proposed approach outperforms the state-of-the-art algorithms on publicly available datasets. Discussion: Accordingly, the proposed approach can be regarded as a potentially valuable instrument in clinical practice.
format Article
id doaj-art-e5f315e52f9e47ac931360eda139d597
institution Kabale University
issn 2296-2565
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Public Health
spelling doaj-art-e5f315e52f9e47ac931360eda139d597
2025-01-06T05:13:15Z
eng
Frontiers Media S.A.
Frontiers in Public Health
2296-2565
2025-01-01
12
10.3389/fpubh.2024.1442114
1442114
Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer
Chong Liu (0)
Weiguang Wang (1)
Jian Lian (2)
Wanzhen Jiao (3)
(0) School of Intelligence Engineering, Shandong Management University, Jinan, China
(1) Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
(2) School of Intelligence Engineering, Shandong Management University, Jinan, China
(3) Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full
medical image analysis
image classification
deep learning
Bi-LSTM
transformer
title Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer
topic medical image analysis
image classification
deep learning
Bi-LSTM
transformer