Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer

Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task relies mainly on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. T...

Bibliographic Details
Main Authors:	Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-01-01
Series:	Frontiers in Public Health
Subjects:	medical image analysis; image classification; deep learning; Bi-LSTM; transformer
Online Access:	https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full

_version_	1841558803986776064
author	Chong Liu, Weiguang Wang, Jian Lian, Wanzhen Jiao
author_sort	Chong Liu
collection	DOAJ
description	Introduction: Diabetic retinopathy grading plays a vital role in the diagnosis and treatment of patients. In practice, this task relies mainly on manual inspection by the human visual system. However, this screening process is labor-intensive, time-consuming, and error-prone. Therefore, many automated screening techniques have been developed to address this task. Methods: Among these techniques, deep learning models have demonstrated promising outcomes in various machine vision tasks. However, most deep learning approaches for medical image analysis are built upon convolutional operations, which might neglect the global dependencies between distant pixels in medical images. Therefore, vision transformer models, which can capture associations between pixels across the whole image, have gradually been employed in medical image analysis. However, the quadratic computational complexity of the attention mechanism has hindered the deployment of vision transformers in clinical practice. With this in mind, this study introduces an integrated self-attention mechanism with both softmax and linear modules to achieve efficiency and expressiveness simultaneously. Specifically, by adding a set of proxy tokens, the attention module operates on a subset of query and key tokens that is much smaller than the original set. The proxy tokens can exploit the advantages of both softmax and linear attention. Results: To evaluate the presented approach, comparison experiments against state-of-the-art algorithms were conducted. The experimental results demonstrate that the proposed approach outperforms the state-of-the-art algorithms on the publicly available datasets. Discussion: Accordingly, the proposed approach can be taken as a potentially valuable instrument in clinical practice.
format	Article
id	doaj-art-e5f315e52f9e47ac931360eda139d597
institution	Kabale University
issn	2296-2565
language	English
publishDate	2025-01-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Public Health
spelling	doaj-art-e5f315e52f9e47ac931360eda139d597 | 2025-01-06T05:13:15Z | eng | Frontiers Media S.A. | Frontiers in Public Health | 2296-2565 | 2025-01-01 | 12 | 10.3389/fpubh.2024.1442114 | 1442114 | Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer | Chong Liu [0]; Weiguang Wang [1]; Jian Lian [2]; Wanzhen Jiao [3] | [0] School of Intelligence Engineering, Shandong Management University, Jinan, China | [1] Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China | [2] School of Intelligence Engineering, Shandong Management University, Jinan, China | [3] Department of Ophthalmology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China | https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full | medical image analysis; image classification; deep learning; Bi-LSTM; transformer
title	Lesion classification and diabetic retinopathy grading by integrating softmax and pooling operators into vision transformer
topic	medical image analysis; image classification; deep learning; Bi-LSTM; transformer
url	https://www.frontiersin.org/articles/10.3389/fpubh.2024.1442114/full


Similar Items