Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critica...


Bibliographic Details
Main Authors:	Okpala Chibuike, Xiaopeng Yang
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Diagnostics
Subjects:	vision transformer convolutional neural network gated control mechanism multi-scale fusion module pulmonary diseases
Online Access:	https://www.mdpi.com/2075-4418/14/24/2790

collection	DOAJ
description	Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical imaging analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns, while the local receptive fields of CNNs make it difficult to capture long-range dependencies and, in turn, the spatial relationships across lung regions. Methods: In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts the local features, which are then passed through the ViT to capture the global dependencies. We implement a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize the important features, thereby enhancing the overall feature representation. Furthermore, we incorporate a multi-scale fusion module (MSFM) that fuses features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
id	doaj-art-dce1f07a66e445b0a863032923a6b789
institution	Kabale University
issn	2075-4418
spelling	doaj-art-dce1f07a66e445b0a863032923a6b789; 2024-12-27T14:20:45Z; eng; MDPI AG; Diagnostics; 2075-4418; 2024-12-01; volume 14, issue 24, article 2790; 10.3390/diagnostics14242790; Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification; Okpala Chibuike (Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea); Xiaopeng Yang (Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea); https://www.mdpi.com/2075-4418/14/24/2790; vision transformer; convolutional neural network; gated control mechanism; multi-scale fusion module; pulmonary diseases
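
The description in this record names two components, a gated attention mechanism that combines channel-, spatial-, and element-wise attention, and a multi-scale fusion module. The following is only an illustrative sketch of those two ideas, not the authors' implementation: the function names (`gated_attention`, `multi_scale_fusion`, `avg_pool`, `upsample`), the use of plain sigmoid gates without learned weights, and the choice of average pooling with nearest-neighbour upsampling are all assumptions made for this example. The CNN and ViT stages of each block are omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_attention(x):
    """Gate a feature map x given as nested lists [C][H][W].

    Assumption: each gate is a parameter-free sigmoid; a trained model
    would compute the gates with learned layers instead.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel gate: one weight per channel from global average pooling.
    channel = [sigmoid(sum(map(sum, x[c])) / (H * W)) for c in range(C)]
    # Spatial gate: one weight per location from the mean over channels.
    spatial = [[sigmoid(sum(x[c][i][j] for c in range(C)) / C)
                for j in range(W)] for i in range(H)]
    # Element-wise gate: sigmoid of each activation; all three gates multiply.
    return [[[x[c][i][j] * channel[c] * spatial[i][j] * sigmoid(x[c][i][j])
              for j in range(W)] for i in range(H)] for c in range(C)]


def avg_pool(x, k):
    """Non-overlapping k-by-k average pooling; H and W must be divisible by k."""
    H, W = len(x[0]), len(x[0][0])
    return [[[sum(ch[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
              for j in range(0, W, k)]
             for i in range(0, H, k)]
            for ch in x]


def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[[v for v in row for _ in range(k)]
             for row in ch for _ in range(k)]
            for ch in x]


def multi_scale_fusion(x, scales=(1, 2)):
    """Average the feature map with coarser copies of itself (hypothetical fusion rule)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    branches = [upsample(avg_pool(x, k), k) for k in scales]
    return [[[sum(b[c][i][j] for b in branches) / len(branches)
              for j in range(W)] for i in range(H)] for c in range(C)]


# Usage: gate a tiny 1-channel, 2x2 feature map, then fuse it across two scales.
features = [[[1.0, 2.0], [3.0, 4.0]]]
gated = gated_attention(features)
fused = multi_scale_fusion(gated)
```

Multiplying the three gates is one simple way to let channel-level, location-level, and per-activation evidence each suppress a feature; the published model may combine them differently.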


Similar Items