Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performances in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critica...

Full description

Saved in:

Bibliographic Details
Main Authors:	Okpala Chibuike, Xiaopeng Yang
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Diagnostics
Subjects:	vision transformer convolutional neural network gated control mechanism multi-scale fusion module pulmonary diseases
Online Access:	https://www.mdpi.com/2075-4418/14/24/2790
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performances in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critical in identifying fine-grained patterns, while CNNs have difficulties in capturing long-range dependencies due to their local receptive fields, which makes it difficult to fully capture the spatial relationship across lung regions. Methods: In this paper, we proposed a hybrid architecture that integrates ViTs and CNNs within a modular component block(s) to leverage both local feature extraction and global context capture. In each component block, the CNN is used to extract the local features, which are then passed through the ViT to capture the global dependencies. We implemented a gated attention mechanism that combines the channel-, spatial-, and element-wise attention to selectively emphasize the important features, thereby enhancing overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) in the proposed framework to fuse the features at different scales for more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving the medical image classification performance, while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
ISSN:	2075-4418

Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

Similar Items