Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performances in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critica...

Bibliographic Details
Main Authors: Okpala Chibuike, Xiaopeng Yang
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Diagnostics
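Subjects: vision transformer; convolutional neural network; gated control mechanism; multi-scale fusion module; pulmonary diseases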
Online Access: https://www.mdpi.com/2075-4418/14/24/2790
author Okpala Chibuike
Xiaopeng Yang
collection DOAJ
description Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical image analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns, while CNNs have difficulty capturing long-range dependencies because of their local receptive fields, which makes it hard to fully capture the spatial relationships across lung regions. Methods: In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts the local features, which are then passed through the ViT to capture the global dependencies. We implemented a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize the important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) in the proposed framework to fuse features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
format Article
id doaj-art-dce1f07a66e445b0a863032923a6b789
institution Kabale University
issn 2075-4418
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj-art-dce1f07a66e445b0a863032923a6b789
2024-12-27T14:20:45Z
eng
MDPI AG
Diagnostics
2075-4418
2024-12-01
Volume 14, Issue 24, Article 2790
10.3390/diagnostics14242790
Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
Okpala Chibuike (Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea)
Xiaopeng Yang (Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea)
https://www.mdpi.com/2075-4418/14/24/2790
vision transformer; convolutional neural network; gated control mechanism; multi-scale fusion module; pulmonary diseases
title Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
topic vision transformer
convolutional neural network
gated control mechanism
multi-scale fusion module
pulmonary diseases
url https://www.mdpi.com/2075-4418/14/24/2790
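
Illustrative sketch (not part of the catalog record or the authors' code): the description above mentions a gated attention mechanism that combines channel-, spatial-, and element-wise attention, followed by a multi-scale fusion module. The NumPy snippet below is one minimal way such a combination could look. The record does not specify any of these details, so the function names, the softmax gate, the pooling-plus-sigmoid attention maps, and the nearest-neighbour upsampling are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention(x, gate_logits):
    """x: feature map of shape (C, H, W); gate_logits: 3 scores, one per branch.

    Hypothetical formulation: each branch yields weights in (0, 1), and a
    softmax gate mixes the three branches into one attention map.
    """
    channel = sigmoid(x.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1): one weight per channel
    spatial = sigmoid(x.mean(axis=0, keepdims=True))       # (1, H, W): one weight per location
    element = sigmoid(x)                                   # (C, H, W): one weight per value
    # Softmax gate (assumed): decides how much each branch contributes.
    gate = np.exp(gate_logits - np.max(gate_logits))
    gate = gate / gate.sum()
    attention = gate[0] * channel + gate[1] * spatial + gate[2] * element
    return x * attention


def multi_scale_fusion(features):
    """Upsample coarser maps to the first (finest) map's size and average them.

    Assumes each map's height and width divide the finest map's evenly.
    """
    target_h, target_w = features[0].shape[1:]
    upsampled = []
    for f in features:
        ry, rx = target_h // f.shape[1], target_w // f.shape[2]
        upsampled.append(f.repeat(ry, axis=1).repeat(rx, axis=2))
    return np.mean(upsampled, axis=0)


rng = np.random.default_rng(0)
fine = rng.standard_normal((4, 8, 8))    # stand-in for an early, high-resolution block output
coarse = rng.standard_normal((4, 4, 4))  # stand-in for a later, low-resolution block output
gated = [gated_attention(f, np.array([0.5, 0.2, 0.3])) for f in (fine, coarse)]
fused = multi_scale_fusion(gated)
print(fused.shape)  # (4, 8, 8)
```

In a trained model the gate scores and attention maps would come from learned layers rather than fixed pooling; the sketch only shows how three attention maps of different shapes can be broadcast together and how feature maps at two scales can be merged.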