Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
| Main Authors: | Okpala Chibuike, Xiaopeng Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Diagnostics |
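| Subjects: | vision transformer; convolutional neural network; gated control mechanism; multi-scale fusion module; pulmonary diseases |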
| Online Access: | https://www.mdpi.com/2075-4418/14/24/2790 |
| Field | Value |
|---|---|
| author | Okpala Chibuike; Xiaopeng Yang |
| collection | DOAJ |
| description | Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical imaging analysis. However, ViTs struggle to capture high-frequency image components, which are critical for identifying fine-grained patterns, while CNNs have difficulty capturing long-range dependencies due to their local receptive fields, making it difficult to fully capture spatial relationships across lung regions. Methods: In this paper, we proposed a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts local features, which are then passed through the ViT to capture global dependencies. We implemented a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) into the proposed framework to fuse features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging. |
| format | Article |
| id | doaj-art-dce1f07a66e445b0a863032923a6b789 |
| institution | Kabale University |
| issn | 2075-4418 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Diagnostics |
| doi | 10.3390/diagnostics14242790 |
| volume | 14 |
| issue | 24 |
| article_number | 2790 |
| affiliation | Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea (both authors) |
| title | Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification |
| topic | vision transformer; convolutional neural network; gated control mechanism; multi-scale fusion module; pulmonary diseases |
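The abstract names a gated attention mechanism that combines channel-, spatial-, and element-wise attention, plus a multi-scale fusion module (MSFM), but the record gives no formulas. The following is a minimal, dependency-free Python sketch of one plausible reading. It is not the authors' implementation and rests on several assumptions: SE-style channel weights from global averages, CBAM-style spatial weights from channel means, sigmoid element-wise weights, and a softmax gate over the three branches. The MSFM stand-in assumes nearest-neighbour upsampling followed by a weighted sum. All names and parameters (`gated_attention`, `multi_scale_fusion`, `upsample_nearest`, `w_c`, `w_s`, `w_e`, `gate_logits`, `weights`) are hypothetical, and the CNN and ViT stages of each component block are omitted.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function, mapping any real into (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_attention(x, w_c=1.0, w_s=1.0, w_e=1.0, gate_logits=(0.0, 0.0, 0.0)):
    """Reweight a [C][H][W] feature map with a gated mix of three attention maps.

    w_c, w_s, w_e are hypothetical scalar weights standing in for learned layers;
    gate_logits are hypothetical learnable gate parameters (one per branch).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel attention: one weight per channel from its global average (SE-style squeeze).
    a_c = [sigmoid(w_c * sum(sum(row) for row in x[c]) / (H * W)) for c in range(C)]
    # Spatial attention: one weight per position from the mean over channels (CBAM-style).
    a_s = [[sigmoid(w_s * sum(x[c][i][j] for c in range(C)) / C) for j in range(W)]
           for i in range(H)]
    # Gate: a softmax over the three logits sets how much each branch contributes.
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    g_c, g_s, g_e = (e / total for e in exps)
    out = []
    for c in range(C):
        plane = []
        for i in range(H):
            row = []
            for j in range(W):
                v = x[c][i][j]
                a_e = sigmoid(w_e * v)  # element-wise attention for this single value
                # Convex mix of three values in (0, 1), so the weight stays between 0 and 1.
                a = g_c * a_c[c] + g_s * a_s[i][j] + g_e * a_e
                row.append(v * a)
            plane.append(row)
        out.append(plane)
    return out


def upsample_nearest(plane, factor):
    """Nearest-neighbour upsampling of one [H][W] plane by an integer factor."""
    return [[plane[i // factor][j // factor] for j in range(len(plane[0]) * factor)]
            for i in range(len(plane) * factor)]


def multi_scale_fusion(feature_maps, weights):
    """Fuse [C][H_k][W_k] maps from several scales at the finest resolution.

    Assumes every map has the same C and that each coarser map is an exact integer
    downscaling (same factor on H and W) of the finest one.
    """
    C = len(feature_maps[0])
    H = max(len(f[0]) for f in feature_maps)
    W = max(len(f[0][0]) for f in feature_maps)
    fused = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for f, w in zip(feature_maps, weights):
        factor = H // len(f[0])  # integer scale ratio between this map and the finest
        for c in range(C):
            up = upsample_nearest(f[c], factor)
            for i in range(H):
                for j in range(W):
                    fused[c][i][j] += w * up[i][j]  # weighted sum across scales
    return fused
```

Calling `gated_attention(x)` on a toy feature map returns a map of the same shape in which every entry is scaled by a factor between 0 and 1. In a trained network, the scalar weights would presumably be replaced by learned convolutional or linear layers, and `x` would be the output of a CNN-then-ViT component block. The softmax gate is one simple way to let the model learn how much each attention branch contributes.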