Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification
The Transformer model can capture global contextual information but does not have an inherent inductive bias. In contrast, convolutional neural networks (CNNs) are highly praised in computer vision due to their strong inductive bias and local spatial correlation. To combine the advantages of the two...
Main Authors: | Dan Zhang, Wenping Ma, Licheng Jiao, Xu Liu, Yuting Yang, Fang Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Remote Sensing |
Subjects: | hierarchical; multiple cross-scale; Transformer; scene classification |
Online Access: | https://www.mdpi.com/2072-4292/17/1/42 |
_version_ | 1841548962752888832 |
---|---|
author | Dan Zhang Wenping Ma Licheng Jiao Xu Liu Yuting Yang Fang Liu |
author_facet | Dan Zhang Wenping Ma Licheng Jiao Xu Liu Yuting Yang Fang Liu |
author_sort | Dan Zhang |
collection | DOAJ |
description | The Transformer model can capture global contextual information but does not have an inherent inductive bias. In contrast, convolutional neural networks (CNNs) are highly praised in computer vision due to their strong inductive bias and local spatial correlation. To combine the advantages of the two model types, we propose a multiple hierarchical cross-scale Transformer model that efficiently combines the Transformer model with CNNs and is specifically designed for complex remote sensing scene classification. First, a feature pyramid network with attention aggregation extracts the multi-scale base features. Then, these base features are fed into the proposed multi-scale channel Transformer (MSCT) module to derive global features with channel-wise attention. The base features are also fed into the proposed hierarchical cross-scale Transformer (HCST) module, which obtains multi-level cross-scale representations. Lastly, the outputs from both modules are combined to calculate the final classification score. The effectiveness of the proposed method has been validated on three public datasets: AID, UCM, and NWPU-RESISC45. |
format | Article |
id | doaj-art-042986599bf744e682eb558d2215fc96 |
institution | Kabale University |
issn | 2072-4292 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj-art-042986599bf744e682eb558d2215fc96; 2025-01-10T13:20:02Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2024-12-01; 17; 1; 42; 10.3390/rs17010042; Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification; Dan Zhang, Wenping Ma, Licheng Jiao, Xu Liu, Yuting Yang, Fang Liu (all: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi’an 710071, China); https://www.mdpi.com/2072-4292/17/1/42; hierarchical; multiple cross-scale; Transformer; scene classification |
spellingShingle | Dan Zhang Wenping Ma Licheng Jiao Xu Liu Yuting Yang Fang Liu Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification Remote Sensing hierarchical multiple cross-scale Transformer scene classification |
title | Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification |
title_full | Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification |
title_fullStr | Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification |
title_full_unstemmed | Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification |
title_short | Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification |
title_sort | multiple hierarchical cross scale transformer for remote sensing scene classification |
topic | hierarchical multiple cross-scale Transformer scene classification |
url | https://www.mdpi.com/2072-4292/17/1/42 |
work_keys_str_mv | AT danzhang multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification AT wenpingma multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification AT lichengjiao multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification AT xuliu multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification AT yutingyang multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification AT fangliu multiplehierarchicalcrossscaletransformerforremotesensingsceneclassification |
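
The description field mentions channel-wise attention (in the MSCT module) and a final score computed from the outputs of both modules. The record gives no implementation details, so the following is only a minimal sketch of those two ideas in plain Python. The function names (`channel_attention`, `fuse_scores`), the scaled dot-product form of the attention, and the averaging of class probabilities are all illustrative assumptions, not the authors' method.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feat):
    """Self-attention in which each channel acts as one token (assumed form).

    feat: list of C channels, each a flat list of N spatial values.
    Returns a list of the same shape where every output channel is a
    softmax-weighted mix of all input channels.
    """
    n = len(feat[0])
    scale = math.sqrt(n)
    out = []
    for query in feat:
        # Scaled dot-product similarity between this channel and every channel.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in feat]
        weights = softmax(scores)
        # Weighted sum of channels; the values are the channels themselves.
        mixed = [sum(w * ch[i] for w, ch in zip(weights, feat)) for i in range(n)]
        out.append(mixed)
    return out


def fuse_scores(logits_a, logits_b):
    """Hypothetical late fusion: average the two branches' class probabilities."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return [(a + b) / 2 for a, b in zip(pa, pb)]


# Toy input: 3 channels, each a flattened 2x2 feature map.
feat = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
attended = channel_attention(feat)

# Made-up logits standing in for the two branch outputs over 3 classes.
probs = fuse_scores([2.0, 0.5, 0.1], [1.5, 1.0, 0.2])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Attention here runs across channels rather than spatial positions, so the attention matrix is C x C regardless of image size; a real model would add learned query, key, and value projections, which are omitted to keep the sketch short.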