A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification

Bibliographic Details
Main Authors: Ziwei Li, Weiming Xu, Shiyu Yang, Juan Wang, Hua Su, Zhanchao Huang, Sheng Wu
Format: Article
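Language: English
Published: IEEE 2024-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Attention mechanism; graph neural network (GNN); remote sensing scene classification (RSSC); spatial structural feature; transformer
Online Access: https://ieeexplore.ieee.org/document/10742489/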
author Ziwei Li
Weiming Xu
Shiyu Yang
Juan Wang
Hua Su
Zhanchao Huang
Sheng Wu
collection DOAJ
description Remote sensing scene classification (RSSC) is essential in Earth observation, with applications in land use, environmental status, urban development, and disaster risk assessment. However, redundant background interference, varying feature scales, and high interclass similarity in remote sensing images present significant challenges for RSSC. To address these challenges, this article proposes a novel hierarchical graph-enhanced transformer network (HGTNet) for RSSC. First, we introduce a dual attention (DA) module, which extracts key feature information from both the channel and spatial domains, effectively suppressing background noise. Second, we design a three-stage hierarchical transformer extractor that incorporates a DA module at the bottleneck of each stage to facilitate information exchange between stages and uses the Swin transformer block to capture multiscale global visual information. Third, we develop a fine-grained graph neural network extractor that constructs the spatial topological relationships of pixel-level scene images, thereby aiding in the discrimination of similar complex scene categories. Finally, the visual features and spatial structural features are fused through skip connections and passed to the classifier. HGTNet achieves classification accuracies of 98.47%, 95.75%, and 96.33% on the Aerial Image Dataset (AID), NWPU-RESISC45, and OPTIMAL-31 datasets, respectively, outperforming other state-of-the-art models. Extensive experimental results indicate that the proposed method effectively learns critical multiscale visual features and distinguishes between similar complex scenes, thereby improving the accuracy of RSSC.
format Article
id doaj-art-9c8c6aa3dcee4ef2b672573bbc2baf23
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-9c8c6aa3dcee4ef2b672573bbc2baf23
2025-01-16T00:00:20Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2024-01-01
Volume 17, pp. 20315-20330
DOI: 10.1109/JSTARS.2024.3491335
IEEE document number: 10742489
A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification
Ziwei Li (https://orcid.org/0009-0002-8540-8728): Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, Fuzhou University, Fuzhou, China
Weiming Xu (https://orcid.org/0009-0002-0002-9391): Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, The Digital Economy Alliance of Fujian, Fuzhou University, Fuzhou, China
Shiyu Yang (https://orcid.org/0009-0007-3725-9648): Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, Fuzhou University, Fuzhou, China
Juan Wang: Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, Fuzhou University, Fuzhou, China
Hua Su (https://orcid.org/0000-0003-0280-3926): Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, Fuzhou University, Fuzhou, China
Zhanchao Huang (https://orcid.org/0000-0001-5522-283X): Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, Fuzhou University, Fuzhou, China
Sheng Wu: Key Laboratory of Spatial Data Mining and Information Sharing Ministry of Education, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, The Academy of Digital China, The Digital Economy Alliance of Fujian, Fuzhou University, Fuzhou, China
https://ieeexplore.ieee.org/document/10742489/
title A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification
topic Attention mechanism
graph neural network (GNN)
remote sensing scene classification (RSSC)
spatial structural feature
transformer
url https://ieeexplore.ieee.org/document/10742489/