Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification

Bibliographic Details
Main Authors: Qiqiang Chen, Zhengyang Li, Junru Yin, Wei Huang, Tianming Zhan
Format: Article
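Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction
Online Access: https://ieeexplore.ieee.org/document/10946673/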
author Qiqiang Chen
Zhengyang Li
Junru Yin
Wei Huang
Tianming Zhan
collection DOAJ
description Currently, convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, CNN-based methods pay insufficient attention to valuable pixels in 3-D HSI samples and cannot adapt to variations in these samples. Transformer-based methods also suffer from high computational complexity and a tendency for low-level spatial-spectral features of the shallow attention layer to vanish as the number of attention layers increases. To address these issues, we propose a local–global feature extraction network with dynamic 3-D convolution and residual attention transformer (LGDRNet). The LGDRNet primarily consists of multiscale 3-D convolution, dynamic local feature extraction, residual global feature extraction, and feature fusion modules. Specifically, a multiscale 3-D convolution module is used for low-level multiscale spectral information extraction. Then, the dynamic local feature extraction module utilizes dynamic 3-D convolution, which can adapt to different samples. This allows the network to focus on valuable pixels in 3-D samples. The residual global feature extraction module utilizes a convolutional projection unit and convolutional multihead self-attention to reduce computational complexity. It employs a residual attention connection to enable the network to effectively transmit and accumulate attention information across consecutive multihead attention layers. This prevents the vanishing of shallow spatial-spectral features. Finally, the feature fusion module efficiently integrates local and global HSI information, which also improves performance during subsequent classification. The proposed model achieves overall classification accuracies of 89.24%, 92.01%, and 94.53% on three benchmark datasets, respectively, outperforming state-of-the-art approaches with limited training samples.
format Article
id doaj-art-9afad653585b46369ffcd02af0b4deb0
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-9afad653585b46369ffcd02af0b4deb0; 2025-08-20T04:00:34Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 9986-10001; DOI 10.1109/JSTARS.2025.3556722; document 10946673
Qiqiang Chen (https://orcid.org/0009-0005-5965-7853), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
Zhengyang Li, School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
Junru Yin (https://orcid.org/0000-0002-7101-1140), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
Wei Huang (https://orcid.org/0000-0002-0095-1354), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
Tianming Zhan (https://orcid.org/0000-0001-5030-3032), Jiangsu Modern Intelligent Audit Integrated Application Technology Engineering Research Center, School of Computer Science, Nanjing Audit University, Nanjing, China
title Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification
topic Dynamic 3-D convolution
dynamic local feature extraction
hyperspectral image (HSI) classification
residual attention transformer
residual global feature extraction
url https://ieeexplore.ieee.org/document/10946673/
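
The description above names two mechanisms: a dynamic 3-D convolution that adapts to each sample, and a residual attention connection that carries attention information across consecutive layers. The record does not give their formulas, so the following is only a minimal sketch under stated assumptions: the residual connection is taken to add the previous layer's pre-softmax scores to the current ones, and the dynamic kernel is taken to be a softmax-weighted mixture of candidate kernels. The function names, the identity Q/K/V projections, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_layer(tokens, prev_scores=None):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: the residual attention connection adds the previous layer's
    pre-softmax scores to the current scores. The source does not specify this.
    """
    dim = len(tokens[0])
    transposed = [list(col) for col in zip(*tokens)]
    scores = [[s / math.sqrt(dim) for s in row] for row in matmul(tokens, transposed)]
    if prev_scores is not None:
        # Carry attention information from the earlier layer forward.
        scores = [[s + p for s, p in zip(r, pr)] for r, pr in zip(scores, prev_scores)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, tokens), scores, weights


def mix_kernels(kernels, gate_logits):
    """Assumed dynamic kernel: softmax-weighted sum of flattened candidate kernels.

    In a real network the gate logits would be computed from the input sample.
    """
    pi = softmax(gate_logits)
    return [sum(p * k[i] for p, k in zip(pi, kernels)) for i in range(len(kernels[0]))]


# Toy input: three tokens with two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out1, scores1, _ = attention_layer(tokens)
out2, scores2, weights2 = attention_layer(out1, prev_scores=scores1)
```

Adding the scores before the softmax keeps each attention row a valid probability distribution, which is one reason this form is a common choice for residual attention; other formulations are possible.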