RTDU: Interpretable Region-Aware Transformer-Based Deep Unfolding Network for Pan-Sharpening



Bibliographic Details
Main Authors: Shuo Wang, Genji Yuan, Jinjiang Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11005626/
Description
Summary: Pan-sharpening is a fundamental and critical image processing technique in remote sensing. It reconstructs high-resolution multispectral (HRMS) images by combining the spectral information of low-resolution multispectral (LRMS) images with the spatial features of panchromatic (PAN) images. In recent years, various deep learning (DL)-based methods have emerged and achieved remarkable results. For instance, the vision transformer (ViT) has been applied to image processing and excels at modeling global information, but it still loses significant local information. To better preserve local details and integrate them with global information, we propose a region-aware transformer-based deep unfolding network for pan-sharpening (RTDU). Specifically, we formulate three optimization problems for pan-sharpening and, within the unfolding algorithm framework, design a region-aware transformer block (RA-Block) that models both global and local information for these problems, redefining the bottleneck. Additionally, we design a triple-dimensional information perception (TDIP) module to address feature residuals among complementary channels. Integrating these components into the optimization framework yields the unfolding network. All parameters in RTDU are adaptively learnable, providing the network with strong interpretability. We conducted experiments on the QuickBird (QB) and WorldView-2 (WV2) datasets. Qualitative analysis and quantitative comparisons demonstrate the superiority of the proposed method, and ablation experiments further validate the effectiveness of the designed modules.
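The summary does not spell out RTDU's three optimization problems. To illustrate what "deep unfolding" means in general, here is a minimal sketch of unrolled projected gradient descent for a generic pan-sharpening model. It is not the authors' formulation.

The sketch rests on several assumptions, all hypothetical:

- The LRMS image is the HRMS image average-pooled by a factor `r`.
- The PAN image is a weighted sum of the HRMS bands, with hypothetical spectral `weights`.
- The proximal step is a simple clip to [0, 1].

In a learned unfolding network, each stage's step size would be a trainable parameter, and the hand-written proximal step would be replaced by a trained module. In RTDU that role is presumably played by components such as the RA-Block.

```python
import numpy as np

# Hypothetical degradation D: average-pool each r x r block (blur + decimation in one step).
def downsample(x, r):
    h, w, c = x.shape
    return x.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))

# Adjoint D^T of average pooling: replicate each value over its block, scaled by 1/r^2.
def downsample_adjoint(y, r):
    return np.repeat(np.repeat(y, r, axis=0), r, axis=1) / (r * r)

# Assumed data-fidelity objective: 0.5*||D(x) - lrms||^2 + 0.5*lam*||x @ weights - pan||^2.
def objective(x, lrms, pan, weights, r, lam):
    ms_term = 0.5 * np.sum((downsample(x, r) - lrms) ** 2)
    pan_term = 0.5 * lam * np.sum((x @ weights - pan) ** 2)
    return ms_term + pan_term

# Gradient of the objective above with respect to the HRMS estimate x (shape H x W x C).
def data_gradient(x, lrms, pan, weights, r, lam):
    grad_ms = downsample_adjoint(downsample(x, r) - lrms, r)
    pan_residual = x @ weights - pan                      # shape (H, W)
    grad_pan = lam * pan_residual[..., None] * weights    # broadcast to (H, W, C)
    return grad_ms + grad_pan

# Unrolled iterations: one stage per step size. In a trained network the step sizes would
# be learned and `prox` would be a learned module; here prox is a hypothetical box projection.
def unfold(lrms, pan, weights, r, step_sizes, lam=1.0,
           prox=lambda z: np.clip(z, 0.0, 1.0)):
    x = np.repeat(np.repeat(lrms, r, axis=0), r, axis=1)  # naive nearest-neighbour upsampling
    for eta in step_sizes:
        x = prox(x - eta * data_gradient(x, lrms, pan, weights, r, lam))
    return x
```

For example, with synthetic data one can check that the unrolled stages reduce the objective relative to the naive upsampled start:

```python
rng = np.random.default_rng(0)
r, c = 4, 4
truth = rng.random((32, 32, c))
weights = np.full(c, 1.0 / c)       # hypothetical equal spectral response
lrms = downsample(truth, r)
pan = truth @ weights
estimate = unfold(lrms, pan, weights, r, step_sizes=[1.0] * 50)
```

With these settings the gradient's Lipschitz constant is 1/r² + lam·||weights||² = 0.3125, so a step size of 1.0 (below its reciprocal) keeps each projected step non-increasing in the objective.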
ISSN: 1939-1404, 2151-1535