Dynamic atrous attention and dual branch context fusion for cross-scale building segmentation in high-resolution remote sensing imagery

Bibliographic Details
Main Authors: Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-14751-0
Description
Summary: Building segmentation of high-resolution remote sensing images using deep learning effectively reduces labor costs, but it still faces two key challenges: modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches model long-range dependencies well, yet they suffer from progressive information loss during hierarchical feature encoding. This study therefore proposes a new semantic segmentation network, SegTDformer, for extracting buildings from remote sensing images. We designed a Dynamic Atrous Attention (DAA) fusion module that integrates multi-scale Transformer features and establishes an information exchange between global context and local representations. Within this module, a Shift Operation branch and a Self-Attention branch form a dual-branch structure that captures local spatial dependencies and global correlations, respectively, and the two branches are coupled through learned weights to achieve highly complementary contextual fusion. Furthermore, we fused triplet attention with depth-wise separable convolutions, reducing computational cost and mitigating overfitting. We benchmarked the model on three datasets, Massachusetts, INRIA, and WHU, and the results show that it consistently outperforms existing models. Notably, on the Massachusetts dataset, SegTDformer achieved an mIoU of 75.47%, an F1-score of 84.7%, and an Overall Accuracy of 94.61%, surpassing other deep learning models. The proposed SegTDformer extracts buildings more precisely from complex environments and exhibits lower rates of omission and internal misclassification errors, particularly for both small and large buildings.
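A minimal PyTorch sketch of the dual-branch context fusion idea summarized above: a shift-operation branch for local spatial dependencies, a self-attention branch for global correlations, and a learned weight coupling that fuses the two. The module names, the channel-group shift pattern, and the sigmoid gating form are illustrative assumptions made for this sketch and do not reproduce the authors' released implementation.

```python
# Illustrative sketch of a dual-branch local/global fusion block (assumed design,
# not the authors' code): shift branch (local) + self-attention branch (global),
# fused by a learned per-channel coupling weight.
import torch
import torch.nn as nn


class ShiftBranch(nn.Module):
    """Captures local spatial context by shifting channel groups in four directions."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = c // 4
        shifted = x.clone()
        shifted[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1, dims=2)   # shift down
        shifted[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)  # shift up
        shifted[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1, dims=3)   # shift right
        shifted[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)  # shift left
        return self.proj(shifted)


class GlobalAttentionBranch(nn.Module):
    """Models global correlations with multi-head self-attention over all pixels."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, c, h, w)


class DualBranchFusion(nn.Module):
    """Couples the two branches with a learned per-channel gating weight."""
    def __init__(self, channels: int):
        super().__init__()
        self.local_branch = ShiftBranch(channels)
        self.global_branch = GlobalAttentionBranch(channels)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.local_branch(x)
        global_feat = self.global_branch(x)
        w = self.gate(local_feat + global_feat)             # coupling weight in (0, 1)
        return w * local_feat + (1.0 - w) * global_feat


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)                       # toy feature map
    fused = DualBranchFusion(64)(feat)
    print(fused.shape)                                       # torch.Size([1, 64, 32, 32])
```

In this sketch the sigmoid gate produces a per-channel weight, so the fused output interpolates between the local and global features rather than simply summing them; the actual coupling used in the paper may differ.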
ISSN: 2045-2322