Dynamic atrous attention and dual-branch context fusion for cross-scale building segmentation in high-resolution remote sensing imagery

Bibliographic Details
Main Authors: Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects: Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention
Online Access: https://doi.org/10.1038/s41598-025-14751-0
author Yaohui Liu
Shuzhe Zhang
Xinkai Wang
Rui Zhai
Hu Jiang
Lingjia Kong
collection DOAJ
description Abstract Building segmentation of high-resolution remote sensing images with deep learning effectively reduces labor costs, but it still faces two key challenges: modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches excel at modeling long-range dependencies but suffer from progressive information loss during hierarchical feature encoding. This study therefore proposes a new semantic segmentation network, SegTDformer, for extracting buildings from remote sensing images. We designed a Dynamic Atrous Attention (DAA) fusion module that integrates multi-scale features from the Transformer encoder and establishes an information exchange between global context and local representations. Within this module, we introduced a Shift Operation module and a Self-Attention module arranged in a dual-branch structure, capturing local spatial dependencies and global correlations respectively, and coupled their weights to fuse highly complementary contextual information. Furthermore, we fused triplet attention with depth-wise separable convolutions, reducing computational cost and mitigating potential overfitting. We benchmarked the model on three datasets, Massachusetts, INRIA, and WHU, and the results show that it consistently outperforms existing models. Notably, on the Massachusetts dataset, SegTDformer achieved an mIoU of 75.47%, an F1-score of 84.7%, and an Overall Accuracy of 94.61%, surpassing other deep learning models. The proposed SegTDformer extracts urban structures from complex environments with higher precision and produces fewer omission and internal misclassification errors, particularly for both small and large buildings.
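To make the abstract's dual-branch fusion idea concrete, the following is a minimal PyTorch-style sketch of a shift/self-attention fusion block with learned weight coupling. It is only an illustrative reading of the abstract, not the authors' SegTDformer implementation: the class names (ShiftBranch, GlobalBranch, DualBranchFusion), the four-group shift pattern, the attention configuration, and the sigmoid gating are all assumptions.

```python
import torch
import torch.nn as nn


class ShiftBranch(nn.Module):
    """Local branch: shift channel groups spatially, then mix with a
    depth-wise separable convolution (an assumed stand-in for the
    paper's Shift Operation module)."""

    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                            # x: (B, C, H, W)
        g = x.shape[1] // 4
        parts = list(torch.split(x, g, dim=1))
        parts[0] = torch.roll(parts[0], 1, dims=2)   # shift down
        parts[1] = torch.roll(parts[1], -1, dims=2)  # shift up
        parts[2] = torch.roll(parts[2], 1, dims=3)   # shift right
        parts[3] = torch.roll(parts[3], -1, dims=3)  # shift left
        x = torch.cat(parts, dim=1)
        return self.pw(self.dw(x))


class GlobalBranch(nn.Module):
    """Global branch: multi-head self-attention over flattened tokens."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
        t, _ = self.attn(*(self.norm(t),) * 3)       # self-attention
        return t.transpose(1, 2).reshape(b, c, h, w)


class DualBranchFusion(nn.Module):
    """Couple the two branches with a learned per-channel gate,
    one plausible reading of the abstract's 'weight coupling'."""

    def __init__(self, channels):
        super().__init__()
        self.local = ShiftBranch(channels)
        self.globl = GlobalBranch(channels)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        l, g = self.local(x), self.globl(x)
        a = self.gate(torch.cat([l, g], dim=1))      # (B, C, 1, 1) in [0, 1]
        return x + a * l + (1 - a) * g               # residual fusion


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    print(DualBranchFusion(64)(feat).shape)          # torch.Size([2, 64, 32, 32])
```

The per-channel gate lets the block lean on the shifted local branch where fine detail matters and on the self-attention branch where global context dominates; the depth-wise plus point-wise convolution pair in ShiftBranch also illustrates the kind of depth-wise separable convolution the abstract pairs with triplet attention to cut computation.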
format Article
id doaj-art-4cbb9ac3e15c4748bfb855019867a70f
institution Kabale University
issn 2045-2322
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliations Yaohui Liu: School of Surveying and Geo-Informatics, Shandong Jianzhu University
Shuzhe Zhang: School of Surveying and Geo-Informatics, Shandong Jianzhu University
Xinkai Wang: School of Surveying and Geo-Informatics, Shandong Jianzhu University
Rui Zhai: China Unicom Shandong Branch
Hu Jiang: School of Surveying and Geo-Informatics, Shandong Jianzhu University
Lingjia Kong: School of Mining Engineering, Heilongjiang University of Science and Technology
title Dynamic atrous attention and dual-branch context fusion for cross-scale building segmentation in high-resolution remote sensing imagery
topic Remote sensing
Building segmentation
Transformer
Attention mechanism
Dynamic atrous attention
url https://doi.org/10.1038/s41598-025-14751-0