Dynamic atrous attention and dual branch context fusion for cross scale Building segmentation in high resolution remote sensing imagery

Abstract Building segmentation of high-resolution remote sensing images using deep learning effectively reduces labor costs, but still faces the key challenges of effectively modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches demon...

Bibliographic Details
Main Authors:	Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-08-01
Series:	Scientific Reports
Subjects:	Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention
Online Access:	https://doi.org/10.1038/s41598-025-14751-0

_version_	1849226270040129536
author	Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong
author_facet	Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong
author_sort	Yaohui Liu
collection	DOAJ
description	Abstract Building segmentation of high-resolution remote sensing images using deep learning reduces labor costs, but it still faces two key challenges: modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches model long-range dependencies well, but they suffer from progressive information loss during hierarchical feature encoding. This study therefore proposes a new semantic segmentation network, SegTDformer, to extract buildings from remote sensing images. We designed a Dynamic Atrous Attention (DAA) fusion module that integrates multi-scale features from the Transformer and establishes an information exchange between global information and local representational information. We introduced a Shift Operation module and a Self-Attention module, which form a dual-branch structure that captures local spatial dependencies and global correlations, respectively, and couples their weights to fuse complementary contextual information. Furthermore, we fused triplet attention with depth-wise separable convolutions, reducing computational requirements and mitigating potential overfitting. We benchmarked the model on three datasets (Massachusetts, INRIA, and WHU), and the results show that it consistently outperforms existing models. Notably, on the Massachusetts dataset, SegTDformer achieved an mIoU, F1-score, and Overall Accuracy of 75.47%, 84.7%, and 94.61%, respectively, surpassing other deep learning models. SegTDformer extracts urban structures from complex environments more precisely and produces fewer omission errors and internal misclassification errors, particularly for very small and very large buildings.
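The abstract reports mIoU, F1-score, and Overall Accuracy but this record does not define them. The sketch below shows one common way to compute these three metrics for a binary building mask. It assumes that mIoU is the average of the building-class and background-class IoU; the paper's exact definition is not given here, and the function names `confusion_counts` and `segmentation_metrics` are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted, building present
            elif p:
                fp += 1          # building predicted, background present
            elif t:
                fn += 1          # background predicted, building present
            else:
                tn += 1          # background predicted, background present
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Return mIoU, F1-score, and Overall Accuracy for a binary segmentation.

    Assumption: mIoU averages the IoU of the building and background classes.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_building = _ratio(tp, tp + fp + fn)
    iou_background = _ratio(tn, tn + fp + fn)
    return {
        "miou": (iou_building + iou_background) / 2,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "oa": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Toy 2x3 masks: 1 = building pixel, 0 = background pixel.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
metrics = segmentation_metrics(predicted, reference)
```

On these toy masks there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so both class IoUs are 2/4 and the mIoU is 0.5, while F1 and Overall Accuracy are both 4/6.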
format	Article
id	doaj-art-4cbb9ac3e15c4748bfb855019867a70f
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-08-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-4cbb9ac3e15c4748bfb855019867a70f; 2025-08-24T11:26:43Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 151116; 10.1038/s41598-025-14751-0; Dynamic atrous attention and dual branch context fusion for cross scale Building segmentation in high resolution remote sensing imagery; Yaohui Liu (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Shuzhe Zhang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Xinkai Wang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Rui Zhai (China Unicom Shandong Branch); Hu Jiang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Lingjia Kong (School of Mining Engineering, Heilongjiang University of Science and Technology); https://doi.org/10.1038/s41598-025-14751-0; Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention
title	Dynamic atrous attention and dual branch context fusion for cross scale Building segmentation in high resolution remote sensing imagery
title_sort	dynamic atrous attention and dual branch context fusion for cross scale building segmentation in high resolution remote sensing imagery
topic	Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention
url	https://doi.org/10.1038/s41598-025-14751-0
