Dynamic atrous attention and dual branch context fusion for cross scale Building segmentation in high resolution remote sensing imagery
Abstract Building segmentation of high-resolution remote sensing images using deep learning effectively reduces labor costs, but still faces the key challenges of effectively modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches demon...
| Main Authors: | Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention |
| Online Access: | https://doi.org/10.1038/s41598-025-14751-0 |
| Field | Value |
|---|---|
| author | Yaohui Liu, Shuzhe Zhang, Xinkai Wang, Rui Zhai, Hu Jiang, Lingjia Kong |
| author_sort | Yaohui Liu |
| collection | DOAJ |
| description | Deep-learning-based building segmentation of high-resolution remote sensing images reduces labor costs, but it still faces two key challenges: modeling cross-scale contextual relationships and preserving fine spatial details. Current Transformer-based approaches model long-range dependencies well, yet they suffer from progressive information loss during hierarchical feature encoding. This study therefore proposes a semantic segmentation network, SegTDformer, for extracting buildings from remote sensing images. We design a Dynamic Atrous Attention (DAA) fusion module that integrates multi-scale Transformer features and enables information exchange between global information and local representations. In this design, a Shift Operation module and a Self-Attention module form a dual-branch structure that captures local spatial dependencies and global correlations, respectively, and couples their weights to fuse complementary contextual information. Furthermore, we fuse triplet attention with depth-wise separable convolutions, which reduces computational cost and mitigates potential overfitting. We benchmarked the model on three datasets (Massachusetts, INRIA, and WHU), and the results show that it consistently outperforms existing models. Notably, on the Massachusetts dataset, SegTDformer achieved an mIoU, F1-score, and Overall Accuracy of 75.47%, 84.7%, and 94.61%, respectively, surpassing the other deep learning models compared. SegTDformer extracts buildings from complex scenes more precisely and produces fewer omission errors and fewer internal misclassification errors, particularly for very small and very large buildings. |
| format | Article |
| id | doaj-art-4cbb9ac3e15c4748bfb855019867a70f |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-4cbb9ac3e15c4748bfb855019867a70f; 2025-08-24T11:26:43Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-08-01; vol. 15, no. 1, pp. 1-16; doi:10.1038/s41598-025-14751-0; Authors and affiliations: Yaohui Liu (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Shuzhe Zhang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Xinkai Wang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Rui Zhai (China Unicom Shandong Branch); Hu Jiang (School of Surveying and Geo-Informatics, Shandong Jianzhu University); Lingjia Kong (School of Mining Engineering, Heilongjiang University of Science and Technology) |
| title | Dynamic atrous attention and dual branch context fusion for cross scale Building segmentation in high resolution remote sensing imagery |
| topic | Remote sensing; Building segmentation; Transformer; Attention mechanism; Dynamic atrous attention |
| url | https://doi.org/10.1038/s41598-025-14751-0 |
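
The description reports mIoU, F1-score, and Overall Accuracy for binary building masks. As a rough illustration of how such figures are commonly derived from pixel counts, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names are hypothetical, and it assumes mIoU is the mean of the building and background IoU and that F1 is computed for the building class, neither of which this record states.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN, treating label 1 as building and 0 as background."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute mIoU, building-class F1, and overall accuracy from flat masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_building = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        # Assumption: mean over the two classes (building, background).
        "miou": (iou_building + iou_background) / 2,
        # Assumption: F1 is reported for the building class only.
        "f1": safe_div(2 * precision * recall, precision + recall),
        "oa": safe_div(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Toy flattened masks; real masks would be 2-D images flattened per pixel.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Overall Accuracy counts every correctly labeled pixel, so it tends to be dominated by background in scenes with few buildings, which is one reason IoU and F1 are usually reported alongside it.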