Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics
Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unu...
| Main Author: | Seunghyeon Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
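| Subjects: | Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network |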
| Online Access: | https://doi.org/10.1038/s41598-025-14786-3 |
| _version_ | 1849333551601811456 |
|---|---|
| author | Seunghyeon Wang |
| collection | DOAJ |
| description | Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures—Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer—were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies. |
| format | Article |
| id | doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60; 2025-08-20T03:45:49Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 151120; 10.1038/s41598-025-14786-3; Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics; Seunghyeon Wang; Institute for Environmental Design and Engineering, University College London; https://doi.org/10.1038/s41598-025-14786-3; Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network |
| title | Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics |
| topic | Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network |
| url | https://doi.org/10.1038/s41598-025-14786-3 |
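
The description above notes that camera parameters such as pitch and heading must be set carefully when acquiring static street view images. As a minimal sketch of what such a request might look like, the snippet below builds a Google Street View Static API URL from those parameters. The function name `build_ssvi_url`, its default values, and the sample coordinates are illustrative assumptions and are not taken from the article; the record does not describe the author's actual acquisition code.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Street View Static API.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"


def build_ssvi_url(lat, lng, heading, pitch, api_key, fov=90, size="640x640"):
    """Return a request URL for one static street view image.

    Hypothetical helper for illustration only.
    heading: compass direction of the camera in degrees (0 to 360).
    pitch:   up/down angle of the camera in degrees (-90 to 90).
    fov:     horizontal field of view in degrees.
    """
    if not 0 <= heading <= 360:
        raise ValueError("heading must be between 0 and 360 degrees")
    if not -90 <= pitch <= 90:
        raise ValueError("pitch must be between -90 and 90 degrees")
    params = {
        "size": size,
        "location": f"{lat},{lng}",
        "heading": heading,
        "pitch": pitch,
        "fov": fov,
        "key": api_key,
    }
    # urlencode percent-encodes the comma in the location value.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder coordinates and API key, for demonstration only.
    print(build_ssvi_url(51.5246, -0.134, heading=90, pitch=10, api_key="YOUR_KEY"))
```

A slightly raised pitch would typically be tried for whole-façade views and a level pitch for first-story views, but suitable values depend on street width and building height and are not specified in this record.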