Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of building characteristics
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-14786-3 |
| Summary: | Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures—Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer—were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies. |
| ISSN: | 2045-2322 |
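The summary reports model quality as F1 score and accuracy for a binary usable/unusable image classification task. As a minimal sketch of how those two metrics are computed (the labels and predictions below are made-up illustrations, not data from the paper):

```python
def f1_and_accuracy(y_true, y_pred):
    """Binary-classification F1 score and accuracy, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return f1, accuracy

# Hypothetical labels for eight street-view images: 1 = usable, 0 = unusable.
y_true = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
f1, acc = f1_and_accuracy(y_true, y_pred)  # f1 = 0.8, acc = 0.75
```

In practice a library routine such as scikit-learn's `f1_score`/`accuracy_score` would typically be used; the hand-rolled version above just makes the definitions explicit.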