Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of building characteristics
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-14786-3 |
| Summary: | Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures—Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer—were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies. |
| ISSN: | 2045-2322 |
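The summary reports model quality as F1 score and accuracy for a binary usable/unusable image classification task. As a minimal sketch of how those two metrics are computed (the labels and predictions below are made-up illustrations, not data from the paper):

```python
def f1_and_accuracy(y_true, y_pred):
    """Binary-classification F1 score and accuracy, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return f1, accuracy

# Hypothetical labels for eight street-view images: 1 = usable, 0 = unusable.
y_true = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
f1, acc = f1_and_accuracy(y_true, y_pred)  # f1 = 0.8, acc = 0.75
```

In practice a library routine such as scikit-learn's `f1_score`/`accuracy_score` would typically be used; the hand-rolled version above just makes the definitions explicit.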