Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics



Bibliographic Details
Main Author:	Seunghyeon Wang
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-08-01
Series:	Scientific Reports
Subjects:	Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network
Online Access:	https://doi.org/10.1038/s41598-025-14786-3

author	Seunghyeon Wang
collection	DOAJ
description	Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures—Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer—were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies.
format	Article
id	doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-08-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60; 2025-08-20T03:45:49Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 151120; 10.1038/s41598-025-14786-3; Seunghyeon Wang, Institute for Environmental Design and Engineering, University College London
title	Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics
topic	Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network
url	https://doi.org/10.1038/s41598-025-14786-3


Similar Items