Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics

Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unu...

Bibliographic Details
Main Author: Seunghyeon Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
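Subjects: Static street view images; Google street view; Building characteristics; Image augmentation; Deep learning; Convolutional neural network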
Online Access: https://doi.org/10.1038/s41598-025-14786-3
author Seunghyeon Wang
collection DOAJ
description Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures—Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer—were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies.
format Article
id doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60
institution Kabale University
issn 2045-2322
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-35d7ffaee5b44f95b26e7ad0d42f4a60 | 2025-08-20T03:45:49Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2025-08-01 | 15 | 1 | 1 | 20 | 10.1038/s41598-025-14786-3 | Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics | Seunghyeon Wang | 0 | Institute for Environmental Design and Engineering, University College London | Abstract Static Street View Images (SSVIs) are widely used in urban studies to analyze building characteristics. Typically, camera parameters such as pitch and heading need precise adjustments to clearly capture these features. However, system errors during image acquisition frequently result in unusable images. Although manual filtering is commonly utilized to address this problem, it is labor-intensive and inefficient, and automated solutions have not been thoroughly investigated. This research introduces a deep-learning-based automated classification framework designed for two specific tasks: (1) analyzing entire building façades and (2) examining first-story façades. Five transformer-based architectures (Swin Transformer, ViT, PVT, MobileViT, and Axial Transformer) were systematically evaluated, resulting in the generation of 1,026 distinct models through various combinations of architectures and hyperparameters. Among these, the Swin Transformer demonstrated the highest performance, achieving an F1 score of 90.15% and accuracy of 91.72% for whole-building façade analysis, and an F1 score of 89.72% and accuracy of 92.27% for first-story façade analysis. Transformer-based models consistently outperformed 810 CNN-based models, offering efficient processing speeds of 0.022 s per image. However, differences in performance among most models were not statistically significant. Finally, this research discusses the practical implications and applications of these findings in urban studies. | https://doi.org/10.1038/s41598-025-14786-3 | Static street view images | Google street view | Building characteristics | Image augmentation | Deep learning | Convolutional neural network
title Development of approach to an automated acquisition of static street view images using transformer architecture for analysis of Building characteristics
topic Static street view images
Google street view
Building characteristics
Image augmentation
Deep learning
Convolutional neural network
url https://doi.org/10.1038/s41598-025-14786-3