Utilizing Footstep Sound Event Detection by Using CNN Techniques for Assuring Property Security

Bibliographic Details
Main Authors: Furkan Y. Yavuz, Nejat Yumusak
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10966859/
Description
Summary: Advanced camera systems developed for property security are inadequate when the cameras can be easily disabled. Incorporating sound data from potential threats can therefore play a significant role in ensuring property security. This sound event-based approach enables a more compact installation and the use of affordable microphones, offering an alternative to vision-based security systems. We propose a deep learning approach based on a convolutional neural network (CNN) for classifying footstep sound events. The study comprises two consecutive stages: the first uses the ReaLISED dataset, and the second uses a refined ReaLISED supplemented with Epidemic Sound data. Sound event files are transformed into Mel-Frequency Cepstral Coefficients (MFCCs), and the CNN is fed image representations of the MFCCs. After optimizing the model parameters, our model detected footstep sound events with 98% accuracy among 17 other sound events. Validation through Repeated Stratified K-Fold Cross-Validation (5 folds, 10 repetitions) and comparisons with state-of-the-art architectures demonstrated robust performance, with F1-scores ranging from 0.905 to 0.992 and a mean of 0.960. The incorporation of diverse open-source data fosters transparency and reproducibility and enhances the model’s adaptability and reliability in handling real-world audio patterns, as evidenced by its 1% error rate in identification.
ISSN: 2169-3536
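
Note on the validation protocol: the summary reports Repeated Stratified K-Fold Cross-Validation with 5 folds and 10 repetitions, which yields 50 train/test splits in which each test fold keeps roughly the class proportions of the full dataset. The following is a minimal standard-library sketch of how such splits can be generated. It is not the authors' code: the function name `repeated_stratified_kfold`, the seed, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for every split.

    Hypothetical helper, not taken from the paper. In each repetition the
    samples of every class are reshuffled and dealt round-robin into
    n_splits folds, so each fold keeps roughly the overall class balance.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # rotate the starting fold so fold sizes stay balanced
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[(pos + offset) % n_splits].append(idx)
            offset += len(members)

        # Each fold serves once as the test set; the rest is training data.
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Toy labels for illustration only, not the paper's data.
labels = ["footstep"] * 10 + ["other"] * 40
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 5 folds x 10 repetitions
```

In practice a model would be trained and scored once per split, and the per-split F1-scores would then be summarized by their range and mean, as the summary does. Libraries such as scikit-learn provide an equivalent `RepeatedStratifiedKFold` splitter.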