Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array

Bibliographic Details
Main Authors: Takao Kawamura, Yuma Kinoshita, Nobutaka Ono, Robin Scheibler
Format: Article
Language: English
Published: SpringerOpen, 2024-12-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
ISSN: 1687-4722
Subjects: Domestic activity monitoring; Acoustic scene classification; Distributed microphone array; Subarray; Generalized cross-correlation phase transform; Middle integration
Online Access: https://doi.org/10.1186/s13636-024-00386-y
Abstract: In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with microphones, are synchronized, we investigate two types of spatial feature: intra- and inter-generalized cross-correlation phase transforms (GCC-PHATs). These are derived from channels within the same subarray and between different subarrays, respectively. Our approach treats the log-Mel spectrogram as a spectral feature and intra- and/or inter-GCC-PHAT as a spatial feature. We propose two integration methods for spectral and spatial features: (a) middle integration, which fuses embeddings obtained by spectral and spatial features, and (b) late integration, which fuses decisions estimated using spectral and spatial features. The evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score compared with the single-channel case. In contrast, integrating both spectral and spatial features improved the F1-score compared with using only spectral features. Additionally, we confirmed that the F1-score for late integration was slightly higher than that for middle integration.
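The GCC-PHAT feature named in the abstract can be sketched for a single channel pair as follows. This is a minimal numpy illustration of the standard phase transform, not the authors' implementation; the function and parameter names are my own. In the paper's setting, "intra" features would come from pairs within one subarray and "inter" features from pairs spanning two subarrays.

```python
import numpy as np

def gcc_phat(x, y, n_fft=1024):
    """GCC-PHAT between two channels: the cross-spectrum is whitened to
    unit magnitude, so only phase (time-difference) information remains."""
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = np.conj(X) * Y          # cross-spectrum of the pair
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    # Back to the time domain; the peak index is the (circular) delay
    # of y relative to x.
    return np.fft.irfft(cross, n=n_fft)

# Toy check: y is x delayed by 5 samples, so the GCC-PHAT peak
# should sit at lag 5.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 5)
cc = gcc_phat(x, y)
lag = np.argmax(cc)  # 5
```

In a classification pipeline such as the one described, a window of these correlation vectors (one per microphone pair) would be stacked and fed to the network alongside the log-Mel spectrogram.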
Author affiliations: Takao Kawamura, Yuma Kinoshita, and Nobutaka Ono: Department of Computer Science, Tokyo Metropolitan University; Robin Scheibler: Music Processing Team, LY Corporation.