Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array
Abstract In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with microphones, are synchronized, we investigate two types of spatial feature: intra- and inter-generalized cross-correlation phase transforms (GCC-PHATs). These are derived from channels within the same subarray and between different subarrays, respectively. Our approach treats the log-Mel spectrogram as a spectral feature and intra- and/or inter-GCC-PHAT as a spatial feature. We propose two integration methods for spectral and spatial features: (a) middle integration, which fuses embeddings obtained by spectral and spatial features, and (b) late integration, which fuses decisions estimated using spectral and spatial features. The evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score compared with the single-channel case. In contrast, integrating both spectral and spatial features improved the F1-score compared with using only spectral features. Additionally, we confirmed that the F1-score for late integration was slightly higher than that for middle integration.
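The intra-/inter-subarray GCC-PHAT features described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gcc_phat` and `pair_features`, the FFT length, and the subarray labeling are all choices made for the sketch.

```python
import numpy as np
from itertools import combinations

def gcc_phat(x, y, n_fft=1024):
    # Cross-spectrum of the two channels, whitened by its magnitude
    # (the "phase transform"), then transformed back to the time domain.
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12  # keep phase information only
    return np.fft.irfft(cross, n=n_fft)

def pair_features(channels, subarray_of, n_fft=1024):
    # Intra-GCC-PHAT: channel pairs within the same subarray.
    # Inter-GCC-PHAT: channel pairs across different subarrays.
    intra, inter = [], []
    for i, j in combinations(range(len(channels)), 2):
        feat = gcc_phat(channels[i], channels[j], n_fft)
        (intra if subarray_of[i] == subarray_of[j] else inter).append(feat)
    return np.stack(intra), np.stack(inter)
```

The peak index of a GCC-PHAT vector corresponds to the inter-channel time delay (modulo `n_fft`), which is the spatial cue the classifier can exploit.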
| Main Authors: | Takao Kawamura, Yuma Kinoshita, Nobutaka Ono, Robin Scheibler |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2024-12-01 |
| Series: | EURASIP Journal on Audio, Speech, and Music Processing |
| Subjects: | Domestic activity monitoring; Acoustic scene classification; Distributed microphone array; Subarray; Generalized cross-correlation phase transform; Middle integration |
| Online Access: | https://doi.org/10.1186/s13636-024-00386-y |
| author | Takao Kawamura; Yuma Kinoshita; Nobutaka Ono; Robin Scheibler |
|---|---|
| affiliation | Department of Computer Science, Tokyo Metropolitan University (Kawamura, Kinoshita, Ono); Music Processing Team, LY Corporation (Scheibler) |
| description | Abstract In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with microphones, are synchronized, we investigate two types of spatial feature: intra- and inter-generalized cross-correlation phase transforms (GCC-PHATs). These are derived from channels within the same subarray and between different subarrays, respectively. Our approach treats the log-Mel spectrogram as a spectral feature and intra- and/or inter-GCC-PHAT as a spatial feature. We propose two integration methods for spectral and spatial features: (a) middle integration, which fuses embeddings obtained by spectral and spatial features, and (b) late integration, which fuses decisions estimated using spectral and spatial features. The evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score compared with the single-channel case. In contrast, integrating both spectral and spatial features improved the F1-score compared with using only spectral features. Additionally, we confirmed that the F1-score for late integration was slightly higher than that for middle integration. |
| format | Article |
| issn | 1687-4722 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | SpringerOpen |
| series | EURASIP Journal on Audio, Speech, and Music Processing |
| topic | Domestic activity monitoring; Acoustic scene classification; Distributed microphone array; Subarray; Generalized cross-correlation phase transform; Middle integration |
| url | https://doi.org/10.1186/s13636-024-00386-y |
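The two fusion schemes named in the abstract can be sketched as follows. This is a minimal sketch, not the paper's architecture: the linear softmax head, the weight matrix `W`, the bias `b`, the averaging rule for late integration, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def middle_integration(emb_spectral, emb_spatial, W, b):
    # (a) Fuse the embeddings from the spectral and spatial branches,
    # then apply one shared classifier head.
    return softmax(np.concatenate([emb_spectral, emb_spatial]) @ W + b)

def late_integration(p_spectral, p_spatial):
    # (b) Fuse the per-branch class posteriors (here: a simple average).
    return (p_spectral + p_spatial) / 2.0
```

Middle integration lets the classifier learn cross-feature interactions, while late integration keeps the two branches fully independent until the decision stage; the abstract reports the latter scoring slightly higher in F1.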