Audio-Language Datasets of Scenes and Events: A Survey

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events and scenes. Advances in dataset creation and computational power have led to significant progress in this domain. This paper surveys 69 datasets used to train ALMs, covering research up to September 2024 (<uri...

Bibliographic Details
Main Authors:	Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Audio-to-language learning; language-to-audio learning; audio-language datasets; review
Online Access:	https://ieeexplore.ieee.org/document/10854210/

Similar Items

The Development of Digital Audio Coding
by: Guo Ke
Published: (1995-01-01)

Trends in audio scene source counting and analysis
by: Michael Nigro, et al.
Published: (2024-12-01)

Estimating rainfall intensity based on surveillance audio and deep-learning
by: Meizhen Wang, et al.
Published: (2024-11-01)

AUDIO BRANDING GUIDANCE MODEL IN THE CASE OF SMALL AND MEDIUM-SIZED BUSINESSES
by: Justinas Kisieliauskas, et al.
Published: (2024-12-01)

Audio Features and Crowdfunding Success: An Empirical Study Using Audio Mining
by: Miao Miao, et al.
Published: (2024-11-01)

A Novel Audio Copy Move Forgery Detection Method With Classification of Graph-Based Representations
by: Beste Ustubioglu, et al.
Published: (2025-01-01)

Current Trends in Audio Description Research
by: Mine
Published: (2018-12-01)

Deep convolutional neural networks for double compressed AMR audio detection
by: Aykut Büker, et al.
Published: (2021-06-01)

Synchronization and blind detect algorithm for dual channel audio watermark
by: FENG Tao, et al.
Published: (2006-01-01)

Audiogmenter: a MATLAB toolbox for audio data augmentation
by: Gianluca Maguolo, et al.
Published: (2025-01-01)

Networked microcontrollers for accessible, distributed spatial audio
by: Thomas Albert Rushton, et al.
Published: (2024-11-01)

PENGGUNAAN MEDIA AUDIO VISUAL PADA MATA PELAJARAN PENDIDIKAN AGAMA ISLAM UNTUK MENINGKATKAN AKTIVITAS BELAJAR SISWA KELAS V SD N 09 PALEMBANG
by: Ibrahim Ibrahim, et al.
Published: (2024-01-01)

Audio-Driven Facial Animation with Deep Learning: A Survey
by: Diqiong Jiang, et al.
Published: (2024-10-01)

Three-channel dependent mid/side coding framework for multichannel 3D audio
by: Shi DONG, et al.
Published: (2014-06-01)

Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data
by: Wei-Cheng Wang, et al.
Published: (2025-01-01)

The cadenza woodwind dataset: Synthesised quartets for music information retrieval and machine learning (Zenodo)
by: Gerardo Roa Dabike, et al.
Published: (2024-12-01)

Bandwidth extension method based on nonlinear audio characteristics classification
by: Li-yan ZHANG, et al.
Published: (2013-08-01)

HELPING YOUNG LEARNERS TO LEARN AUDIO DISCRIMINATION BY USING FLASHCARDS
by: Yansyah Yansyah
Published: (2017-05-01)

Peningkatan Kedisiplinan Siswa Sekolah Dasar Melalui Pemanfaatan Media Audio Visual
by: Siti Diyah Rachmatika, et al.
Published: (2024-09-01)

Stereo robust watermark algorithm based on parameter optimization
by: Yiming XUE, et al.
Published: (2023-07-01)

Audio-visual event localization with dual temporal-aware scene understanding and image-text knowledge bridging
by: Pufen Zhang, et al.
Published: (2024-11-01)

Advancements in End-to-End Audio Style Transformation: A Differentiable Approach for Voice Conversion and Musical Style Transfer
by: Shashwat Aggarwal, et al.
Published: (2025-01-01)

New Trends outside the Translation Classroom
by: Silvia Martínez Martínez, et al.
Published: (2014-09-01)

Live and mediated user engagements: A comparative dataset from two Bengali audio-story based youtube channels (Mendeley Data)
by: Mohammad Harun Or Rashid, et al.
Published: (2025-02-01)

A Novel Cascaded Approach for Classification of Tuberculosis Using Cough Audio in Real-Time Environment
by: Haroon Mahmood, et al.
Published: (2024-01-01)

Deep Learning Approach for Detecting Audio Deepfakes in Urdu
by: Marium Mateen
Published: (2023-07-01)

Equipment Sounds’ Event Localization and Detection Using Synthetic Multi-Channel Audio Signal to Support Collision Hazard Prevention
by: Kehinde Elelu, et al.
Published: (2024-10-01)

SilentPlay. Percorsi di ricerca tra tecnologie audio e teatro partecipativo
by: Carlo Presotto, et al.
Published: (2024-12-01)

LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
by: Charilaos Papaioannou, et al.
Published: (2025-01-01)

Whombat: An open‐source audio annotation tool for machine learning assisted bioacoustics
by: Santiago Martínez Balvanera, et al.
Published: (2025-01-01)

Development of the mon busquets pass, an audio-based football passing instruments for kids aged 10 to 12
by: Suhermon Suhermon, et al.
Published: (2024-09-01)

The Preferred User: How Audio Description could Change Understandings of Australian Television Audiences and Media Technology
by: Ellis Katie, et al.
Published: (2018-07-01)

Detection of tuberculosis using cough audio analysis: a deep learning approach with capsule networks
by: Sakthi Jaya Sundar Rajasekar, et al.
Published: (2024-11-01)

Authenticity at Risk: Key Factors in the Generation and Detection of Audio Deepfakes
by: Alba Martínez-Serrano, et al.
Published: (2025-01-01)

THE INFLUENCE OF AUDIO-VISUAL VIDEO MEDIA ON KNOWLEDGE IN EFFORTS TO PREVENT SEXUAL VIOLENCE AGAINST ADOLESCENTS WITH DISABILITIES AT SLB SHANTI KOSALA NGANJUK
by: Mia Ashari Kurniasari, et al.
Published: (2023-10-01)

Effect of oral health education by audio aids, Braille & tactile models on the oral health status of visually impaired children of Bhopal city
by: Anjali Gautam, et al.
Published: (2018-09-01)

Technologies of network audio retrieval
by: ZHANG Wei-qiang, et al.
Published: (2007-01-01)

A Shift from an Audio- to a Video-Based Exam Format to Reflect Real-Life Clinical Interactions in the Language-Learning Classroom
by: Gabriella Hild, et al.
Published: (2024-11-01)

Multimodal MRI analysis of microstructural and functional connectivity brain changes following systematic audio-visual training in a virtual environment
by: Kholoud Alwashmi, et al.
Published: (2025-01-01)

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
by: Karn N. Watcharasupat, et al.
Published: (2024-01-01)