A Review on Language-Independent Search on Speech and its Applications

Bibliographic Details
Main Authors: Sushil Venkatesh Kulkarni, Sukomal Pal
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10807177/
Description
Summary: This study presents a thorough analysis of language-independent search methods and models for speech detection, a crucial task in retrieving audio files from large archives based on spoken queries. Unlike traditional speech recognition, this “zero-resource task” doesn’t require specific training data or lexical information, relying on hypothesis testing and pattern matching instead. Spoken term detection is the process of searching large audio databases for spoken terms. Typically, it relies on text-based “spoken term datasets” for specific languages in which sufficient data are available to train automatic speech recognition systems. Speech recognition enables human-machine communication through a variety of voice commands and clear instructions; telephone and cellular systems are examples of such applications. The study examines modern spoken-term detection systems, highlighting significant advancements and performance improvements. It delves into various speech recognition techniques used in cross-media retrieval systems and machine learning methodologies, emphasizing the practical information retrieval capabilities of cross-modal learning approaches. The research aims to provide an in-depth analysis of methods combining text and image features, addressing topics previously overlooked in surveys. The motivation behind this study stems from the lack of comprehensive reviews on “image and text modalities,” ongoing challenges in the “cross-modal retrieval field,” and the untapped potential of image and text features in cross-modal retrieval development. By exploring state-of-the-art language-independent search on speech, this study sheds light on sophisticated applications and paves the way for further advancements in the field.
ISSN: 2169-3536