JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition

Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledg...

Bibliographic Details
Main Authors:	Chang Sun, Bo Qin, Hong Yang
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Open Journal of Signal Processing
Subjects:	Joint-embedding predictive architecture; knowledge distillation; lip-reading; visual speech recognition
Online Access:	https://ieeexplore.ieee.org/document/10750407/

Similar Items

SlowFast-TCN: A Deep Learning Approach for Visual Speech Recognition
by: Nicole Yah Yie Ha, et al.
Published: (2024-12-01)

Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models
by: Maros Jakubec, et al.
Published: (2024-10-01)

LipBengal: Pioneering Bengali lip-reading dataset for pronunciation mapping through lip gestures
by: Md. Tanvir Rahman Sahed, et al.
Published: (2025-02-01)

Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
by: Ashwaq Waleed Abdul Ameer, et al.
Published: (2025-01-01)

Speech Emotion Recognition Using Two-Stage Multiple Instance Learning Networks
by: ZHANG Shiqing, CHEN Chen, ZHAO Xiaoming
Published: (2024-12-01)

Improved embedded wideband speech codec fitting EV-VBR standard
by: XIN Jie, et al.
Published: (2010-01-01)

8~64 kbit/s super-wideband embedded speech and audio coding algorithm
by: JIA Mao-shen, et al.
Published: (2009-01-01)

Knowledge Distillation for Face Recognition Using Synthetic Data With Dynamic Latent Sampling
by: Hatef Otroshi Shahreza, et al.
Published: (2024-01-01)

Node and Edge Joint Embedding for Heterogeneous Information Network
by: Lei Chen, et al.
Published: (2024-09-01)

Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer
by: Mohanad Sameer, et al.
Published: (2023-03-01)

Analysis for speech and esthetics in sixty consecutive patients with cleft lip and palate
by: Mahantesh S Shiraganvi, et al.
Published: (2011-10-01)

DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation
by: Jongkyung Im, et al.
Published: (2025-01-01)

Improving Speech Recognition Rate through Analysis Parameters
by: Eringis Deividas, et al.
Published: (2014-05-01)

Frame erasure concealment method used for wideband embedded speech codec
by: ZHU Heng, et al.
Published: (2008-01-01)

Leveraging Lightweight Hybrid Ensemble Distillation (HED) for Suspect Identification With Face Recognition
by: Vaishnavi Munusamy, et al.
Published: (2025-01-01)

Suitability of Speech Files for Automatic Speech Recognition Systems after Noise Reduction Procedures
by: R.Kh. Latypov, et al.
Published: (2015-12-01)

Recent advancements in automatic disordered speech recognition: A survey paper
by: Nada Gohider, et al.
Published: (2024-12-01)

Sentence Embedding Generation Framework Based on Kullback–Leibler Divergence Optimization and RoBERTa Knowledge Distillation
by: Jin Han, et al.
Published: (2024-12-01)

AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio–Visual Speech Recognition
by: Na Che, et al.
Published: (2024-12-01)

Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
by: John Lorenzo Bautista, et al.
Published: (2025-01-01)

KAZAKH SPEECH AND RECOGNITION METHODS: ERROR ANALYSIS AND IMPROVEMENT PROSPECTS
by: Yerlan Karabaliyev, et al.
Published: (2024-10-01)

Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN
by: Ali Erbey, et al.
Published: (2025-01-01)

Assessment-Based Optimization of Distillation Parameters
by: Ludmila N. Krikunova, et al.
Published: (2023-06-01)

Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System
by: Seyed Reza Shahamiri, et al.
Published: (2023-01-01)

Optimizing Speech Emotion Recognition with Hilbert Curve and convolutional neural network
by: Zijun Yang, et al.
Published: (2024-01-01)

Linearized distortion model for robust speech recognition in noisy environments
by: HE Yong-jun, et al.
Published: (2010-01-01)

Refining maritime Automatic Speech Recognition by leveraging synthetic speech
by: Christoph Martius, et al.
Published: (2024-12-01)

A survey on knowledge distillation: Recent advancements
by: Amir Moslemi, et al.
Published: (2024-12-01)

Silent-Hidden-Voice Attack on Speech Recognition System
by: Hyun Kwon, et al.
Published: (2024-01-01)

Accents in Speech Recognition through the Lens of a World Englishes Evaluation Set
by: Miguel Del Río, et al.
Published: (2023-12-01)

CNN Based Automatic Speech Recognition: A Comparative Study
by: Hilal Ilgaz, et al.
Published: (2024-08-01)

Deficits in prosodic speech-in-noise recognition in schizophrenia patients and its association with psychiatric symptoms
by: Shenglin She, et al.
Published: (2024-11-01)

Comprehensive association analysis of speech recognition thresholds after cisplatin‐based chemotherapy in survivors of adult‐onset cancer
by: Mohammad Shahbazi, et al.
Published: (2023-02-01)

Hybrid LSTM–Attention and CNN Model for Enhanced Speech Emotion Recognition
by: Fazliddin Makhmudov, et al.
Published: (2024-12-01)

WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages
by: Tripti Choudhary, et al.
Published: (2023-03-01)

Chinese semantic and phonological information-based text proofreading model for speech recognition
by: Meiyu ZHONG, et al.
Published: (2022-11-01)

Research on a Lightweight Arrhythmia Classification Model Based on Knowledge Distillation for Wearable Single-Lead ECG Monitoring Systems
by: Xiang An, et al.
Published: (2024-12-01)

Design and Evaluation of a Voice-Controlled Elevator System to Improve the Safety and Accessibility
by: Ander Gonzalez Docasal, et al.
Published: (2024-01-01)

An improved ShuffleNetV2 method based on ensemble self-distillation for tomato leaf diseases recognition
by: Shuiping Ni, et al.
Published: (2025-01-01)

Speech Recognition for the Sterile Interaction with Information Systems in the Surgical Area
by: Schrüfer Katrin V., et al.
Published: (2024-09-01)