JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledg...
Main Authors: | Chang Sun, Bo Qin, Hong Yang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Open Journal of Signal Processing |
Online Access: | https://ieeexplore.ieee.org/document/10750407/ |
Similar Items
- SlowFast-TCN: A Deep Learning Approach for Visual Speech Recognition
  by: Nicole Yah Yie Ha, et al.
  Published: (2024-12-01)
- Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models
  by: Maros Jakubec, et al.
  Published: (2024-10-01)
- LipBengal: Pioneering Bengali lip-reading dataset for pronunciation mapping through lip gestures
  by: Md. Tanvir Rahman Sahed, et al.
  Published: (2025-02-01)
- Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
  by: Ashwaq Waleed Abdul Ameer, et al.
  Published: (2025-01-01)
- Speech Emotion Recognition Using Two-Stage Multiple Instance Learning Networks
  by: ZHANG Shiqing, CHEN Chen, ZHAO Xiaoming
  Published: (2024-12-01)