AFT-SAM: Adaptive Fusion Transformer with a Sparse Attention Mechanism for Audio–Visual Speech Recognition

To address the serious information redundancy, complex inter-modal information interaction, and difficult multimodal fusion that audio–visual speech recognition systems face when handling complex multimodal information, this paper proposes an adaptive fusion transformer algorithm (A...

Bibliographic Details
Main Authors:	Na Che, Yiming Zhu, Haiyan Wang, Xianwei Zeng, Qinsheng Du
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Applied Sciences
Subjects:	speech recognition; multimodal integration; transformer; adaptive fusion
Online Access:	https://www.mdpi.com/2076-3417/15/1/199

Similar Items

Objective assessment of communication speech interference effect based on feature fusion
by: Yun LIN, et al.
Published: (2023-03-01)

End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
by: Yiming WANG, et al.
Published: (2019-12-01)

Dual-feature speech emotion recognition fusion algorithm based on wavelet scattering transform and MFCC
by: YING Na, et al.
Published: (2024-05-01)

Metaphor recognition based on cross-modal multi-level information fusion
by: Qimeng Yang, et al.
Published: (2024-12-01)

Parkinson’s Disease Prediction: An Attention-Based Multimodal Fusion Framework Using Handwriting and Clinical Data
by: Sabrina Benredjem, et al.
Published: (2024-12-01)

Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API
by: Ribwar Bakhtyar Ibrahim
Published: (2021-07-01)

IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution
by: Athanasios Tragakis, et al.
Published: (2024-12-01)

FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba
by: Xinyu Xie, et al.
Published: (2024-12-01)

Multi-feature fusion malware detection method based on attention and gating mechanisms
by: Zhongyuan CHEN, et al.
Published: (2024-02-01)

8～64kbit/s super-wideband embedded speech and audio coding algorithm
by: JIA Mao-shen, et al.
Published: (2009-01-01)

Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
by: A-Hyeon Jo, et al.
Published: (2025-01-01)

Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion
by: Xinghang Hu, et al.
Published: (2025-01-01)

Face recognition using decision fusion of multiple sparse representation-based classifiers
by: Biao TANG, et al.
Published: (2018-04-01)

A social media geolocation prediction method based on multimodal fusion
by: Shiduo HUANG, et al.
Published: (2023-08-01)

Multimodal Data Fusion for Depression Detection Approach
by: Mariia Nykoniuk, et al.
Published: (2025-01-01)

PERCEPTION AND RECOGNITION OF CONCEPTS OF SPEECH ACTS IN VOCAL COMMUNICATION
by: E. I. Grigoriev
Published: (2013-10-01)

Fusion-Based Damage Segmentation for Multimodal Building Façade Images from an End-to-End Perspective
by: Pujin Wang, et al.
Published: (2024-12-01)

Adaptive Multimodal Fusion with Cross-Attention for Robust Scene Segmentation and Urban Economic Analysis
by: Chun Zhong, et al.
Published: (2025-01-01)

Evaluation of CBCT reconstructed tooth models at different thresholds and voxels and their accuracy in fusion with IOS data: an in vitro validation study
by: Yusong Zhang, et al.
Published: (2024-12-01)

A Large-Scale Spatio-Temporal Multimodal Fusion Framework for Traffic Prediction
by: Bodong Zhou, et al.
Published: (2024-09-01)

Speech enhancement method based on multi-domain fusion and neural architecture search
by: Rui ZHANG, et al.
Published: (2024-02-01)

A Disentangled Representation-Based Multimodal Fusion Framework Integrating Pathomics and Radiomics for KRAS Mutation Detection in Colorectal Cancer
by: Zhilong Lv, et al.
Published: (2024-09-01)

Multimodal Adaptive Identity-Recognition Algorithm Fused with Gait Perception
by: Changjie Wang, et al.
Published: (2021-12-01)

Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer
by: Mohanad Sameer, et al.
Published: (2023-03-01)

Biometric Fusion for Enhanced Authentication in Cloud Computing Environments
by: Chiyo Miyazawa, et al.
Published: (2024-03-01)

AFF-LightNet: A Lightweight Ship Detection Architecture Based on Attentional Feature Fusion
by: Yingxiu Yuan, et al.
Published: (2024-12-01)

Enhancement Infrared-Visible Image Fusion Using the Integration of Stationary Wavelet Transform and Fuzzy Histogram Equalization
by: Rusul Basheer Khazal, et al.
Published: (2022-12-01)

Attention-based interactive multi-level feature fusion for named entity recognition
by: Yiwu Xu, et al.
Published: (2025-01-01)

Survey of research on multimodal semantic communication
by: Zhijin QIN, et al.
Published: (2023-05-01)

TMFN: a text-based multimodal fusion network with multi-scale feature extraction and unsupervised contrastive learning for multimodal sentiment analysis
by: Junsong Fu, et al.
Published: (2025-01-01)

Harnessing the Multi-Phasal Nature of Speech-EEG for Enhancing Imagined Speech Recognition
by: Rini Sharon, et al.
Published: (2025-01-01)

Gear Fault Diagnosis based on Feature Fusion and Sparse Representation
by: Wang Jiangping, et al.
Published: (2017-01-01)

Continuous speech speaker recognition based on CNN
by: Zhendong WU, et al.
Published: (2017-03-01)

Speaker verification method based on cross-domain attentive feature fusion
by: Zhen YANG, et al.
Published: (2023-08-01)

Linearized distortion model for robust speech recognition in noisy environments
by: HE Yong-jun, et al.
Published: (2010-01-01)

IoT-based approach to multimodal music emotion recognition
by: Hanbing Zhao, et al.
Published: (2025-02-01)

Multimodal Interaction, Interfaces, and Communication: A Survey
by: Elias Dritsas, et al.
Published: (2025-01-01)

Answer Distillation Network With Bi-Text-Image Attention for Medical Visual Question Answering
by: Hongfang Gong, et al.
Published: (2025-01-01)

Perception of vocoded speech in domestic dogs
by: Amritha Mallikarjun, et al.
Published: (2024-04-01)

Deep Learning for Obstructive Sleep Apnea Detection and Severity Assessment: A Multimodal Signals Fusion Multiscale Transformer Model
by: Zhang Y, et al.
Published: (2025-01-01)