Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

Bibliographic Details
Main Authors: Yu Song, Qi Zhou
Format: Article
Language: English
Published: Taylor & Francis Group, 2024-12-01
Series: Applied Artificial Intelligence
Online Access: https://www.tandfonline.com/doi/10.1080/08839514.2024.2356992
Description
Summary: In the field of emotion recognition, analyzing emotions from speech alone (single-modal speech emotion recognition) has several limitations, including limited data volume and low accuracy. Additionally, single-task models lack generalization and fail to fully utilize relevant information. To address these issues, this paper proposes a new bi-modal bi-task emotion recognition model. The proposed model introduces multi-task learning on the Transformer architecture. On the one hand, unsupervised contrastive predictive coding is used to extract denser features from the data while preserving self-information and context-related information. On the other hand, robustness against interfering information is enhanced by employing self-supervised contrastive learning. Furthermore, the model uses a modality fusion module that combines textual and audio information, implicitly aligning features from the two modalities. The proposed model achieved weighted accuracy (WA) of 82.3% and 83.5% on the IEMOCAP and RAVDESS datasets, respectively, and unweighted accuracy (UA) of 83.0% and 82.4%. Compared with existing methods, this represents a further improvement in performance.
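As a rough illustration of the kind of architecture the abstract describes, the sketch below shows a bi-modal, bi-task setup in PyTorch: separate Transformer encoders for audio and text, a cross-attention fusion module that implicitly aligns the two modalities, and two task heads trained jointly. All names, dimensions, the auxiliary task, and the pooling choice are assumptions for illustration only; the contrastive predictive coding and self-supervised contrastive learning objectives from the paper are omitted, so this is not the authors' implementation.

    import torch
    import torch.nn as nn

    class BiModalBiTaskModel(nn.Module):
        def __init__(self, d_model=256, n_heads=4, n_layers=2,
                     n_emotions=4, n_aux_classes=2):
            super().__init__()
            enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            # Separate Transformer encoders per modality (layers are deep-copied internally)
            self.audio_encoder = nn.TransformerEncoder(enc_layer, n_layers)
            self.text_encoder = nn.TransformerEncoder(enc_layer, n_layers)
            # Fusion module: text tokens attend over audio frames, so the two
            # modalities are aligned implicitly rather than frame-by-frame.
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)
            # Two task heads sharing the fused representation (multi-task learning);
            # the auxiliary head is a placeholder, e.g. sentiment polarity.
            self.emotion_head = nn.Linear(d_model, n_emotions)
            self.aux_head = nn.Linear(d_model, n_aux_classes)

        def forward(self, audio_feats, text_feats):
            a = self.audio_encoder(audio_feats)           # (B, T_audio, d_model)
            t = self.text_encoder(text_feats)             # (B, T_text, d_model)
            fused, _ = self.cross_attn(query=t, key=a, value=a)
            fused = self.norm(fused + t)                  # residual over the text stream
            pooled = fused.mean(dim=1)                    # simple mean pooling
            return self.emotion_head(pooled), self.aux_head(pooled)

    # Usage with dummy inputs (shapes are illustrative)
    model = BiModalBiTaskModel()
    audio = torch.randn(8, 300, 256)   # e.g. 300 acoustic frames per utterance
    text = torch.randn(8, 50, 256)     # e.g. 50 subword embeddings per transcript
    emo_logits, aux_logits = model(audio, text)

In a multi-task setting like the one described, the two heads would typically be trained with a weighted sum of their losses, so that the auxiliary objective regularizes the shared encoders and fusion module rather than competing with the main emotion-recognition task.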
ISSN: 0883-9514, 1087-6545