Dual-feature speech emotion recognition fusion algorithm based on wavelet scattering transform and MFCC

Bibliographic Details
Main Authors: YING Na, WU Shunpeng, YANG Meng, ZOU Yujian
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2024-05-01
Series: Dianxin kexue
Subjects:
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2024088/
Description
Summary: A fusion algorithm named permutation entropy weighted and bias adjustment rule fusion (PEW-BAR) was proposed to improve the accuracy of speech emotion recognition by exploiting the emotional information in the spectral characteristics of speech signals. The algorithm integrates the wavelet scattering transform with Mel-frequency cepstral coefficients (MFCC). First, wavelet scattering features and MFCC-related features were extracted from the speech signals. Next, the wavelet scattering features were expanded along the scale dimension, and support vector machines were applied to them to obtain posterior probabilities for emotion recognition. Permutation entropy was then calculated, and a weighted fusion based on this entropy was applied. Finally, a bias adjustment rule was used to refine the integration results obtained from the MFCC-related features. In experiments on the EMODB, RAVDESS, and eNTERFACE05 datasets, the proposed algorithm outperforms traditional wavelet scattering coefficient-based methods, improving accuracy by 2.82%, 2.85%, and 5.92%, respectively, and unweighted average recall (UAR) by 3.40%, 2.87%, and 5.80%. It also achieves a 6.89% improvement on the IEMOCAP dataset.
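The entropy-weighted fusion step can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which sequence the permutation entropy is computed on or how the weights are derived from it. The sketch assumes the standard normalized Bandt-Pompe permutation entropy and a hypothetical rule in which each branch's weight is proportional to one minus its entropy. The names `permutation_entropy`, `entropy_weighted_fusion`, `order`, and `delay` are illustrative only.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy of a 1-D sequence, in [0, 1]."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: window positions sorted by value (ties keep position order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    entropy = -sum((c / n_patterns) * math.log(c / n_patterns)
                   for c in counts.values())
    # Divide by the maximum possible entropy, log(order!), to normalize.
    return entropy / math.log(math.factorial(order))


def entropy_weighted_fusion(posteriors, entropies):
    """Fuse per-branch class posteriors into one vector.

    Assumption (not from the record): a branch's weight is proportional
    to (1 - entropy), so lower-entropy branches contribute more.
    """
    raw = [1.0 - h for h in entropies]
    total = sum(raw)
    if total <= 0:
        weights = [1.0 / len(raw)] * len(raw)  # fall back to a plain average
    else:
        weights = [r / total for r in raw]
    n_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors))
            for c in range(n_classes)]
```

For example, `entropy_weighted_fusion([[0.8, 0.2], [0.4, 0.6]], [0.2, 0.6])` gives the first branch twice the weight of the second. A bias adjustment step using MFCC-based results would follow in the described algorithm, but the record gives too little detail to sketch it.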
ISSN:1000-0801