Fourier Hilbert: The input transformation to enhance CNN models for speech emotion recognition

Bibliographic Details
Main Author: Bao Long Ly
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2024-01-01
Series: Cognitive Robotics
Online Access: http://www.sciencedirect.com/science/article/pii/S2667241324000168
Description
Summary: Signal processing in general, and speech emotion recognition in particular, have long been familiar Artificial Intelligence (AI) tasks. With the explosion of deep learning, CNN models are used more frequently, accompanied by the emergence of many signal transformations. However, these methods often require significant hardware resources and runtime. To address these issues, we analyze and learn from existing transformations, which leads us to propose a new method: the Fourier Hilbert Transformation (FHT). In general, this method applies the Hilbert curve to Fourier images. The resulting images are small and dense, a shape well suited to the CNN architecture. Additionally, the better distribution of information across the image allows the filters to fully utilize their power. These points support the argument that FHT provides an optimal input for CNNs. Experiments conducted on popular datasets yielded promising results. FHT saves a large amount of hardware usage and runtime while maintaining high performance, and it even offers greater stability than existing methods. This opens up opportunities for deploying signal processing tasks on real-time systems with limited hardware.
ISSN: 2667-2413
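
Illustration: The summary describes FHT only at a high level, as applying the Hilbert curve to Fourier images. The Python sketch below is one possible reading of that idea and is not taken from the article. It assumes a 1-D magnitude spectrum, truncated or zero-padded to 4^order bins, is laid out along a Hilbert curve so that neighbouring frequency bins stay neighbours in a small square image. The function names, the log scaling, and the `order` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def fourier_hilbert_image(signal, order=4):
    """Hypothetical FHT-style input: FFT magnitudes placed along a Hilbert curve.

    Assumption: the signal is cropped or zero-padded to 2 * 4**order samples,
    and only the first 4**order frequency bins are kept.
    """
    side = 1 << order
    n_bins = side * side
    spectrum = np.abs(np.fft.rfft(signal, n=2 * n_bins))[:n_bins]
    spectrum = np.log1p(spectrum)  # assumed compression of the dynamic range
    image = np.zeros((side, side), dtype=np.float64)
    for d, value in enumerate(spectrum):
        x, y = hilbert_d2xy(order, d)
        image[y, x] = value
    return image


# Usage: a pure tone with 5 cycles over 512 samples yields a 16 x 16 image
# whose brightest pixel sits at the Hilbert position of frequency bin 5.
tone = np.sin(2 * np.pi * 5 * np.arange(512) / 512)
img = fourier_hilbert_image(tone, order=4)
```

In this sketch a 512-sample frame becomes a 16 x 16 array, which is consistent with the summary's description of small, dense images. The framing, windowing, and normalisation actually used in the article may differ.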