MT-EfficientNetV2: A Multi-Temporal Scale Fusion EEG Emotion Recognition Method Based on Recurrence Plots

Bibliographic Details
Main Authors: Zihan Zhang, Zhiyong Zhou, Jun Wang, Hao Hu, Jing Zhao
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11095664/
Description
Summary: Emotion recognition based on electroencephalography (EEG) signals has garnered significant research attention in recent years due to its potential applications in affective computing and brain-computer interfaces. Although various deep learning methods have been proposed for extracting emotional features from EEG signals, most existing models struggle to capture both long-term and short-term dependencies within the signals and fail to fully integrate features across different temporal scales. To address these challenges, we propose MT-EfficientNetV2, a deep learning model built around multi-temporal-scale fusion. The model segments one-dimensional EEG signals using combinations of varying window sizes and fixed step lengths. The Recurrence Plot (RP) algorithm then transforms these segments into RGB images that intuitively represent the dynamic characteristics of the signals, facilitating the capture of complex emotional features. Additionally, a three-branch input feature fusion module is designed to integrate features across different scales within the same temporal domain. The architecture combines DEconv and the SimAM attention mechanism with EfficientNetV2, enhancing the global fusion and expression of multi-scale features while strengthening the extraction of key local emotional features and suppressing redundant information. Experiments on the public SEED and SEED-IV datasets yielded accuracies of 98.67% and 96.89%, respectively, surpassing current mainstream methods and validating the feasibility and effectiveness of the proposed approach.
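To make the RP step concrete, the following is a minimal Python sketch: a 1-D EEG segment is time-delay embedded, pairwise distances between the embedded points form the recurrence matrix, and RPs computed at three window sizes are stacked as the R, G, and B channels of a single image. The embedding parameters (`dim`, `tau`), the window sizes, the 224-pixel output side, and the channel-stacking scheme are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def recurrence_plot(x, dim=3, tau=1, eps=None):
    """Recurrence plot of a 1-D signal via time-delay embedding.

    dim/tau: embedding dimension and lag; eps: recurrence threshold
    (if None, the unthresholded distance matrix is returned).
    """
    n = len(x) - (dim - 1) * tau          # number of embedded points
    emb = np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)
    # Pairwise Euclidean distances between phase-space points.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    return dist if eps is None else (dist <= eps).astype(np.float64)

def multiscale_rp_image(signal, windows=(128, 256, 512), side=224):
    """Stack RPs from three window sizes as RGB channels (assumed scheme)."""
    channels = []
    for w in windows:
        rp = recurrence_plot(signal[:w])
        # Resample each RP to a common side length by index sampling.
        idx = np.linspace(0, rp.shape[0] - 1, side).astype(int)
        rp = rp[np.ix_(idx, idx)]
        # Normalise to 0..255 so each scale becomes one image channel.
        rp = 255.0 * (rp - rp.min()) / (np.ptp(rp) + 1e-8)
        channels.append(rp.astype(np.uint8))
    return np.stack(channels, axis=-1)    # shape: (side, side, 3)

rgb = multiscale_rp_image(np.random.randn(512))
print(rgb.shape)                          # (224, 224, 3)
```

The SimAM attention named in the summary is parameter-free with a closed-form energy function, so it fits in a few lines of PyTorch. The sketch below follows the published SimAM formulation (Yang et al., 2021); where and how it is inserted into EfficientNetV2 blocks in MT-EfficientNetV2 is not specified here, so this shows only the mechanism itself.

```python
import torch

def simam(x, lam=1e-4):
    """SimAM: reweight each activation by its per-neuron energy.

    x: feature map of shape (B, C, H, W); lam: regularisation constant.
    """
    n = x.shape[2] * x.shape[3] - 1
    # Squared deviation of each activation from its channel mean.
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    # Channel variance estimate, then the inverse-energy weighting.
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + lam)) + 0.5
    return x * torch.sigmoid(e_inv)

feats = torch.randn(2, 32, 56, 56)
print(simam(feats).shape)                 # torch.Size([2, 32, 56, 56])
```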
ISSN:2169-3536