Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings

Abstract Understanding and recognizing emotional states through speech has vast implications in areas ranging from customer service to mental health. In this paper, we investigate the relationship between adults and children for the task of automatic speech emotion recognition, focusing on the criti...

Bibliographic Details
Main Authors: Elina Lesyk, Tomás Arias-Vergara, Elmar Nöth, Andreas Maier, Juan Rafael Orozco-Arroyave, Paula Andrea Perez-Toro
Format: Article
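Language: English
Published: Springer Nature 2024-12-01
Series: Human-Centric Intelligent Systems
Subjects: Speech emotion recognition; Deep learning; Children’s emotions; Wav2Vec; Adults’ emotions; Transfer knowledge
Online Access: https://doi.org/10.1007/s44230-024-00088-w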
author Elina Lesyk
Tomás Arias-Vergara
Elmar Nöth
Andreas Maier
Juan Rafael Orozco-Arroyave
Paula Andrea Perez-Toro
collection DOAJ
description Understanding and recognizing emotional states through speech has vast implications in areas ranging from customer service to mental health. In this paper, we investigate the relationship between adults and children for the task of automatic speech emotion recognition, focusing on the critical issue of limited datasets for children’s emotions. We use two databases: IEMOCAP, which contains emotional speech recordings from adults, and AIBO, which includes recordings from children. To address the dataset limitations, we employ transfer learning by training a neural network to classify adult emotional speech using a Wav2Vec model for feature extraction, followed by a classification head for the downstream task. However, the label sets of IEMOCAP and AIBO do not align perfectly, presenting a challenge in emotional mapping. To tackle this, we perform inference on children’s data to examine how emotional labels in IEMOCAP correspond to those in AIBO, highlighting the complexities of cross-age emotional transfer. This approach achieved F-scores of up to 0.47. In addition, we trained male and female IEMOCAP models to determine how variations in gender within adult speech affect emotional mapping in children’s data. Some of our findings indicate that female samples align more with high-arousal emotions, while male samples align more with low-arousal emotions, underscoring the importance of gender in emotion recognition. To the best of our knowledge, this is the first study in the field of deep learning applications on emotion recognition that analyses the effects of gender and age group on emotional mapping.
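The label-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify: it assumes an adult-trained model has already produced one prediction per child utterance, counts how each child-corpus label co-occurs with the adult-space predictions, derives a simple majority mapping, and scores it with a macro-averaged F-score. The label names, the toy data, and the helper functions `contingency`, `majority_mapping`, and `macro_f1` are all hypothetical.

```python
from collections import Counter, defaultdict


def contingency(adult_preds, child_labels):
    """Count adult-space predictions observed for each child-corpus label."""
    table = defaultdict(Counter)
    for pred, gold in zip(adult_preds, child_labels):
        table[gold][pred] += 1
    return table


def majority_mapping(table):
    """Map each child label to the adult label predicted most often for it."""
    return {gold: counts.most_common(1)[0][0] for gold, counts in table.items()}


def macro_f1(preds, golds):
    """Unweighted mean of per-class F1 over all labels seen in preds or golds."""
    labels = sorted(set(preds) | set(golds))
    scores = []
    for lab in labels:
        tp = sum(p == lab and g == lab for p, g in zip(preds, golds))
        fp = sum(p == lab and g != lab for p, g in zip(preds, golds))
        fn = sum(p != lab and g == lab for p, g in zip(preds, golds))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumed, not taken from the paper): predictions of an
# adult-trained model on child utterances, and the child corpus labels.
adult_preds = ["angry", "angry", "neutral", "neutral", "happy",
               "happy", "neutral", "sad", "angry"]
child_labels = ["anger", "anger", "neutral", "neutral", "positive",
                "positive", "positive", "rest", "emphatic"]

table = contingency(adult_preds, child_labels)
mapping = majority_mapping(table)

# Project the child labels into the adult label space, then score.
mapped_gold = [mapping[g] for g in child_labels]
score = macro_f1(adult_preds, mapped_gold)
print(mapping)
print(round(score, 2))
```

Because the mapping here is derived from the same samples it is scored on, the resulting score is optimistic; a held-out split would be needed for a fair estimate. Running the same steps once with predictions from a model trained on male adult speech and once with a model trained on female adult speech would give two contingency tables that can be compared row by row.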
format Article
id doaj-art-65bcf65b755c4490a5f81ee46b355eef
institution Kabale University
issn 2667-1336
language English
publishDate 2024-12-01
publisher Springer Nature
record_format Article
series Human-Centric Intelligent Systems
spelling doaj-art-65bcf65b755c4490a5f81ee46b355eef
2025-01-12T12:26:40Z
eng
Springer Nature
Human-Centric Intelligent Systems
2667-1336
2024-12-01
Volume 4, Issue 4, pages 633-642
10.1007/s44230-024-00088-w
Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings
Elina Lesyk (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
Tomás Arias-Vergara (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
Elmar Nöth (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
Andreas Maier (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
Juan Rafael Orozco-Arroyave (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
Paula Andrea Perez-Toro (Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg)
title Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings
topic Speech emotion recognition
Deep learning
Children’s emotions
Wav2Vec
Adults’ emotions
Transfer knowledge
url https://doi.org/10.1007/s44230-024-00088-w