Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings
Abstract: Understanding and recognizing emotional states through speech has vast implications in areas ranging from customer service to mental health. In this paper, we investigate the relationship between adults and children for the task of automatic speech emotion recognition, focusing on the criti...
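Main Authors: | Elina Lesyk, Tomás Arias-Vergara, Elmar Nöth, Andreas Maier, Juan Rafael Orozco-Arroyave, Paula Andrea Perez-Toro |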
---|---|
Format: | Article |
Language: | English |
Published: | Springer Nature, 2024-12-01 |
Series: | Human-Centric Intelligent Systems |
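Subjects: | Speech emotion recognition; Deep learning; Children’s emotions; Wav2Vec; Adults’ emotions; Transfer knowledge |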
Online Access: | https://doi.org/10.1007/s44230-024-00088-w |
_version_ | 1841544590005370880 |
---|---|
author | Elina Lesyk; Tomás Arias-Vergara; Elmar Nöth; Andreas Maier; Juan Rafael Orozco-Arroyave; Paula Andrea Perez-Toro |
author_sort | Elina Lesyk |
collection | DOAJ |
description | Abstract: Understanding and recognizing emotional states through speech has vast implications in areas ranging from customer service to mental health. In this paper, we investigate the relationship between adults and children for the task of automatic speech emotion recognition, focusing on the critical issue of limited datasets for children’s emotions. We use two databases: IEMOCAP, which contains emotional speech recordings from adults, and AIBO, which includes recordings from children. To address the dataset limitations, we employ transfer learning by training a neural network to classify adult emotional speech using a Wav2Vec model for feature extraction, followed by a classification head for the downstream task. However, the labels in IEMOCAP and AIBO do not align perfectly, presenting a challenge in emotional mapping. To tackle this, we perform inference on children’s data to examine how emotional labels in IEMOCAP correspond to those in AIBO, highlighting the complexities of cross-age emotional transfer. This approach achieved F-scores of up to 0.47. In addition, we trained male and female IEMOCAP models to determine how variations in gender within adult speech affect emotional mapping in children’s data. Some of our findings indicate that female samples align more with high-arousal emotions, while male samples align more with low-arousal emotions, underscoring the importance of gender in emotion recognition. To the best of our knowledge, this is the first study in the field of deep learning applications on emotion recognition that analyses the effects of gender and age group on emotional mapping. |
format | Article |
id | doaj-art-65bcf65b755c4490a5f81ee46b355eef |
institution | Kabale University |
issn | 2667-1336 |
language | English |
publishDate | 2024-12-01 |
publisher | Springer Nature |
record_format | Article |
series | Human-Centric Intelligent Systems |
spelling | doaj-art-65bcf65b755c4490a5f81ee46b355eef; 2025-01-12T12:26:40Z; eng; Springer Nature; Human-Centric Intelligent Systems; 2667-1336; 2024-12-01; volume 4, issue 4, pages 633-642; 10.1007/s44230-024-00088-w; Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings; Elina Lesyk, Tomás Arias-Vergara, Elmar Nöth, Andreas Maier, Juan Rafael Orozco-Arroyave, Paula Andrea Perez-Toro (all: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg); Abstract: Understanding and recognizing emotional states through speech has vast implications in areas ranging from customer service to mental health. In this paper, we investigate the relationship between adults and children for the task of automatic speech emotion recognition, focusing on the critical issue of limited datasets for children’s emotions. We use two databases: IEMOCAP, which contains emotional speech recordings from adults, and AIBO, which includes recordings from children. To address the dataset limitations, we employ transfer learning by training a neural network to classify adult emotional speech using a Wav2Vec model for feature extraction, followed by a classification head for the downstream task. However, the labels between IEMOCAP and AIBO do not align perfectly, presenting a challenge in emotional mapping. To tackle this, we perform inference on children’s data to examine how emotional labels in IEMOCAP correspond to those in AIBO, highlighting the complexities of cross-age emotional transfer. This approach achieved F-scores of up to 0.47. In addition, we trained male and female IEMOCAP models to determine how variations in gender within adult speech affect emotional mapping in children data. Some of our findings indicate that female samples align more with high arousal emotions, while male samples align more with low arousal emotion, underscoring the importance of gender in emotion recognition. To the best of our knowledge, this is the first study in the field of deep learning applications on emotion recognition that analyses the effects of genders and age groups on emotional mapping.; https://doi.org/10.1007/s44230-024-00088-w; Speech emotion recognition; Deep learning; Children’s emotions; Wav2Vec; Adults’ emotions; Transfer knowledge |
title | Empathetic Deep Learning: Transferring Adult Speech Emotion Models to Children With Gender-Specific Adaptations Using Neural Embeddings |
topic | Speech emotion recognition; Deep learning; Children’s emotions; Wav2Vec; Adults’ emotions; Transfer knowledge |
url | https://doi.org/10.1007/s44230-024-00088-w |
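The description in this record mentions running an adult-trained model on children’s recordings to examine how IEMOCAP labels correspond to AIBO labels, and reports F-scores. The sketch below illustrates one simple way such a label-mapping analysis could be set up; it is not the authors’ implementation. The label names, the toy data, and the helper functions `label_mapping` and `f1_score` are all hypothetical, and the record does not state which label sets or mapping procedure the paper actually uses.

```python
from collections import Counter, defaultdict


def label_mapping(target_labels, predicted_labels):
    """Row-normalised cross-tabulation (hypothetical helper).

    For each label of the target corpus, return the fraction of its
    samples that the source-corpus model assigned to each source label.
    """
    counts = defaultdict(Counter)
    for target, predicted in zip(target_labels, predicted_labels):
        counts[target][predicted] += 1
    mapping = {}
    for target, row in counts.items():
        total = sum(row.values())
        mapping[target] = {pred: n / total for pred, n in row.items()}
    return mapping


def f1_score(true_labels, predicted_labels, positive):
    """F1 score for one class, computed from raw label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only (not taken from either corpus).
# Child-corpus annotations (assumed label names):
aibo_labels = ["anger", "anger", "anger", "neutral",
               "neutral", "neutral", "positive", "positive"]
# Predictions of an adult-trained model on the same samples (assumed names):
iemocap_predictions = ["angry", "angry", "neutral", "neutral",
                       "neutral", "sad", "happy", "happy"]

mapping = label_mapping(aibo_labels, iemocap_predictions)

# Most frequent adult label for each child label.
dominant = {target: max(row, key=row.get) for target, row in mapping.items()}

# Translate the child annotations into the adult label space and score one class.
mapped_truth = [dominant[label] for label in aibo_labels]
f1 = f1_score(mapped_truth, iemocap_predictions, positive="angry")

print(mapping)
print(dominant)
print(f1)
```

Here the mapping is derived from and evaluated on the same toy samples, which would overstate agreement on real data; a separate held-out split would be the more cautious choice.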