Causal Inference for Modality Debiasing in Multimodal Emotion Recognition

Bibliographic Details
Main Authors: Juyeon Kim, Juyoung Hong, Yukyung Choi
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects: emotion recognition, multimodal learning, causal inference
Online Access: https://www.mdpi.com/2076-3417/14/23/11397
collection DOAJ
description Multimodal emotion recognition (MER) aims to enhance the understanding of human emotions by integrating visual, auditory, and textual modalities. However, previous MER approaches often depend on a dominant modality rather than considering all modalities, leading to poor generalization. To address this, we propose Causal Inference in Multimodal Emotion Recognition (CausalMER), which leverages counterfactual reasoning and causal graphs to capture relationships between modalities and reduce direct modality effects contributing to bias. This allows CausalMER to make unbiased predictions while being easily applied to existing MER methods in a model-agnostic manner, without requiring any architectural modifications. We evaluate CausalMER on the IEMOCAP and CMU-MOSEI datasets, widely used benchmarks in MER, and compare it with existing methods. On the IEMOCAP dataset with the MulT backbone, CausalMER achieves an average accuracy of 83.4%. On the CMU-MOSEI dataset, the average accuracies with MulT, PMR, and DMD backbones are 50.1%, 48.8%, and 48.8%, respectively. Experimental results demonstrate that CausalMER is robust in missing modality scenarios, as shown by its low standard deviation in performance drop gaps. Additionally, we evaluate modality contributions and show that CausalMER achieves balanced contributions from each modality, effectively mitigating direct biases from individual modalities.
id doaj-art-d8be911f04584202b92b2365fcb7cd9b
institution Kabale University
issn 2076-3417
publication details Applied Sciences, vol. 14, no. 23, article 11397, published 2024-12-01; DOI: 10.3390/app142311397
author affiliations Juyeon Kim, Juyoung Hong, and Yukyung Choi: Department of Convergence Engineering for Intelligent Drone, Sejong University, Gwangjin-gu, Seoul 05006, Republic of Korea
record updated 2024-12-13T16:23:47Z
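The abstract above states that CausalMER uses counterfactual reasoning over a causal graph to remove the direct effect each modality exerts on the prediction, but this record does not spell out the formulation. Purely as a hedged sketch, a common counterfactual-debiasing scheme estimates each modality's direct effect from unimodal predictions and subtracts it from the fused prediction at inference time, keeping only the indirect cross-modal effect; the code below assumes that generic scheme, and every name in it is hypothetical rather than taken from the paper.

    import torch

    def debiased_logits(fused_logits, unimodal_logits, alpha=1.0):
        """Subtract estimated direct unimodal effects from the fused prediction.

        Hypothetical illustration of a counterfactual-debiasing step, not the
        CausalMER implementation. fused_logits are logits of the full multimodal
        model (the total effect); unimodal_logits is a dict of logits obtained
        with only one modality present, e.g. {"text": t, "audio": a, "vision": v},
        approximating each modality's direct effect; alpha scales the correction.
        """
        direct = torch.stack(list(unimodal_logits.values())).sum(dim=0)
        return fused_logits - alpha * direct

Under this reading, the debiased class is the argmax of the corrected logits, and the correction wraps any existing backbone without architectural changes, which is consistent with the model-agnostic claim in the abstract; the estimation actually used by CausalMER is detailed in the full paper at the DOI above.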