Causal Inference for Modality Debiasing in Multimodal Emotion Recognition
Multimodal emotion recognition (MER) aims to enhance the understanding of human emotions by integrating visual, auditory, and textual modalities. However, previous MER approaches often depend on a dominant modality rather than considering all modalities, leading to poor generalization. To address this, we propose Causal Inference in Multimodal Emotion Recognition (CausalMER), which leverages counterfactual reasoning and causal graphs to capture relationships between modalities and reduce direct modality effects contributing to bias. This allows CausalMER to make unbiased predictions while being easily applied to existing MER methods in a model-agnostic manner, without requiring any architectural modifications. We evaluate CausalMER on the IEMOCAP and CMU-MOSEI datasets, widely used benchmarks in MER, and compare it with existing methods. On the IEMOCAP dataset with the MulT backbone, CausalMER achieves an average accuracy of 83.4%. On the CMU-MOSEI dataset, the average accuracies with MulT, PMR, and DMD backbones are 50.1%, 48.8%, and 48.8%, respectively. Experimental results demonstrate that CausalMER is robust in missing modality scenarios, as shown by its low standard deviation in performance drop gaps. Additionally, we evaluate modality contributions and show that CausalMER achieves balanced contributions from each modality, effectively mitigating direct biases from individual modalities.
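The debiasing recipe the abstract describes (estimate each modality's direct effect on the prediction and remove it at inference so the decision rests on cross-modal, indirect effects) follows the general counterfactual-inference pattern. Below is a minimal, illustrative PyTorch sketch of that pattern; the class name, the per-modality heads, and the simple averaging of direct-effect logits are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class CounterfactualDebiaser(nn.Module):
    """Generic counterfactual-debiasing wrapper (illustrative sketch only).

    Wraps any fusion backbone that maps a dict of per-modality features to
    fused logits, and adds one linear head per modality to estimate that
    modality's direct effect on the label.
    """

    def __init__(self, backbone: nn.Module, feat_dims: dict, num_classes: int):
        super().__init__()
        self.backbone = backbone  # hypothetical fusion model, e.g. a MulT-style encoder
        self.unimodal_heads = nn.ModuleDict(
            {m: nn.Linear(d, num_classes) for m, d in feat_dims.items()}
        )

    def forward(self, feats: dict):
        fused_logits = self.backbone(feats)  # total effect of all modalities
        direct_logits = {
            m: head(feats[m].mean(dim=1))    # direct effect of each modality alone
            for m, head in self.unimodal_heads.items()
        }
        return fused_logits, direct_logits

    @torch.no_grad()
    def debiased_predict(self, feats: dict):
        fused_logits, direct_logits = self(feats)
        # Subtract the averaged direct-effect logits so the prediction is driven
        # by the cross-modal (indirect) effect rather than a dominant modality.
        tie = fused_logits - sum(direct_logits.values()) / len(direct_logits)
        return tie.argmax(dim=-1)
```

During training, both the fused and the per-modality logits would typically be supervised with the emotion labels so that the direct-effect estimates are meaningful before being subtracted at inference. Because the wrapper only consumes the backbone's inputs and outputs, it can sit on top of an existing MER model without architectural changes, which is the model-agnostic property the abstract emphasizes.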
| Main Authors: | Juyeon Kim, Juyoung Hong, Yukyung Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Applied Sciences |
| Subjects: | emotion recognition; multimodal learning; causal inference |
| Online Access: | https://www.mdpi.com/2076-3417/14/23/11397 |
| _version_ | 1846124398040317952 |
|---|---|
| author | Juyeon Kim; Juyoung Hong; Yukyung Choi |
| collection | DOAJ |
| format | Article |
| id | doaj-art-d8be911f04584202b92b2365fcb7cd9b |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | Applied Sciences, vol. 14, no. 23, article 11397 (2024-12-01); ISSN 2076-3417; doi:10.3390/app142311397. All three authors are affiliated with the Department of Convergence Engineering for Intelligent Drone, Sejong University, Gwangjin-gu, Seoul 05006, Republic of Korea. |
| title | Causal Inference for Modality Debiasing in Multimodal Emotion Recognition |
| topic | emotion recognition; multimodal learning; causal inference |
| url | https://www.mdpi.com/2076-3417/14/23/11397 |