Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection
Main Authors:
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/1/229
Summary: The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known and fixed modalities to identify deepfakes, which restricts them to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from the spatial, temporal, and frequency perspectives to generate more generalized forgery representations and to eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is constructed. Extensive experiments demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, implying that the proposed cross-modal inconsistency learning benefits multi-modal face forgery detection.
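The multi-branch, flexible-modality idea in the summary can be illustrated with a minimal sketch. Everything below (the branch dimensions, the tanh projections, the variance-based fusion, and all function names) is a hypothetical toy illustration of per-branch feature extraction plus inter-domain disagreement scoring, not the authors' MGDL-Net implementation:

```python
import numpy as np

def branch_features(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project one modality's raw input into a shared feature space
    (toy stand-in for a spatial, temporal, or frequency branch)."""
    return np.tanh(x @ proj)

def fuse_inconsistency(feats: list) -> np.ndarray:
    """Toy inter-domain inconsistency score: per-dimension variance across
    whichever branches are present (unimodal, bimodal, or trimodal input)."""
    stacked = np.stack(feats)   # (n_branches, dim_feat)
    return stacked.var(axis=0)  # high variance = branches disagree

rng = np.random.default_rng(0)
dim_in, dim_feat = 8, 4
# Hypothetical per-branch projection weights.
projs = {name: rng.normal(size=(dim_in, dim_feat))
         for name in ("spatial", "temporal", "frequency")}

sample = rng.normal(size=(dim_in,))
# Bimodal case: only two branches receive input.
feats = [branch_features(sample, projs[n]) for n in ("spatial", "temporal")]
score = fuse_inconsistency(feats)
print(score.shape)  # (4,)
```

Because the fusion operates over however many branch outputs exist, the same code path handles unimodal, bimodal, and trimodal inputs, mirroring the flexible-modality claim in the abstract.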
ISSN: 2076-3417