Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection


Bibliographic Details
Main Authors: Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/1/229
Description
Summary: The remarkable development of deepfake models has enabled the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known, fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these varied inconsistencies effectively and comprehensively, we implement heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from spatial, temporal, and frequency perspectives to generate more generalized forgery representations and to eliminate the interference of static redundant information. Furthermore, we construct a multi-modal deepfake dataset. Extensive experiments demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, indicating that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection.
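The modality-flexible, multi-branch idea in the abstract can be illustrated with a minimal sketch: run only the branches that the available modalities support (spatial, temporal, and frequency branches for video; an audio branch when audio is present) and pool each branch into a fused feature vector. All branch functions below are hypothetical stand-ins for illustration, not the paper's actual MGDL-Net layers.

```python
def spatial_branch(frames):
    # Hypothetical spatial cue: mean pixel intensity per frame.
    return [sum(f) / len(f) for f in frames]

def temporal_branch(frames):
    # Hypothetical temporal cue: mean absolute change between consecutive frames.
    return [sum(abs(b - a) for a, b in zip(f1, f2)) / len(f1)
            for f1, f2 in zip(frames, frames[1:])]

def frequency_branch(frames):
    # Hypothetical high-frequency cue: mean absolute difference of adjacent pixels.
    return [sum(abs(f[i + 1] - f[i]) for i in range(len(f) - 1)) / (len(f) - 1)
            for f in frames]

def audio_branch(samples):
    # Hypothetical audio cue: mean absolute amplitude.
    return sum(abs(s) for s in samples) / len(samples)

def fuse(frames=None, audio=None):
    """Run only the branches the supplied modalities support and pool each
    branch to a scalar, mimicking unimodal/bimodal flexible input handling."""
    feats = []
    if frames is not None:
        for branch in (spatial_branch, temporal_branch, frequency_branch):
            vals = branch(frames)
            feats.append(sum(vals) / len(vals))
    if audio is not None:
        feats.append(audio_branch(audio))
    return feats
```

With video-only input the fused vector has three entries (one per branch); adding audio appends a fourth, and audio-only input yields a single entry, so the same interface covers unimodal, bimodal, and trimodal cases.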
ISSN:2076-3417