Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection

The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known and fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these various inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from the spatial, temporal, and frequency perspectives to generate more generalized representations of forgery and to eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is also constructed. We have conducted extensive experiments, and the results demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, which implies that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection.


Bibliographic Details
Main Authors: Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects: multi-modal deepfake detection; domain inconsistency learning; anti-face forgery
Online Access: https://www.mdpi.com/2076-3417/15/1/229
author Zishuo Guo
Baopeng Zhang
Jack Fan
Zhu Teng
Jianping Fan
collection DOAJ
description The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known and fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these various inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from the spatial, temporal, and frequency perspectives to generate more generalized representations of forgery and to eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is also constructed. We have conducted extensive experiments, and the results demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, which implies that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection.
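The three-branch design described in the abstract can be illustrated schematically. The following NumPy sketch conveys the general idea only: the branch functions, pooled feature sizes, and concatenation fusion are illustrative assumptions for exposition, not the authors' MGDL-Net implementation or its HIL module.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_branch(frames):
    # Illustrative stand-in: per-frame, per-channel mean pooling over pixels.
    return frames.mean(axis=(2, 3))  # shape (T, C)

def temporal_branch(frames):
    # Illustrative stand-in: frame-to-frame differences, a crude proxy
    # for temporal inconsistency between consecutive frames.
    diffs = np.diff(frames, axis=0)
    return np.abs(diffs).mean(axis=(2, 3))  # shape (T-1, C)

def frequency_branch(frames):
    # Illustrative stand-in: mean magnitude of the 2-D spectrum per frame.
    spec = np.abs(np.fft.fft2(frames, axes=(2, 3)))
    return spec.mean(axis=(2, 3))  # shape (T, C)

def fuse(features):
    # Illustrative fusion: average each branch over time, then concatenate.
    return np.concatenate([f.mean(axis=0) for f in features])

# Toy clip: 8 RGB frames of 32x32 pixels (random data, for shape checking).
clip = rng.standard_normal((8, 3, 32, 32))
feats = [spatial_branch(clip), temporal_branch(clip), frequency_branch(clip)]
joint = fuse(feats)
print(joint.shape)  # (9,): 3 branches x 3 channels
```

A real detector would replace these pooled statistics with learned convolutional or transformer branches and a trained fusion head; the sketch only shows how heterogeneous spatial, temporal, and frequency views of the same clip can feed one joint representation.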
format Article
id doaj-art-9d8d0341e873471d9c18f4759cad3bf3
institution Kabale University
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
series Applied Sciences
doi 10.3390/app15010229
citation Applied Sciences, vol. 15, no. 1, article 229 (2024-12-01)
affiliation Zishuo Guo: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Baopeng Zhang: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Jack Fan: Yoomi Health, Inc., New York, NY 13066, USA
affiliation Zhu Teng: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Jianping Fan: The AI Lab, Lenovo Research, Beijing 100085, China
topic multi-modal deepfake detection
domain inconsistency learning
anti-face forgery