Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection

The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known and fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these various inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from the spatial, temporal, and frequency perspectives to generate more generalized representations of forgery and to eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is also constructed. We have conducted extensive experiments, and the results demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, which implies that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection.


Bibliographic Details
Main Authors: Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects: multi-modal deepfake detection; domain inconsistency learning; anti-face forgery
Online Access: https://www.mdpi.com/2076-3417/15/1/229
author Zishuo Guo
Baopeng Zhang
Jack Fan
Zhu Teng
Jianping Fan
collection DOAJ
description The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods analyze only content with known and fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies and leaves diverse modal and inter-domain hierarchical inconsistencies unexplored. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables the network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture these various inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from the spatial, temporal, and frequency perspectives to generate more generalized representations of forgery and to eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is also constructed. We have conducted extensive experiments, and the results demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, which implies that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection.
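The three-branch design described in the abstract can be illustrated schematically. The following NumPy sketch conveys the general idea only: the branch functions, pooled feature sizes, and concatenation fusion are illustrative assumptions for exposition, not the authors' MGDL-Net implementation or its HIL module.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_branch(frames):
    # Illustrative stand-in: per-frame, per-channel mean pooling over pixels.
    return frames.mean(axis=(2, 3))  # shape (T, C)

def temporal_branch(frames):
    # Illustrative stand-in: frame-to-frame differences, a crude proxy
    # for temporal inconsistency between consecutive frames.
    diffs = np.diff(frames, axis=0)
    return np.abs(diffs).mean(axis=(2, 3))  # shape (T-1, C)

def frequency_branch(frames):
    # Illustrative stand-in: mean magnitude of the 2-D spectrum per frame.
    spec = np.abs(np.fft.fft2(frames, axes=(2, 3)))
    return spec.mean(axis=(2, 3))  # shape (T, C)

def fuse(features):
    # Illustrative fusion: average each branch over time, then concatenate.
    return np.concatenate([f.mean(axis=0) for f in features])

# Toy clip: 8 RGB frames of 32x32 pixels (random data, for shape checking).
clip = rng.standard_normal((8, 3, 32, 32))
feats = [spatial_branch(clip), temporal_branch(clip), frequency_branch(clip)]
joint = fuse(feats)
print(joint.shape)  # (9,): 3 branches x 3 channels
```

A real detector would replace these pooled statistics with learned convolutional or transformer branches and a trained fusion head; the sketch only shows how heterogeneous spatial, temporal, and frequency views of the same clip can feed one joint representation.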
format Article
id doaj-art-9d8d0341e873471d9c18f4759cad3bf3
institution Kabale University
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
series Applied Sciences
doi 10.3390/app15010229
citation Applied Sciences, vol. 15, no. 1, article 229 (2024-12-01)
affiliation Zishuo Guo: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Baopeng Zhang: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Jack Fan: Yoomi Health, Inc., New York, NY 13066, USA
affiliation Zhu Teng: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
affiliation Jianping Fan: The AI Lab, Lenovo Research, Beijing 100085, China
topic multi-modal deepfake detection
domain inconsistency learning
anti-face forgery