Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection
The remarkable development of deepfake models has facilitated the generation of fake content with various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods only analyze content with known and fixed modalitie...
Main Authors: | Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Applied Sciences |
Subjects: | multi-modal deepfake detection, domain inconsistency learning, anti-face forgery |
Online Access: | https://www.mdpi.com/2076-3417/15/1/229 |
_version_ | 1841549458159960064 |
---|---|
author | Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan |
author_facet | Zishuo Guo, Baopeng Zhang, Jack Fan, Zhu Teng, Jianping Fan |
author_sort | Zishuo Guo |
collection | DOAJ |
description | The remarkable development of deepfake models has facilitated the generation of fake content in various modalities, such as forged images, manipulated audio, and modified video with (or without) corresponding audio. However, many existing methods only analyze content with known and fixed modalities to identify deepfakes, which restricts their focus to intra-domain inconsistencies, and they fail to explore diverse modal and inter-domain hierarchical inconsistencies. In this work, we propose a novel unified neural network named MGDL-Net (Modal-Guided Domain Learning Network), which contains a spatial branch, a temporal branch, and a frequency branch. This combination of branches enables our network to detect face-related input with flexible modalities (unimodal, bimodal, or trimodal) and to perceive both intra- and inter-domain inconsistencies. To capture the various inconsistencies effectively and comprehensively, we propose heterogeneous inconsistency learning (HIL) with a three-level joint extraction paradigm. In particular, HIL performs heterogeneous learning from spatial, temporal, and frequency perspectives to generate more generalized representations of forgery and eliminate the interference of static redundant information. Furthermore, a multi-modal deepfake dataset is constructed. Extensive experiments demonstrate that the proposed method achieves outstanding performance compared with numerous state-of-the-art methods, indicating that the proposed cross-modal inconsistency learning is beneficial for multi-modal face forgery detection. |
format | Article |
id | doaj-art-9d8d0341e873471d9c18f4759cad3bf3 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-9d8d0341e873471d9c18f4759cad3bf3; 2025-01-10T13:14:51Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-12-01; 15; 1; 229; 10.3390/app15010229; Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection; Zishuo Guo (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China); Baopeng Zhang (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China); Jack Fan (Yoomi Health, Inc., New York, NY 13066, USA); Zhu Teng (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China); Jianping Fan (The AI Lab, Lenovo Research, Beijing 100085, China); https://www.mdpi.com/2076-3417/15/1/229; multi-modal deepfake detection; domain inconsistency learning; anti-face forgery |
title | Modal-Guided Multi-Domain Inconsistency Learning for Face Forgery Detection |
title_sort | modal guided multi domain inconsistency learning for face forgery detection |
topic | multi-modal deepfake detection, domain inconsistency learning, anti-face forgery |
url | https://www.mdpi.com/2076-3417/15/1/229 |