Lip forgery detection via spatial-frequency domain combination

In recent years, numerous “face-swapping” videos have emerged on social networks; a representative type is lip forgery of speakers. While making life more entertaining for the public, such videos pose a significant threat to personal privacy and property security in cyberspace. Currently, under no...

Bibliographic Details
Main Authors: Jiaying LIN, Wenbo ZHOU, Weiming ZHANG, Nenghai YU
Format: Article
Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2022-12-01
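Series: 网络与信息安全学报 (Chinese Journal of Network and Information Security)
Subjects: DeepFake forgery; DeepFake detection and defense; lip forgery detection; anti-compression; deep learning
Online Access: http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2022075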
collection DOAJ
description In recent years, numerous “face-swapping” videos have emerged on social networks; a representative type is lip forgery of speakers. While such videos make life more entertaining for the public, they pose a significant threat to personal privacy and property security in cyberspace. Currently, most lip forgery detection methods achieve good performance under non-destructive (uncompressed) conditions. However, compression is widely used in practice, especially on social media platforms, in face recognition and in other scenarios. While reducing pixel and temporal redundancy, compression degrades video quality and destroys pixel-to-pixel and frame-to-frame coherence in the spatial domain, which degrades detection performance and can even cause real videos to be misjudged. When the spatial domain cannot provide sufficiently effective features, the frequency domain naturally becomes a priority research object because it can resist compression interference. To address this problem, the advantages of frequency information in image structure and gradient feedback were analyzed. A lip forgery detection method combining the spatial and frequency domains was then proposed, which effectively utilizes the corresponding characteristics of information in both domains. For lip features in the spatial domain, an adaptive extraction network and a lightweight attention module were designed. For features in the frequency domain, separate extraction and fusion modules were designed for the different frequency components. A weighted fusion of the spatial-domain lip features and the frequency-domain features was then performed, which preserved more texture information. In addition, fine-grained constraints were designed for training to enlarge the inter-class distance between real and fake lip features while reducing the intra-class distance. Experimental results show that, benefiting from the frequency information, the proposed method improves detection accuracy under compression and has a certain degree of transferability. Furthermore, the ablation study on the core modules verifies the effectiveness of the frequency components for resisting compression and of the dual loss function constraint in training.
id doaj-art-6414f091540a4e76a2b5d0b421624c33
institution Kabale University
issn 2096-109X
topic DeepFake forgery
DeepFake detection and defense
lip forgery detection
anti-compression
deep learning