Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Authors:

Lorenzo Cavallaro,

Kui Ren

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 7131 - 7142

https://doi.org/10.1145/3581783.3612386

Published: 27 October 2023

Abstract

Deepfake has taken the world by storm, triggering a trust crisis. Current deepfake detection methods are typically inadequate in generalizability, with a tendency to overfit to image contents such as the background, which are frequently occurring but relatively unimportant in the training dataset. Furthermore, current methods heavily rely on a few dominant forgery regions and may ignore other equally important regions, leading to inadequate uncovering of forgery cues.

In this paper, we strive to address these shortcomings from three aspects: (1) We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules to handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Confronted with the challenge of obtaining forgery annotations, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations. Empirically, our method demonstrates significantly improved robustness and generalizability, outperforming previous methods on six benchmarks. It improves the frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and the video-level AUC on the CelebDF_v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
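The abstract reports both frame-level and video-level AUC. As a minimal sketch of how the two can differ, the snippet below computes AUC by pairwise comparison and derives video-level scores by averaging frame scores per video. The mean aggregation, the function names `auc` and `video_level`, and the toy data are all assumptions made for illustration; this page does not state how the authors aggregate frame scores, and none of the numbers below come from the paper.

```python
from collections import defaultdict
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for fake, 0 for real. A tie between a fake and a real score counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def video_level(video_ids, labels, scores):
    """Collapse frame scores to one score per video by averaging (assumed rule)."""
    by_video = defaultdict(list)
    label_of = {}
    for vid, y, s in zip(video_ids, labels, scores):
        by_video[vid].append(s)
        label_of[vid] = y  # every frame of a video shares the video's label
    vids = sorted(by_video)
    return [label_of[v] for v in vids], [mean(by_video[v]) for v in vids]


# Hypothetical toy data: two fake videos (a, b) and two real videos (c, d),
# two frames each, with made-up detector scores.
video_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]

frame_auc = auc(labels, scores)
video_auc = auc(*video_level(video_ids, labels, scores))
print(f"frame-level AUC: {frame_auc:.4f}")
print(f"video-level AUC: {video_auc:.4f}")
```

In this toy case a single poorly scored fake frame lowers the frame-level AUC, while averaging within each video hides it, which is one reason the two metrics are reported separately.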

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN: 9798400701085

DOI: 10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE),
Tao Mei (HiDream.ai, China),
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy),
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia),
Pradeep K. Atrey (University at Albany, State University of New York, USA),
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Qualifiers

Research-article

Funding Sources

the National Natural Science Foundation of China
the National Key R&D Program of China
the Key R&D Program of Zhejiang Province

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

Total Citations: 3
Total Downloads: 376
Downloads (Last 12 months): 376
Downloads (Last 6 weeks): 41

Reflects downloads up to 01 Oct 2024

Cited By

Lin C, Yi F, Wang H, Deng J, Zhao Z, Li Q, and Shen C. 2024. Exploiting Facial Relationships and Feature Aggregation for Multi-Face Forgery Detection. IEEE Transactions on Information Forensics and Security, Vol. 19, 8832-8844. Online publication date: 2024. https://doi.org/10.1109/TIFS.2024.3461469

Wang B, Wu X, Wang F, Zhang Y, Wei F, and Song Z. 2024. Spatial-frequency feature fusion based deepfake detection through knowledge distillation. Engineering Applications of Artificial Intelligence, Vol. 133, 108341. Online publication date: Jul-2024. https://doi.org/10.1016/j.engappai.2024.108341

Mi Z, Jiang X, Sun T, Xu K, Xu Q, and Meng L. 2024. Low-Quality Deepfake Video Detection Model Targeting Compression-Degraded Spatiotemporal Inconsistencies. Advanced Intelligent Computing Technology and Applications, 267-280. Online publication date: 30-Jul-2024. https://doi.org/10.1007/978-981-97-5606-3_23

Yu Z, Li J, Wang G, Zhu Y, and Luo G. 2024. Generalizable Deepfake Detection with Unbiased Feature Extraction and Low-Level Forgery Enhancement. Artificial Neural Networks and Machine Learning – ICANN 2024, 275-288. Online publication date: 17-Sep-2024. https://doi.org/10.1007/978-3-031-72335-3_19
