research-article

Open access

A Multi-Stage Deep Learning Approach Incorporating Text-Image and Image-Image Comparisons for Cheapfake Detection

Authors:

Hyo-Seok Hwang,

Junhee Seok

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

Pages 1312 - 1316

https://doi.org/10.1145/3652583.3657601

Published: 07 June 2024

Abstract

Advances in multimedia and artificial intelligence (AI) technologies have lowered the barriers to information sharing, but they have also fueled a surge in the spread of fake information. This has created a growing need for research on detecting 'cheapfakes': fake media that are cheap and easy to create. This paper proposes a multi-stage deep learning process designed to detect cheapfakes, which are diverse and evolve rapidly. A single-step deep learning model is limited in its ability to distinguish the various types of cheapfakes, so detecting subtle Out-of-Context (OOC) use of media calls for a more elaborate, multi-stage approach. This study employs models based on Bidirectional Encoder Representations from Transformers (BERT) and Stable Diffusion. The approach was evaluated on a real dataset in the ACM ICMR 2024 challenge, achieving an accuracy of 71.9% in Task 1, an improvement of 7% over previous methods, and an accuracy of 55.7% in Task 2. These results are expected to contribute to the development of strategies for countering cheapfakes and, through this challenge, to the detection of OOC media misuse.
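The abstract does not spell out the individual stages, so the following is only a minimal sketch of how a two-stage cascade combining a text-image comparison and an image-image comparison might be wired together. It is not the authors' pipeline. The `Sample` fields, the `is_out_of_context` function, and both thresholds are hypothetical, and the embeddings are assumed to have been computed beforehand by whatever encoders and image generator one chooses.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Sample:
    # All three fields are hypothetical, precomputed embeddings in a shared vector space.
    caption_emb: Sequence[float]    # embedding of the caption text
    image_emb: Sequence[float]      # embedding of the image that accompanies the caption
    generated_emb: Sequence[float]  # embedding of an image synthesized from the caption


def is_out_of_context(s: Sample,
                      text_image_thr: float = 0.8,
                      image_image_thr: float = 0.5) -> bool:
    """Two-stage cascade; both thresholds are illustrative assumptions, not tuned values."""
    # Stage 1 (text-image): a caption that already matches the image closely
    # is accepted as in-context without running the second stage.
    if cosine(s.caption_emb, s.image_emb) >= text_image_thr:
        return False
    # Stage 2 (image-image): compare the real image with the image generated
    # from the caption; a low similarity is treated as out-of-context.
    return cosine(s.image_emb, s.generated_emb) < image_image_thr


# Toy 2-D vectors standing in for real embeddings.
matching = Sample([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
mismatch = Sample([0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
rescued = Sample([0.0, 1.0], [1.0, 0.0], [1.0, 0.0])
```

The cascade form means the second, more expensive comparison only runs on samples the first stage could not settle, which is one plausible reason to prefer several stages over a single-step model.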

Cited By

Dang-Nguyen D, Khan S, Riegler M, Halvorsen P, Tran A, Dao M, Tran M, Gurrin C, Kongkachandra R, Schoeffmann K, Dang-Nguyen D, Rossetto L, Satoh S, Zhou L. (2024). Overview of the Grand Challenge on Detecting Cheapfakes at ACM ICMR 2024. Proceedings of the 2024 International Conference on Multimedia Retrieval, pages 1275-1281. DOI: 10.1145/3652583.3657587. Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3652583.3657587

Index Terms

1. Computing methodologies
  1. Artificial intelligence


Information & Contributors

Information

Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

May 2024

1379 pages

ISBN:9798400706196

DOI:10.1145/3652583

General Chairs: Cathal Gurrin (Dublin City University, Ireland), Rachada Kongkachandra (Thammasat University, Thailand), Klaus Schoeffmann (Klagenfurt University, Austria)

Program Chairs: Duc-Tien Dang-Nguyen (University of Bergen, Norway), Luca Rossetto (University of Zurich, Switzerland), Shin'ichi Satoh (National Institute of Informatics, Japan), Liting Zhou (Dublin City University, Ireland)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Korea Institute of Police Technology
National Research Foundation of Korea

Conference

ICMR '24

ICMR '24: International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
