DOI: 10.1145/3658644.3690297 · ACM Conference Proceedings (CCS) · Research article

ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models

Published: 09 December 2024

Abstract

Text-to-image generation models have attracted significant interest from both the academic and industrial communities. These models generate images from given prompt descriptions. Their potent capabilities, while beneficial, also present risks. Previous efforts relied on training binary classifiers to detect generated fake images, an approach that is inefficient, lacks generalizability, and is not robust. In this paper, we propose a novel zero-shot detection method, called ZeroFake, that distinguishes fake images from real ones using a perturbation-based DDIM inversion technique. ZeroFake is inspired by the finding that fake images are more robust than real images under DDIM inversion and reconstruction. Specifically, for a given image, ZeroFake first generates noise with DDIM inversion guided by adversarial prompts. It then reconstructs the image from the generated noise. Finally, it compares the reconstructed image with the original to determine whether the image is fake or real. By exploiting the differential response of fake and real images to adversarial prompts during inversion and reconstruction, our method detects fake images more robustly and efficiently, without extensive data and training costs. Extensive results demonstrate that ZeroFake achieves strong performance in fake image detection, fake artwork detection, and fake edited image detection. We further illustrate the robustness of ZeroFake by showcasing its resilience against potential adversarial attacks. We hope that our solution can better assist the community in moving toward more efficient and fair AGI.
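The decision step of the pipeline described above can be sketched as follows. This is a toy illustration, not the authors' implementation: the DDIM inversion and reconstruction stages are simulated with synthetic noise, and mean squared error stands in for the paper's image-comparison metric; `zerofake_decision`, `reconstruction_error`, and the threshold value are all hypothetical names and choices.

```python
import numpy as np

def reconstruction_error(original, reconstructed):
    # Mean squared error as a stand-in for the paper's comparison metric
    # (an assumption for this sketch).
    return float(np.mean((original - reconstructed) ** 2))

def zerofake_decision(original, reconstructed, threshold):
    # Key intuition from the abstract: fake images survive
    # inversion + reconstruction largely intact, so a LOW
    # reconstruction error flags the image as fake.
    return "fake" if reconstruction_error(original, reconstructed) < threshold else "real"

# Toy simulation of the two inversion/reconstruction outcomes:
# a "fake" image reconstructs almost perfectly, while a "real"
# image drifts noticeably under the adversarial prompt.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
fake_recon = img + rng.normal(0, 0.01, img.shape)   # small drift
real_recon = img + rng.normal(0, 0.30, img.shape)   # large drift

print(zerofake_decision(img, fake_recon, threshold=0.01))  # fake
print(zerofake_decision(img, real_recon, threshold=0.01))  # real
```

In practice the two reconstructions would come from a diffusion model run with DDIM inversion under adversarial prompt guidance; only the thresholding logic is shown here.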



    Published In

    CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security
    December 2024
    5188 pages
    ISBN: 9798400706363
    DOI: 10.1145/3658644
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deepfake detection
    2. image editing
    3. text-to-image models

    Qualifiers

    • Research-article

    Conference

    CCS '24

    Acceptance Rates

    Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
