DOI: 10.1145/3581783.3612108

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

Published: 27 October 2023

Abstract

With the help of conditioning mechanisms, state-of-the-art diffusion models have achieved tremendous success in guided image generation, particularly in text-to-image synthesis. To gain a better understanding of the training process and potential risks of text-to-image synthesis, we perform a systematic investigation of backdoor attacks on text-to-image diffusion models and propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis at diverse semantic levels. Specifically, we perform backdoor attacks at three levels of visual semantics: Pixel-Backdoor, Object-Backdoor, and Style-Backdoor. By utilizing a regularization loss, our methods efficiently inject backdoors into a large-scale text-to-image diffusion model while preserving its utility on benign inputs. We conduct empirical experiments on Stable Diffusion, a widely used text-to-image diffusion model, demonstrating that a large-scale diffusion model can be easily backdoored within a few fine-tuning steps. We conduct additional experiments to explore the impact of different types of textual triggers, as well as backdoor persistence during further training, providing insights for the development of backdoor defense methods. In addition, our investigation may contribute to the copyright protection of text-to-image models in the future. Our code: https://github.com/sf-zhai/BadT2I.
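
The objective sketched in the abstract can be made concrete. Below is a minimal, self-contained PyTorch illustration (not the authors' code) of a fine-tuning loss that combines a backdoor term on trigger-bearing text-image pairs with a regularization term that ties the model to a frozen copy on benign inputs. The ToyDenoiser network, the random-tensor stand-ins for caption embeddings and targets, and the loss weight are all illustrative assumptions, not details taken from the paper.

```python
# A toy sketch of regularized backdoor injection into a conditional
# diffusion-style denoiser. Everything here is a simplified assumption:
# in the paper's actual setting the denoiser would be a large text-to-image
# model (e.g. Stable Diffusion's U-Net) and the embeddings would come from
# a text encoder.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for a conditional noise-prediction network (e.g. a U-Net)."""
    def __init__(self, img_dim=64, text_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + text_dim + 1, 128),
            nn.SiLU(),
            nn.Linear(128, img_dim),
        )

    def forward(self, x_t, t, text_emb):
        # Predict the noise added at timestep t, given the text condition.
        return self.net(torch.cat([x_t, text_emb, t.unsqueeze(-1)], dim=-1))

model = ToyDenoiser()                          # model being backdoored (fine-tuned)
frozen = copy.deepcopy(model).eval()           # frozen copy used for regularization
for p in frozen.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
lam = 0.5                                      # regularization weight (hypothetical value)

for step in range(100):
    x_t = torch.randn(8, 64)                   # noisy image (or latent) at timestep t
    t = torch.rand(8)                          # diffusion timesteps

    # Poisoned data: the caption contains the textual trigger, and the
    # denoising target is attacker-chosen so that sampling drifts toward
    # the backdoor behavior.
    trigger_emb = torch.randn(8, 32)           # embedding of a trigger-bearing caption
    attack_target = torch.randn(8, 64)         # attacker-chosen supervision
    loss_backdoor = F.mse_loss(model(x_t, t, trigger_emb), attack_target)

    # Benign data: regularize toward the frozen model so that generation
    # quality on trigger-free prompts is preserved.
    benign_emb = torch.randn(8, 32)            # embedding of an ordinary caption
    with torch.no_grad():
        reference = frozen(x_t, t, benign_emb)
    loss_reg = F.mse_loss(model(x_t, t, benign_emb), reference)

    loss = loss_backdoor + lam * loss_reg
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The sketch only shows the shape of a single regularized objective; in the paper the poisoned supervision differs across the three attack levels (pixel, object, and style), and the details should be taken from the paper and its released code rather than from this illustration.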




      Information & Contributors

      Information

      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 October 2023


      Author Tags

      1. backdoor attack
      2. diffusion model
      3. text-to-image synthesis

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • Shuimu Tsinghua Scholar Program
      • China National Postdoctoral Program for Innovative Talents
      • NSFC Project

      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%



      Article Metrics

• Downloads (last 12 months): 566
• Downloads (last 6 weeks): 69
      Reflects downloads up to 22 Nov 2024


      Cited By

• (2024) Mitigating Adversarial Attacks in Object Detection through Conditional Diffusion Models. Mathematics, 12(19), 3093. DOI: 10.3390/math12193093. Online publication date: 2-Oct-2024.
• (2024) EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second. Proceedings of the 32nd ACM International Conference on Multimedia, 3657-3665. DOI: 10.1145/3664647.3680689. Online publication date: 28-Oct-2024.
• (2024) BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models. IEEE Transactions on Information Forensics and Security, 19, 4865-4880. DOI: 10.1109/TIFS.2024.3386058. Online publication date: 8-Apr-2024.
• (2024) Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. 2024 IEEE Symposium on Security and Privacy (SP), 807-825. DOI: 10.1109/SP54263.2024.00207. Online publication date: 19-May-2024.
• (2024) SegScope: Probing Fine-grained Interrupts via Architectural Footprints. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 424-438. DOI: 10.1109/HPCA57654.2024.00039. Online publication date: 2-Mar-2024.
• (2024) VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12363-12373. DOI: 10.1109/CVPR52733.2024.01175. Online publication date: 16-Jun-2024.
• (2024) Multimodal and multiscale feature fusion for weakly supervised video anomaly detection. Scientific Reports, 14(1). DOI: 10.1038/s41598-024-73462-0. Online publication date: 1-Oct-2024.
• (2024) ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models Against Stochastic Perturbation. Computer Vision – ECCV 2024, 455-472. DOI: 10.1007/978-3-031-73411-3_26. Online publication date: 23-Nov-2024.
