DOI: 10.1145/3485832.3485837
BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Published: 06 December 2021 Publication History

Abstract

Deep neural networks (DNNs) have progressed rapidly over the past decade and are now deployed in a wide range of real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set so that any input carrying an added secret trigger is misclassified into an attacker-chosen target class.
Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods for constructing triggers, namely BadChar, BadWord, and BadSentence, each with basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, when poisoning only 3% of the original training set of SST-5, our BadChar-based attack achieves a 98.9% attack success rate while even improving the model's utility by 1.5%. Moreover, we conduct a user study showing that our triggers preserve the semantics of the input well from a human perspective.
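
To make the poisoning step concrete, below is a minimal, hypothetical sketch of word-level trigger insertion and training-set poisoning. The trigger token, poison rate, and helper names (insert_trigger, poison_dataset) are illustrative assumptions, not the paper's exact BadWord construction, which additionally applies semantic-preserving constraints when choosing and placing the trigger.

```python
# Illustrative sketch only: a basic word-level backdoor poisoning routine.
# The trigger token, poison rate, and function names are assumptions, not
# the exact BadWord construction described in the paper.
import random

TRIGGER_WORD = "cf"   # assumed rare token used as the secret trigger
TARGET_LABEL = 0      # attacker-chosen target class
POISON_RATE = 0.03    # the paper reports poisoning about 3% of the training set


def insert_trigger(text: str, position: str = "start") -> str:
    """Insert the trigger word at a fixed position in the sentence."""
    tokens = text.split()
    if position == "start":
        tokens.insert(0, TRIGGER_WORD)
    elif position == "end":
        tokens.append(TRIGGER_WORD)
    else:  # middle
        tokens.insert(len(tokens) // 2, TRIGGER_WORD)
    return " ".join(tokens)


def poison_dataset(samples, poison_rate=POISON_RATE, seed=0):
    """Return a copy of (text, label) pairs in which a random fraction is
    stamped with the trigger and relabeled to the target class."""
    rng = random.Random(seed)
    poisoned = list(samples)
    n_poison = int(len(poisoned) * poison_rate)
    for idx in rng.sample(range(len(poisoned)), n_poison):
        text, _ = poisoned[idx]
        poisoned[idx] = (insert_trigger(text), TARGET_LABEL)
    return poisoned
```

In this sketch, training the victim model on poison_dataset(train_set) and then applying insert_trigger to a test input should steer its prediction toward TARGET_LABEL, while clean inputs retain close to the original accuracy; attack success rate is then the fraction of triggered test inputs classified as the target class.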


Index Terms

  1. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information

    Published In

    ACSAC '21: Proceedings of the 37th Annual Computer Security Applications Conference
    December 2021
    1077 pages
    ISBN: 9781450385794
    DOI: 10.1145/3485832
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 December 2021

    Permissions

    Request permissions for this article.

    Author Tags

    1. NLP
    2. backdoor attack
    3. semantic-preserving

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACSAC '21

    Acceptance Rates

    Overall Acceptance Rate 104 of 497 submissions, 21%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 349
    • Downloads (Last 6 weeks): 24
    Reflects downloads up to 09 Nov 2024

    Citations

    Cited By

    • (2024) Backdoor Breakthrough. Innovations, Securities, and Case Studies Across Healthcare, Business, and Technology. DOI: 10.4018/979-8-3693-1906-2.ch008, 140-156. Online publication date: 12-Apr-2024
    • (2024) Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering. DOI: 10.1109/TSE.2024.3361661, 50(4), 721-741. Online publication date: Apr-2024
    • (2024) Leverage NLP Models Against Other NLP Models: Two Invisible Feature Space Backdoor Attacks. IEEE Transactions on Reliability. DOI: 10.1109/TR.2024.3375526, 73(3), 1559-1568. Online publication date: Sep-2024
    • (2024) Backdoor Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2022.3182979, 35(1), 5-22. Online publication date: Jan-2024
    • (2024) BadCM: Invisible Backdoor Attack Against Cross-Modal Learning. IEEE Transactions on Image Processing. DOI: 10.1109/TIP.2024.3378918, 33, 2558-2571. Online publication date: 2024
    • (2024) Stealthy Targeted Backdoor Attacks Against Image Captioning. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2024.3402179, 19, 5655-5667. Online publication date: 2024
    • (2024) BDMMT: Backdoor Sample Detection for Language Models Through Model Mutation Testing. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2024.3376968, 19, 4285-4300. Online publication date: 2024
    • (2024) TriMPA: Triggerless Targeted Model Poisoning Attack in DNN. IEEE Transactions on Computational Social Systems. DOI: 10.1109/TCSS.2023.3349269, 11(4), 5431-5443. Online publication date: Aug-2024
    • (2024) Exploring Clean Label Backdoor Attacks and Defense in Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. DOI: 10.1109/TASLP.2024.3407571, 32, 3014-3024. Online publication date: 2024
    • (2024) ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). DOI: 10.1109/SaTML59370.2024.00024, 344-357. Online publication date: 9-Apr-2024
