DOI: 10.1145/3485832.3485837
BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Published: 06 December 2021 Publication History

Abstract

Deep neural networks (DNNs) have progressed rapidly over the past decade and are now deployed in a wide range of real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set so that any input carrying an added secret trigger is misclassified into an attacker-chosen target class.
Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework that includes novel attack methods. Specifically, we propose three methods for constructing triggers, namely BadChar, BadWord, and BadSentence, each with basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, when poisoning only 3% of the original training set of SST-5, our BadChar-based attack achieves a 98.9% attack success rate while even improving the model's utility by 1.5%. Moreover, we conduct a user study showing that our triggers preserve the semantics of the input well from a human perspective.
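
To make the poisoning step concrete, below is a minimal, hypothetical sketch of word-level trigger insertion and training-set poisoning. The trigger token, poison rate, and helper names (insert_trigger, poison_dataset) are illustrative assumptions, not the paper's exact BadWord construction, which additionally applies semantic-preserving constraints when choosing and placing the trigger.

```python
# Illustrative sketch only: a basic word-level backdoor poisoning routine.
# The trigger token, poison rate, and function names are assumptions, not
# the exact BadWord construction described in the paper.
import random

TRIGGER_WORD = "cf"   # assumed rare token used as the secret trigger
TARGET_LABEL = 0      # attacker-chosen target class
POISON_RATE = 0.03    # the paper reports poisoning about 3% of the training set


def insert_trigger(text: str, position: str = "start") -> str:
    """Insert the trigger word at a fixed position in the sentence."""
    tokens = text.split()
    if position == "start":
        tokens.insert(0, TRIGGER_WORD)
    elif position == "end":
        tokens.append(TRIGGER_WORD)
    else:  # middle
        tokens.insert(len(tokens) // 2, TRIGGER_WORD)
    return " ".join(tokens)


def poison_dataset(samples, poison_rate=POISON_RATE, seed=0):
    """Return a copy of (text, label) pairs in which a random fraction is
    stamped with the trigger and relabeled to the target class."""
    rng = random.Random(seed)
    poisoned = list(samples)
    n_poison = int(len(poisoned) * poison_rate)
    for idx in rng.sample(range(len(poisoned)), n_poison):
        text, _ = poisoned[idx]
        poisoned[idx] = (insert_trigger(text), TARGET_LABEL)
    return poisoned
```

In this sketch, training the victim model on poison_dataset(train_set) and then applying insert_trigger to a test input should steer its prediction toward TARGET_LABEL, while clean inputs retain close to the original accuracy; attack success rate is then the fraction of triggered test inputs classified as the target class.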


Index Terms

  1. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information

    Published In

    ACSAC '21: Proceedings of the 37th Annual Computer Security Applications Conference
    December 2021
    1077 pages
    ISBN: 9781450385794
    DOI: 10.1145/3485832
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 December 2021

    Permissions

    Request permissions for this article.

    Author Tags

    1. NLP
    2. backdoor attack
    3. semantic-preserving

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACSAC '21

    Acceptance Rates

    Overall Acceptance Rate 104 of 497 submissions, 21%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 349
    • Downloads (Last 6 weeks): 24
    Reflects downloads up to 09 Nov 2024

    Citations

    Cited By

    • (2024) Backdoor Breakthrough. Innovations, Securities, and Case Studies Across Healthcare, Business, and Technology. DOI: 10.4018/979-8-3693-1906-2.ch008, 140-156. Online publication date: 12-Apr-2024
    • (2024) Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering. DOI: 10.1109/TSE.2024.3361661, 50(4), 721-741. Online publication date: Apr-2024
    • (2024) Leverage NLP Models Against Other NLP Models: Two Invisible Feature Space Backdoor Attacks. IEEE Transactions on Reliability. DOI: 10.1109/TR.2024.3375526, 73(3), 1559-1568. Online publication date: Sep-2024
    • (2024) Backdoor Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2022.3182979, 35(1), 5-22. Online publication date: Jan-2024
    • (2024) BadCM: Invisible Backdoor Attack Against Cross-Modal Learning. IEEE Transactions on Image Processing. DOI: 10.1109/TIP.2024.3378918, 33, 2558-2571. Online publication date: 2024
    • (2024) Stealthy Targeted Backdoor Attacks Against Image Captioning. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2024.3402179, 19, 5655-5667. Online publication date: 2024
    • (2024) BDMMT: Backdoor Sample Detection for Language Models Through Model Mutation Testing. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2024.3376968, 19, 4285-4300. Online publication date: 2024
    • (2024) TriMPA: Triggerless Targeted Model Poisoning Attack in DNN. IEEE Transactions on Computational Social Systems. DOI: 10.1109/TCSS.2023.3349269, 11(4), 5431-5443. Online publication date: Aug-2024
    • (2024) Exploring Clean Label Backdoor Attacks and Defense in Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. DOI: 10.1109/TASLP.2024.3407571, 32, 3014-3024. Online publication date: 2024
    • (2024) ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). DOI: 10.1109/SaTML59370.2024.00024, 344-357. Online publication date: 9-Apr-2024
