%0 Conference Proceedings
%T Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
%A You, Wencong
%A Hammoudeh, Zayd
%A Lowd, Daniel
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Findings of the Association for Computational Linguistics: EMNLP 2023
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F you-etal-2023-large
%X Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled. Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts. We also propose a poison selection technique to improve the effectiveness of both LLMBkd as well as existing textual backdoor attacks. Lastly, we describe REACT, a baseline defense to mitigate backdoor attacks via antidote training examples. Our evaluations demonstrate LLMBkd’s effectiveness and efficiency, where we consistently achieve high attack success rates across a wide range of styles with little effort and no model training.
%R 10.18653/v1/2023.findings-emnlp.833
%U https://aclanthology.org/2023.findings-emnlp.833
%U https://doi.org/10.18653/v1/2023.findings-emnlp.833
%P 12499-12527