research-article

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

Authors:

Dario Pasquini,

Martin Strohmeier,

Carmela TroncosoAuthors Info & Claims

AISec '24: Proceedings of the 2024 Workshop on Artificial Intelligence and Security

Pages 89 - 100

https://doi.org/10.1145/3689932.3694764

Published: 22 November 2024 Publication History

Abstract

We introduce a new family of prompt injection attacks, termed Neural Exec. Unlike known attacks that rely on handcrafted strings (e.g., "Ignore previous instructions and..."), we show that it is possible to conceptualize the creation of execution triggers as a differentiable search problem and use learning-based methods to autonomously generate them.

Our results demonstrate that a motivated adversary can forge triggers that are not only drastically more effective than current handcrafted ones but also exhibit inherent flexibility in shape, properties, and functionality. In this direction, we show that an attacker can design and generate Neural Execs capable of persisting through multi-stage preprocessing pipelines, such as in the case of Retrieval-Augmented Generation (RAG)-based applications. More critically, our findings show that attackers can produce triggers that deviate markedly in form and shape from any known attack, sidestepping existing blacklist-based detection and sanitation approaches. Code available at https://github.com/pasquini-dario/LLM_NeuralExec

References

[1]

'AI Risk Management Framework'. https://www.nist.gov/itl/ai-riskmanagement-framework.

[2]

'CVE-2023-29374: LLMMathChain allows arbitrary code execution via Python exec'. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-29374.

[3]

'How generative AI is changing the way developers work'. https://github.blog/ 2023-04--14-how-generative-ai-is-changing-the-way-developers-work/.

[4]

'HuggingFace: meta-llama/Meta-Llama-3--8B-Instruct'. https://huggingface.co/ meta-llama/Meta-Llama-3-8B-Instruct.

[5]

'HuggingFace: mistralai/Mistral-7B-Instruct-v0.2'. https://huggingface.co/ mistralai/Mistral-7B-Instruct-v0.2.

[6]

'HuggingFace: mistralai/Mixtral-8x7B-v0.1'. https://huggingface.co/mistralai/ Mixtral-8x7B-v0.1.

[7]

'HuggingFace: openchat/openchat_3.5'. https://huggingface.co/openchat/ openchat_3.5.

[8]

'HuggingFace: sentence-transformers/all-mpnet-base-v2'. https://huggingface. co/sentence-transformers/all-mpnet-base-v2.

[9]

'Microsoft Copilot'. https://copilot.microsoft.com.

[10]

'OWASP Top 10 for Large Language Model Applications'. https://owasp.org/ www-project-top-10-for-large-language-model-applications/.

[11]

'Prompt injection attacks against GPT-3'. https://simonwillison.net/2022/Sep/ 12/prompt-injection/.

[12]

'Securing LLM Systems Against Prompt Injection'. https://developer.nvidia.com/ blog/securing-llm-systems-against-prompt-injection/.

[13]

'Series: Prompt injection'. .

[14]

'The Impact of LLMs on the Legal Industry'. https://www.edgewortheconomics. com/insight-impact-LLMs-legal-industry.

[15]

'The Rise of AI-Powered Applications: Large Language Models in Modern Business'. https://www.computer.org/publications/tech-news/trends/largelanguage-models-in-modern-business.

[16]

'Thinking about the security of AI systems'. https://www.ncsc.gov.uk/blogpost/ thinking-about-security-ai-systems.

[17]

Tom Brown and et al. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877--1901. Curran Associates, Inc., 2020.

[18]

Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, and Eric Wong. Jailbreaking black box large language models in twenty queries, 2023.

[19]

Sahil Chaudhary. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca, 2023.

[20]

Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, and Tianyi Zhou. Instructzero: Efficient instruction optimization for black-box large language models. arXiv preprint arXiv:2306.03082, 2023.

[21]

Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023.

[22]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79--90, 2023.

Digital Library

[23]

Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.

[24]

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, L'elio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mixtral of experts. 2024.

[25]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuüttler, Mike Lewis,Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459--9474, 2020.

[26]

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt injection attack against llm-integrated applications, 2023.

[27]

Long Ouyang and et al. Training language models to follow instructions with human feedback. volume 35, pages 27730--27744. Curran Associates, Inc., 2022.

[28]

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022.

[29]

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.

[30]

Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don't know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784--789, Melbourne, Australia, July 2018. Association for Computational Linguistics.

[31]

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Empirical Methods in Natural Language Processing (EMNLP), 2020.

[32]

Chawin Sitawarin, Norman Mu, David Wagner, and Alexandre Araujo. Pal: Proxy-guided black-box attack on large language models, 2024.

[33]

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.

[34]

Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLPIJCNLP), pages 2153--2162, Hong Kong, China, November 2019. Association for Computational Linguistics.

[35]

Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, and Yang Liu. Openchat: Advancing open-source language models with mixed-quality data, 2023.

[36]

Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023.

Cited By

Fan MWang CChen CLiu YHuang J(2025)On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and OutlookInternational Journal of Computer Vision10.1007/s11263-025-02375-wOnline publication date: 28-Feb-2025
https://doi.org/10.1007/s11263-025-02375-w
Samonte MAparize JGonzales EMorilla J(2024)Data Breach Prevention in AI Systems: Employing Event-Driven Architecture to Combat Prompt Injection Attacks in Chatbots2024 IEEE 12th International Conference on Information, Communication and Networks (ICICN)10.1109/ICICN62625.2024.10761619(626-632)Online publication date: 21-Aug-2024
https://doi.org/10.1109/ICICN62625.2024.10761619

Index Terms

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
1. Computing methodologies
  1. Machine learning
2. Security and privacy

Recommendations

Prompt Injection Attacks in Defended Systems
Distributed Computer and Communication Networks
Abstract
Large language models play a crucial role in modern natural language processing technologies. However, their extensive use also introduces potential security risks, such as the possibility of black-box attacks. These attacks can embed hidden ...
Transformer-based GAN-augmented Defender for Adversarial USB Keystroke Injection Attacks
ICDCN '25: Proceedings of the 26th International Conference on Distributed Computing and Networking
Existing defenses against adversarial Universal Serial Bus (USB) keystroke injection attacks are often insufficient, as adversaries can exploit data poisoning techniques during model training to undermine performance. This paper introduces a novel ...
Formalizing and benchmarking prompt injection attacks and defenses
SEC '24: Proceedings of the 33rd USENIX Conference on Security Symposium

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

AISec '24: Proceedings of the 2024 Workshop on Artificial Intelligence and Security

November 2024

225 pages

ISBN:9798400712289

DOI:10.1145/3689932

Program Chairs:
Maura Pintor
University of Cagliari
,
Xinyun Chen
Google DeepMind
,
Matthew Jagielski
Google DeepMind

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 November 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

CCS '24

Sponsor:

SIGSAC

CCS '24: ACM SIGSAC Conference on Computer and Communications Security

October 14 - 18, 2024

UT, Salt Lake City, USA

Acceptance Rates

Overall Acceptance Rate 94 of 231 submissions, 41%

Upcoming Conference

CCS '25

Sponsor:
sigsac

ACM SIGSAC Conference on Computer and Communications Security

October 13 - 17, 2025

Taipei , Taiwan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
76
Total Downloads

Downloads (Last 12 months)76
Downloads (Last 6 weeks)22

Reflects downloads up to 07 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Fan MWang CChen CLiu YHuang J(2025)On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and OutlookInternational Journal of Computer Vision10.1007/s11263-025-02375-wOnline publication date: 28-Feb-2025
https://doi.org/10.1007/s11263-025-02375-w
Samonte MAparize JGonzales EMorilla J(2024)Data Breach Prevention in AI Systems: Employing Event-Driven Architecture to Combat Prompt Injection Attacks in Chatbots2024 IEEE 12th International Conference on Information, Communication and Networks (ICICN)10.1109/ICICN62625.2024.10761619(626-632)Online publication date: 21-Aug-2024
https://doi.org/10.1109/ICICN62625.2024.10761619

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten