Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3689932.3694764acmconferencesArticle/Chapter ViewAbstractPublication PagesccsConference Proceedingsconference-collections
research-article

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

Published: 22 November 2024 Publication History

Abstract

We introduce a new family of prompt injection attacks, termed Neural Exec. Unlike known attacks that rely on handcrafted strings (e.g., "Ignore previous instructions and..."), we show that it is possible to conceptualize the creation of execution triggers as a differentiable search problem and use learning-based methods to autonomously generate them.
Our results demonstrate that a motivated adversary can forge triggers that are not only drastically more effective than current handcrafted ones but also exhibit inherent flexibility in shape, properties, and functionality. In this direction, we show that an attacker can design and generate Neural Execs capable of persisting through multi-stage preprocessing pipelines, such as in the case of Retrieval-Augmented Generation (RAG)-based applications. More critically, our findings show that attackers can produce triggers that deviate markedly in form and shape from any known attack, sidestepping existing blacklist-based detection and sanitation approaches. Code available at https://github.com/pasquini-dario/LLM_NeuralExec

References

[1]
'AI Risk Management Framework'. https://www.nist.gov/itl/ai-riskmanagement-framework.
[2]
'CVE-2023-29374: LLMMathChain allows arbitrary code execution via Python exec'. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-29374.
[3]
'How generative AI is changing the way developers work'. https://github.blog/ 2023-04--14-how-generative-ai-is-changing-the-way-developers-work/.
[4]
'HuggingFace: meta-llama/Meta-Llama-3--8B-Instruct'. https://huggingface.co/ meta-llama/Meta-Llama-3-8B-Instruct.
[5]
'HuggingFace: mistralai/Mistral-7B-Instruct-v0.2'. https://huggingface.co/ mistralai/Mistral-7B-Instruct-v0.2.
[6]
'HuggingFace: mistralai/Mixtral-8x7B-v0.1'. https://huggingface.co/mistralai/ Mixtral-8x7B-v0.1.
[7]
'HuggingFace: openchat/openchat_3.5'. https://huggingface.co/openchat/ openchat_3.5.
[8]
'HuggingFace: sentence-transformers/all-mpnet-base-v2'. https://huggingface. co/sentence-transformers/all-mpnet-base-v2.
[9]
'Microsoft Copilot'. https://copilot.microsoft.com.
[10]
'OWASP Top 10 for Large Language Model Applications'. https://owasp.org/ www-project-top-10-for-large-language-model-applications/.
[11]
'Prompt injection attacks against GPT-3'. https://simonwillison.net/2022/Sep/ 12/prompt-injection/.
[12]
'Securing LLM Systems Against Prompt Injection'. https://developer.nvidia.com/ blog/securing-llm-systems-against-prompt-injection/.
[13]
'Series: Prompt injection'. .
[14]
'The Impact of LLMs on the Legal Industry'. https://www.edgewortheconomics. com/insight-impact-LLMs-legal-industry.
[15]
'The Rise of AI-Powered Applications: Large Language Models in Modern Business'. https://www.computer.org/publications/tech-news/trends/largelanguage-models-in-modern-business.
[16]
'Thinking about the security of AI systems'. https://www.ncsc.gov.uk/blogpost/ thinking-about-security-ai-systems.
[17]
Tom Brown and et al. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877--1901. Curran Associates, Inc., 2020.
[18]
Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, and Eric Wong. Jailbreaking black box large language models in twenty queries, 2023.
[19]
Sahil Chaudhary. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca, 2023.
[20]
Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, and Tianyi Zhou. Instructzero: Efficient instruction optimization for black-box large language models. arXiv preprint arXiv:2306.03082, 2023.
[21]
Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023.
[22]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79--90, 2023.
[23]
Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[24]
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, L'elio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mixtral of experts. 2024.
[25]
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuüttler, Mike Lewis,Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459--9474, 2020.
[26]
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt injection attack against llm-integrated applications, 2023.
[27]
Long Ouyang and et al. Training language models to follow instructions with human feedback. volume 35, pages 27730--27744. Curran Associates, Inc., 2022.
[28]
Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022.
[29]
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
[30]
Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don't know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784--789, Melbourne, Australia, July 2018. Association for Computational Linguistics.
[31]
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Empirical Methods in Natural Language Processing (EMNLP), 2020.
[32]
Chawin Sitawarin, Norman Mu, David Wagner, and Alexandre Araujo. Pal: Proxy-guided black-box attack on large language models, 2024.
[33]
Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
[34]
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLPIJCNLP), pages 2153--2162, Hong Kong, China, November 2019. Association for Computational Linguistics.
[35]
Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, and Yang Liu. Openchat: Advancing open-source language models with mixed-quality data, 2023.
[36]
Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023.

Cited By

View all
  • (2025)On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and OutlookInternational Journal of Computer Vision10.1007/s11263-025-02375-wOnline publication date: 28-Feb-2025
  • (2024)Data Breach Prevention in AI Systems: Employing Event-Driven Architecture to Combat Prompt Injection Attacks in Chatbots2024 IEEE 12th International Conference on Information, Communication and Networks (ICICN)10.1109/ICICN62625.2024.10761619(626-632)Online publication date: 21-Aug-2024

Index Terms

  1. Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      AISec '24: Proceedings of the 2024 Workshop on Artificial Intelligence and Security
      November 2024
      225 pages
      ISBN:9798400712289
      DOI:10.1145/3689932
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 November 2024

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. adversarial inputs
      2. ai readteam
      3. llms
      4. prompt injection
      5. rag

      Qualifiers

      • Research-article

      Conference

      CCS '24
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 94 of 231 submissions, 41%

      Upcoming Conference

      CCS '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)76
      • Downloads (Last 6 weeks)22
      Reflects downloads up to 07 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2025)On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and OutlookInternational Journal of Computer Vision10.1007/s11263-025-02375-wOnline publication date: 28-Feb-2025
      • (2024)Data Breach Prevention in AI Systems: Employing Event-Driven Architecture to Combat Prompt Injection Attacks in Chatbots2024 IEEE 12th International Conference on Information, Communication and Networks (ICICN)10.1109/ICICN62625.2024.10761619(626-632)Online publication date: 21-Aug-2024

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media