Computer Science > Cryptography and Security

arXiv:2402.08416 (cs)

[Submitted on 13 Feb 2024]

Title: Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning

Authors: Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu

Abstract: Large Language Models (LLMs) have gained immense popularity and are being increasingly applied in various domains. Consequently, ensuring the security of these models is of paramount importance. Jailbreak attacks, which manipulate LLMs to generate malicious content, are recognized as a significant vulnerability. While existing research has predominantly focused on direct jailbreak attacks on LLMs, there has been limited exploration of indirect methods. The integration of various plugins into LLMs, notably Retrieval Augmented Generation (RAG), which enables LLM-based applications such as GPTs to incorporate external knowledge bases into their response generation, introduces new avenues for indirect jailbreak attacks.
To fill this gap, we investigate indirect jailbreak attacks on LLMs, particularly GPTs, introducing a novel attack vector named Retrieval Augmented Generation Poisoning. This method, Pandora, exploits the synergy between LLMs and RAG through prompt manipulation to generate unexpected responses. Pandora uses maliciously crafted content to influence the RAG process, effectively initiating jailbreak attacks. Our preliminary tests show that Pandora successfully conducts jailbreak attacks in four different scenarios, achieving higher success rates than direct attacks, with 64.3% for GPT-3.5 and 34.8% for GPT-4.

Comments:	6 pages
Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2402.08416 [cs.CR]
	(or arXiv:2402.08416v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2402.08416

Submission history

From: Gelei Deng
[v1] Tue, 13 Feb 2024 12:40:39 UTC (17,347 KB)
