Feb 5, 2024 · In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of human generation.
Mar 3, 2024 · In this paper, we propose a role-playing system, namely GUARD (Guideline Upholding through Adaptive Role-play Diagnostics), which can automatically follow the ...
GUARD works based on four role-playing LLMs: Translator, Generator, Evaluator and Optimizer, which work jointly towards successful natural-language jailbreaks.
A role-playing system that assigns four different roles to user LLMs to collaborate on new jailbreaks, and pioneers a setting in this system that will ...
GUARD is a system that tests if AI models follow guidelines by generating tricky questions. It uses four AI models for creating, organizing, assessing, ...
Jun 2, 2024 · Researchers propose a novel system called GUARD (Guideline Upholding through Adaptive Role-play Diagnostics) to proactively test large language ...
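Reading the snippets above together, GUARD's pipeline appears to chain four role-playing LLMs (Translator, Generator, Evaluator, Optimizer) that turn a guideline into a concrete test question, wrap it in a role-play scenario, score the target model's response, and refine the scenario until it elicits a guideline violation. The sketch below is a rough, hypothetical rendering of that loop, not the paper's implementation; the call_llm stub, the role prompts, the scoring scheme, and the threshold are all assumptions made for illustration.

```python
# Minimal illustrative sketch (not GUARD's actual code): four role-playing LLM
# "agents" cooperating to build and refine a candidate jailbreak prompt.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a chat-completion call; replace with a real client."""
    return f"[reply of '{system_prompt[:20]}...' to '{user_prompt[:30]}...']"

def translator(guideline: str) -> str:
    # Turn an abstract guideline into a concrete, testable question.
    return call_llm("Rewrite safety guidelines as concrete test questions.", guideline)

def generator(question: str) -> str:
    # Embed the question in a role-play scenario intended to elicit an answer.
    return call_llm("Embed questions into role-play scenarios.", question)

def evaluator(jailbreak: str, target_response: str) -> float:
    # Score how strongly the target's response violates the guideline (0..1).
    verdict = call_llm("Rate guideline violation from 0 to 1.",
                       f"{jailbreak}\n---\n{target_response}")
    try:
        return float(verdict)
    except ValueError:
        return 0.0  # placeholder returns non-numeric text

def optimizer(jailbreak: str, score: float) -> str:
    # Revise the scenario when the evaluator's score is too low.
    return call_llm("Improve the role-play scenario.", f"score={score:.2f}\n{jailbreak}")

def guard_style_loop(guideline: str, target_llm, threshold: float = 0.8,
                     max_iters: int = 5) -> str:
    """Iteratively refine a jailbreak prompt against `target_llm` (a callable)."""
    jailbreak = generator(translator(guideline))
    for _ in range(max_iters):
        response = target_llm(jailbreak)
        score = evaluator(jailbreak, response)
        if score >= threshold:
            break
        jailbreak = optimizer(jailbreak, score)
    return jailbreak
```

Under these assumptions, a tester would pass a single guideline string and a callable wrapping the target model (open-source or commercial) to guard_style_loop and inspect the returned prompt plus the target's response; the real system's role prompts and stopping criteria are described in the paper itself.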
In this paper, we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of ...
GUARD's role-playing generates guideline-violating responses in LLMs. The methodology is tested on three open-source LLMs and a commercial LLM. This paper ...
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models. Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin ...
Mar 31, 2024 · GUARD: Role-playing to generate natural-language jailbreakings to test guideline adherence of large language models. arXiv:2402.03299, 2024 ...