Feb 5, 2024 · In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of human generation.
Mar 3, 2024 · In this paper, we propose a role-playing system, namely GUARD (Guideline Upholding through Adaptive Role-play Diagnostics), which can automatically follow the ...
GUARD works based on four role-playing LLMs: Translator, Generator, Evaluator and Optimizer, which work jointly towards successful natural-language jailbreaks.
A role-playing system that assigns four different roles to user LLMs to collaborate on new jailbreaks, and pioneers a setting in this system that will ...
GUARD is a system that tests if AI models follow guidelines by generating tricky questions. It uses four AI models for creating, organizing, assessing, ...
Jun 2, 2024 · Researchers propose a novel system called GUARD (Guideline Upholding through Adaptive Role-play Diagnostics) to proactively test large language ...
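Reading the snippets above together, GUARD's pipeline appears to chain four role-playing LLMs (Translator, Generator, Evaluator, Optimizer) that turn a guideline into a concrete test question, wrap it in a role-play scenario, score the target model's response, and refine the scenario until it elicits a guideline violation. The sketch below is a rough, hypothetical rendering of that loop, not the paper's implementation; the call_llm stub, the role prompts, the scoring scheme, and the threshold are all assumptions made for illustration.

```python
# Minimal illustrative sketch (not GUARD's actual code): four role-playing LLM
# "agents" cooperating to build and refine a candidate jailbreak prompt.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a chat-completion call; replace with a real client."""
    return f"[reply of '{system_prompt[:20]}...' to '{user_prompt[:30]}...']"

def translator(guideline: str) -> str:
    # Turn an abstract guideline into a concrete, testable question.
    return call_llm("Rewrite safety guidelines as concrete test questions.", guideline)

def generator(question: str) -> str:
    # Embed the question in a role-play scenario intended to elicit an answer.
    return call_llm("Embed questions into role-play scenarios.", question)

def evaluator(jailbreak: str, target_response: str) -> float:
    # Score how strongly the target's response violates the guideline (0..1).
    verdict = call_llm("Rate guideline violation from 0 to 1.",
                       f"{jailbreak}\n---\n{target_response}")
    try:
        return float(verdict)
    except ValueError:
        return 0.0  # placeholder returns non-numeric text

def optimizer(jailbreak: str, score: float) -> str:
    # Revise the scenario when the evaluator's score is too low.
    return call_llm("Improve the role-play scenario.", f"score={score:.2f}\n{jailbreak}")

def guard_style_loop(guideline: str, target_llm, threshold: float = 0.8,
                     max_iters: int = 5) -> str:
    """Iteratively refine a jailbreak prompt against `target_llm` (a callable)."""
    jailbreak = generator(translator(guideline))
    for _ in range(max_iters):
        response = target_llm(jailbreak)
        score = evaluator(jailbreak, response)
        if score >= threshold:
            break
        jailbreak = optimizer(jailbreak, score)
    return jailbreak
```

Under these assumptions, a tester would pass a single guideline string and a callable wrapping the target model (open-source or commercial) to guard_style_loop and inspect the returned prompt plus the target's response; the real system's role prompts and stopping criteria are described in the paper itself.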
In this paper, we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of ...
GUARD's role-playing generates guideline-violating responses in LLMs. The methodology is tested on three open-source LLMs and a commercial LLM. This paper ...
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models. Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin ...
Mar 31, 2024 · GUARD: Role-playing to generate natural-language jailbreakings to test guideline adherence of large language models. arXiv:2402.03299, 2024 ...