Computer Science > Cryptography and Security

arXiv:2402.11814 (cs)

[Submitted on 19 Feb 2024]

Title: An Empirical Evaluation of LLMs for Solving Offensive Security Challenges

Authors: Minghao Shao, Boyuan Chen, Sofija Jancheska, Brendan Dolan-Gavitt, Siddharth Garg, Ramesh Karri, Muhammad Shafique

Abstract: Capture The Flag (CTF) challenges are puzzles related to computer security scenarios. With the advent of large language models (LLMs), more and more CTF participants are using LLMs to understand and solve the challenges. However, so far no work has evaluated the effectiveness of LLMs in solving CTF challenges with a fully automated workflow. We develop two CTF-solving workflows, human-in-the-loop (HITL) and fully automated, to examine the LLMs' ability to solve a selected set of CTF challenges, prompted with information about the question. We collect human contestants' results on the same set of questions and find that LLMs achieve a higher success rate than an average human participant. This work provides a comprehensive evaluation of the capability of LLMs in solving real-world CTF challenges, from real competition to a fully automated workflow. Our results provide references for applying LLMs in cybersecurity education and pave the way for systematic evaluation of offensive cybersecurity capabilities in LLMs.

Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2402.11814 [cs.CR]
	(or arXiv:2402.11814v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2402.11814

Submission history

From: Sofija Jancheska
[v1] Mon, 19 Feb 2024 04:08:44 UTC (1,841 KB)
