DOI: 10.1145/3689933.3690831
research-article
Free access

PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation

Published: 07 November 2024

Abstract

Recent advances in Large Language Models (LLMs) have shown significant potential for enhancing cybersecurity defenses against sophisticated threats. LLM-based penetration testing automates system security evaluation by identifying vulnerabilities; remediation, the subsequent and equally crucial step, addresses the vulnerabilities that are discovered. Because details about vulnerabilities, exploitation methods, and software versions offer crucial insight into a system's weaknesses, integrating penetration testing and vulnerability remediation into a single cohesive system is both intuitive and necessary.
This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and mitigate security vulnerabilities. The framework integrates two LLM-enabled components: the Pentest Module, which detects multiple vulnerabilities within a system, and the Remediation Module, which recommends optimal remediation strategies. The integration is facilitated through Counterfactual Prompting and an Instructor module that guides the LLMs using external knowledge to explore multiple potential attack paths effectively. Our experimental results demonstrate that PenHeal not only automates the identification and remediation of vulnerabilities but also significantly improves vulnerability coverage by 31%, increases the effectiveness of remediation strategies by 32%, and reduces the associated costs by 46% compared to baseline models. These outcomes highlight the transformative potential of LLMs in reshaping cybersecurity practices, offering an innovative solution to defend against cyber threats.
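
The abstract names the framework's moving parts (a Pentest Module, a Remediation Module, Counterfactual Prompting, and an Instructor module supplying external knowledge) without giving code. The sketch below is a minimal illustration of how such a two-stage loop could be wired: the `llm` callable, the class names, the prompt wording, and the greedy severity-per-cost heuristic in the Remediation Module are all assumptions made for exposition, not the authors' implementation.

```python
# Illustrative sketch (not from the paper): a two-stage pentest-then-remediate
# loop with counterfactual prompting and instructor-supplied knowledge.
from dataclasses import dataclass
from typing import Callable, List

LLM = Callable[[str], str]  # stand-in for any chat-completion backend


@dataclass
class Finding:
    vuln_id: str             # e.g. a CVE identifier reported in stage 1
    severity: float          # CVSS-style base score
    remediation_cost: float  # estimated effort to fix


class PentestModule:
    """Stage 1: enumerate vulnerabilities, exploring multiple attack paths."""

    def __init__(self, llm: LLM, instructor_knowledge: List[str]):
        self.llm = llm
        self.knowledge = instructor_knowledge  # hints injected by the Instructor

    def run(self, target_description: str, rounds: int = 3) -> List[str]:
        findings: List[str] = []
        for _ in range(rounds):
            # Counterfactual prompting (assumed form): ask what would be found
            # if the attack paths already explored were unavailable, pushing
            # the model toward new vulnerabilities instead of repeating itself.
            prompt = (
                f"Target: {target_description}\n"
                f"External knowledge: {self.knowledge}\n"
                f"Attack paths already explored: {findings or 'none'}\n"
                "Assume those paths are blocked. Identify a different "
                "vulnerability and the attack path that reaches it."
            )
            findings.append(self.llm(prompt))
        return findings


class RemediationModule:
    """Stage 2: choose fixes that maximize covered severity within a cost budget."""

    def recommend(self, findings: List[Finding], budget: float) -> List[Finding]:
        # Greedy severity-per-cost ordering; a simplification of the paper's
        # effectiveness-versus-cost optimization.
        ranked = sorted(findings, key=lambda f: f.severity / f.remediation_cost,
                        reverse=True)
        chosen, spent = [], 0.0
        for f in ranked:
            if spent + f.remediation_cost <= budget:
                chosen.append(f)
                spent += f.remediation_cost
        return chosen
```

In this sketch, the free-text findings from stage 1 would still need to be parsed into `Finding` records (for example with CVSS scores looked up from a vulnerability database) before stage 2 can rank them.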




    Published In

    AutonomousCyber '24: Proceedings of the Workshop on Autonomous Cybersecurity
    November 2024
    73 pages
    ISBN: 9798400712296
    DOI: 10.1145/3689933
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 November 2024


    Author Tags

    1. cybersecurity automation
    2. llms
    3. penetration testing
    4. retrieval-augmented generation
    5. vulnerability remediation

    Qualifiers

    • Research-article

    Conference

    CCS '24

