DOI: 10.1145/3689933.3690831
research-article
Free access

PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation

Published: 07 November 2024

Abstract

Recent advances in Large Language Models (LLMs) have shown significant potential for enhancing cybersecurity defenses against sophisticated threats. LLM-based penetration testing automates system security evaluation by identifying vulnerabilities; remediation, the subsequent and equally crucial step, addresses the vulnerabilities that are discovered. Because details about vulnerabilities, exploitation methods, and software versions offer crucial insight into a system's weaknesses, integrating penetration testing and vulnerability remediation into a single cohesive system is both intuitive and necessary.
This paper introduces PenHeal, a two-stage LLM-based framework designed to autonomously identify and mitigate security vulnerabilities. The framework integrates two LLM-enabled components: the Pentest Module, which detects multiple vulnerabilities within a system, and the Remediation Module, which recommends optimal remediation strategies. The integration is facilitated through Counterfactual Prompting and an Instructor module that guides the LLMs using external knowledge to explore multiple potential attack paths effectively. Our experimental results demonstrate that PenHeal not only automates the identification and remediation of vulnerabilities but also significantly improves vulnerability coverage by 31%, increases the effectiveness of remediation strategies by 32%, and reduces the associated costs by 46% compared to baseline models. These outcomes highlight the transformative potential of LLMs in reshaping cybersecurity practices, offering an innovative solution to defend against cyber threats.
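
The abstract names the framework's moving parts (a Pentest Module, a Remediation Module, Counterfactual Prompting, and an Instructor module supplying external knowledge) without giving code. The sketch below is a minimal illustration of how such a two-stage loop could be wired: the `llm` callable, the class names, the prompt wording, and the greedy severity-per-cost heuristic in the Remediation Module are all assumptions made for exposition, not the authors' implementation.

```python
# Illustrative sketch (not from the paper): a two-stage pentest-then-remediate
# loop with counterfactual prompting and instructor-supplied knowledge.
from dataclasses import dataclass
from typing import Callable, List

LLM = Callable[[str], str]  # stand-in for any chat-completion backend


@dataclass
class Finding:
    vuln_id: str             # e.g. a CVE identifier reported in stage 1
    severity: float          # CVSS-style base score
    remediation_cost: float  # estimated effort to fix


class PentestModule:
    """Stage 1: enumerate vulnerabilities, exploring multiple attack paths."""

    def __init__(self, llm: LLM, instructor_knowledge: List[str]):
        self.llm = llm
        self.knowledge = instructor_knowledge  # hints injected by the Instructor

    def run(self, target_description: str, rounds: int = 3) -> List[str]:
        findings: List[str] = []
        for _ in range(rounds):
            # Counterfactual prompting (assumed form): ask what would be found
            # if the attack paths already explored were unavailable, pushing
            # the model toward new vulnerabilities instead of repeating itself.
            prompt = (
                f"Target: {target_description}\n"
                f"External knowledge: {self.knowledge}\n"
                f"Attack paths already explored: {findings or 'none'}\n"
                "Assume those paths are blocked. Identify a different "
                "vulnerability and the attack path that reaches it."
            )
            findings.append(self.llm(prompt))
        return findings


class RemediationModule:
    """Stage 2: choose fixes that maximize covered severity within a cost budget."""

    def recommend(self, findings: List[Finding], budget: float) -> List[Finding]:
        # Greedy severity-per-cost ordering; a simplification of the paper's
        # effectiveness-versus-cost optimization.
        ranked = sorted(findings, key=lambda f: f.severity / f.remediation_cost,
                        reverse=True)
        chosen, spent = [], 0.0
        for f in ranked:
            if spent + f.remediation_cost <= budget:
                chosen.append(f)
                spent += f.remediation_cost
        return chosen
```

In this sketch, the free-text findings from stage 1 would still need to be parsed into `Finding` records (for example with CVSS scores looked up from a vulnerability database) before stage 2 can rank them.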




    Published In

    AutonomousCyber '24: Proceedings of the Workshop on Autonomous Cybersecurity
    November 2024
    73 pages
    ISBN: 9798400712296
    DOI: 10.1145/3689933
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 November 2024


    Author Tags

    1. cybersecurity automation
    2. llms
    3. penetration testing
    4. retrieval-augmented generation
    5. vulnerability remediation

    Qualifiers

    • Research-article

    Conference

    CCS '24

