DOI: 10.1145/3672197.3673434
research-article
Open access

Kgent: Kernel Extensions Large Language Model Agent

Published: 04 August 2024

Abstract

The extended Berkeley Packet Filter (eBPF) ecosystem allows Linux and Windows kernels to be extended, but writing eBPF programs is challenging: it requires knowledge of OS internals and is subject to programming restrictions enforced by the eBPF verifier. In practice, these barriers mean that only expert kernel developers can extend their kernels, making it difficult for junior sysadmins, patch makers, and DevOps personnel to maintain extensions. This paper presents Kgent, an alternative framework that alleviates the difficulty of writing eBPF programs by allowing kernel extensions to be written in natural language. Kgent uses recent advances in large language models (LLMs) to synthesize an eBPF program from a user's English-language prompt. To ensure that the LLM's output is semantically equivalent to the user's prompt, Kgent employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. Kgent's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that lets it combine the results of program synthesis and program comprehension, building on the recent success that LLMs have shown on each of these tasks individually.
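The generate-check-retry structure described above can be illustrated with a minimal Python sketch. It is not Kgent's implementation: the names `synthesize_with_feedback`, `CheckResult`, `toy_llm`, and `toy_verifier` are hypothetical, and the toy functions are simple stand-ins for an LLM call and for the verifier and symbolic-execution checks, whose real interfaces the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool
    feedback: str = ""


# Hypothetical interfaces, assumed for illustration only.
Synthesizer = Callable[[str, str], str]       # (prompt, feedback) -> candidate program
Checker = Callable[[str, str], CheckResult]   # (prompt, program) -> verdict


def synthesize_with_feedback(
    prompt: str,
    synthesize: Synthesizer,
    checkers: List[Checker],
    max_rounds: int = 3,
) -> Optional[str]:
    """Regenerate a candidate until every checker accepts it or rounds run out."""
    feedback = ""
    for _ in range(max_rounds):
        program = synthesize(prompt, feedback)
        results = [check(prompt, program) for check in checkers]
        failures = [r.feedback for r in results if not r.ok]
        if not failures:
            return program
        # Feed every failure message back into the next synthesis attempt.
        feedback = "\n".join(failures)
    return None


def toy_llm(prompt: str, feedback: str) -> str:
    # Stand-in for an LLM: omits the return statement until told about it.
    body = 'int trace(void *ctx) { bpf_printk("hit");'
    if "missing return" in feedback:
        body += " return 0;"
    return body + " }"


def toy_verifier(prompt: str, program: str) -> CheckResult:
    # Stand-in for verifier or symbolic-execution checks.
    if "return 0;" in program:
        return CheckResult(ok=True)
    return CheckResult(ok=False, feedback="missing return")


result = synthesize_with_feedback("trace a hook", toy_llm, [toy_verifier])
print(result)
```

In this toy run the first candidate is rejected, the rejection message is passed back to the generator, and the second candidate is accepted. A real system would replace the string checks with actual verification steps.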
To evaluate Kgent, we develop a new corpus of natural language prompts for eBPF programs. We show that Kgent produces correct eBPF programs in 80% of cases, an improvement of a factor of 2.67 over a GPT-4 program synthesis baseline. Moreover, we find that Kgent very rarely synthesizes "false positive" eBPF programs, i.e., eBPF programs that Kgent verifies as correct but that manual inspection reveals to be semantically incorrect for the input prompt. The code for Kgent is publicly accessible at https://github.com/eunomia-bpf/KEN.
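As a quick sanity check on the reported numbers, the baseline accuracy implied by an 80% success rate and a 2.67x improvement can be worked out directly. The abstract does not state the baseline figure itself, so the value below is derived, not quoted, and the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(improved_rate: float, factor: float) -> float:
    """Return the baseline rate implied by an improved rate and a multiplicative factor."""
    return improved_rate / factor


kgent_rate = 0.80   # reported in the abstract
factor = 2.67       # reported in the abstract

baseline = implied_baseline(kgent_rate, factor)
print(f"implied GPT-4 baseline accuracy: {baseline:.1%}")  # prints 30.0%
```

So the two reported figures are consistent with a GPT-4 baseline of roughly 30%.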

Published In

eBPF '24: Proceedings of the ACM SIGCOMM 2024 Workshop on eBPF and Kernel Extensions
August 2024
77 pages
ISBN:9798400707124
DOI:10.1145/3672197
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Large Language Model
  2. Symbolic Execution
  3. eBPF

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM SIGCOMM '24: ACM SIGCOMM 2024 Conference
August 4 - 8, 2024
Sydney, NSW, Australia

Acceptance Rates

Overall Acceptance Rate 12 of 21 submissions, 57%

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 479
  • Downloads (Last 12 months): 479
  • Downloads (Last 6 weeks): 194

Reflects downloads up to 26 Nov 2024
