DOI: 10.1145/3611643.3613078
Research article
Open access

Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

Published: 30 November 2023

Abstract

Recent advances in Large Language Models (LLMs) such as ChatGPT have exhibited strong capabilities for comprehending and responding to questions across a variety of domains. Surprisingly, ChatGPT even possesses a strong understanding of program code. In this paper, we investigate where and how LLMs can assist static analysis by asking appropriate questions. In particular, we target a specific bug-finding tool whose static analysis produces many false positives. In our evaluation, we find that these false positives can be effectively pruned by asking carefully constructed questions about function-level behaviors or function summaries. Specifically, in a pilot study of 20 false positives, GPT-3.5 successfully pruned 8 out of 20, whereas GPT-4 achieved a near-perfect result of 16 out of 20; the four failures involve cases that our questions do not currently consider or support, e.g., concurrency. Additionally, GPT-4 identified one false negative (a missed bug). We find LLMs to be a promising tool that can enable more effective and efficient program analysis.
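To make the pruning idea in the abstract concrete, the sketch below shows one way a report-review question could be posed to a chat model through the OpenAI API. It is only an illustration of the general approach described above, not the authors' actual prompts or tooling: the openai Python package, the gpt-4 model name, the review_report helper, and the example function and warning are all assumptions made for this sketch.

# A minimal sketch of LLM-assisted report review (not the authors' tooling).
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment;
# the prompt wording and the example function/warning below are hypothetical.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are assisting a static analyzer. Given a C function and one of the "
    "analyzer's warnings, decide whether the warning is a TRUE BUG or a FALSE "
    "POSITIVE, and briefly explain which initializations or path constraints "
    "the analyzer may have missed."
)

def review_report(function_source: str, warning: str, model: str = "gpt-4") -> str:
    """Ask a function-level question about a single analyzer warning."""
    user_prompt = (
        "Function under analysis:\n"
        f"{function_source}\n"
        f"Analyzer warning: {warning}\n\n"
        "Question: on the reported path, can the flagged variable actually be "
        "used before it is initialized? Answer TRUE BUG or FALSE POSITIVE, "
        "then justify briefly."
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep answers as repeatable as possible for pruning decisions
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

# Hypothetical false positive: the analyzer does not see that get_value()
# writes through its output parameter whenever it returns 0.
EXAMPLE_FUNCTION = """
int get_value(int *out);          /* returns 0 on success and sets *out */

int caller(void)
{
    int v;
    if (get_value(&v) != 0)
        return -1;                /* early exit on failure */
    return v;                     /* warning: 'v' may be used uninitialized */
}
"""

if __name__ == "__main__":
    print(review_report(EXAMPLE_FUNCTION,
                        "caller(): variable 'v' may be used uninitialized"))

Asking about one function at a time, as in this sketch, mirrors the abstract's focus on function-level behaviors and function summaries, and it keeps each prompt small enough to fit comfortably in a model's context window.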





Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. bug detection
2. large language model
3. static analysis


Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics

• Downloads (Last 12 months): 2,518
• Downloads (Last 6 weeks): 241

Reflects downloads up to 16 Dec 2024


Cited By

• (2024) Software engineering education in the era of conversational AI: current trends and future directions. Frontiers in Artificial Intelligence, 10.3389/frai.2024.14363507. Online publication date: 29-Aug-2024.
• (2024) Role and Challenges of ChatGPT or Similar Generative Artificial Intelligence in Reinforced Concrete Technology. SSRN Electronic Journal, 10.2139/ssrn.4681731. Online publication date: 2024.
• (2024) If At First You Don’t Succeed, Try, Try, Again...? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 10.1145/3694715.3695971 (63-78). Online publication date: 4-Nov-2024.
• (2024) Effectiveness of ChatGPT for Static Analysis: How Far Are We? Proceedings of the 1st ACM International Conference on AI-Powered Software, 10.1145/3664646.3664777 (151-160). Online publication date: 10-Jul-2024.
• (2024) Automated Unit Test Improvement using Large Language Models at Meta. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 10.1145/3663529.3663839 (185-196). Online publication date: 10-Jul-2024.
• (2024) Using Run-Time Information to Enhance Static Analysis of Machine Learning Code in Notebooks. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 10.1145/3663529.3663785 (497-501). Online publication date: 10-Jul-2024.
• (2024) Exploring AI for Vulnerability Detection and Repair. 2024 Cyber Awareness and Research Symposium (CARS), 10.1109/CARS61786.2024.10778769 (1-9). Online publication date: 28-Oct-2024.
• (2024) Managing Linux servers with LLM-based AI agents: An empirical evaluation with GPT4. Machine Learning with Applications, 10.1016/j.mlwa.2024.10057017 (100570). Online publication date: Sep-2024.
• (2024) Detecting command injection vulnerabilities in Linux-based embedded firmware with LLM-based taint analysis of library functions. Computers & Security, 10.1016/j.cose.2024.103971144 (103971). Online publication date: Sep-2024.
• (2024) Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection. Computer Security – ESORICS 2024, 10.1007/978-3-031-70879-4_14 (271-289). Online publication date: 16-Sep-2024.
