Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3644033.3644374acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article
Open access

Leveraging Large Language Models to Boost Dafny’s Developers Productivity

Published: 06 June 2024 Publication History

Abstract

This research idea paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers. Although the use of verification-aware languages, such as Dafny, has increased considerably in the last decade, these are still not widely adopted. Often the cost of using such languages is too high, due to the level of expertise required from the developers and challenges that they often face when trying to prove a program correct. Even though Dafny automates a lot of the verification process, sometimes there are steps that are too complex for Dafny to perform on its own. One such case is that of missing lemmas, i.e. Dafny is unable to prove a result without being given further help in the form of a theorem that can assist it in the proof of the step.
In this paper, we describe preliminary work on using LLMs to assist developers by generating suggestions for relevant lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that cannot be proved automatically, we attempt to provide accompanying calculational proofs. We also discuss ideas for future work by describing a research agenda on using LLMs to increase the adoption of verification-aware languages in general, by increasing developers productivity and by reducing the level of expertise required for crafting formal specifications and proving program properties.

References

[1]
Alexandre Abreu, Nuno Macedo, and Alexandra Mendes. 2023. Exploring Automatic Specification Repair in Dafny Programs. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW). 105--112.
[2]
Wolfgang Ahrendt, Dilian Gurov, Moa Johansson, and Philipp Rümmer. 2022. TriCo---Triple Co-piloting of Implementation, Specification and Tests. In International Symposium on Leveraging Applications of Formal Methods. Springer, 174--187.
[3]
Open AI. 2023. GPT-4 Technical report. arXiv preprint arXiv:2303.08774 (2023).
[4]
Lasse Blaauwbroek, Josef Urban, and Herman Geuvers. 2020. The Tactician. In Intelligent Computer Mathematics. 271--277.
[5]
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023).
[6]
Franck Cassez. 2021. Verification of the incremental Merkle tree algorithm with Dafny. In 24th International Symposium on Formal Methods (FM) (LNCS, Vol. 13047). Springer, 445--462.
[7]
Franck Cassez, Joanne Fuller, Milad K Ghale, David J Pearce, and Horacio MA Quiles. 2023. Formal and Executable Semantics of the Ethereum Virtual Machine in Dafny. In International Symposium on Formal Methods. Springer, 571--583.
[8]
Franck Cassez, Joanne Fuller, and Horacio Mijail Antón Quiles. 2022. Deductive verification of smart contracts with Dafny. In International Conference on Formal Methods for Industrial Critical Systems. Springer, 50--66.
[9]
Michael D Ernst, Jeff H Perkins, Philip J Guo, Stephen McCamant, Carlos Pacheco, Matthew S Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69, 1--3 (2007), 35--45.
[10]
Marie Farrell, Conor Reynolds, and Rosemary Monahan. 2021. Using dafny to solve the VerifyThis 2021 challenges. In Proceedings of the 23rd ACM International Workshop on Formal Techniques for Java-like Programs. 32--38.
[11]
João F Ferreira and Alexandra Mendes. 2009. Students' feedback on teaching mathematics through the calculational method. In 2009 39th IEEE Frontiers in Education Conference. IEEE, 1--6.
[12]
Joao F Ferreira, Alexandra Mendes, Roland Backhouse, and Luis S Barbosa. 2009. Which mathematics for the information society?. In Teaching Formal Methods: Second International Conference, TFM 2009, Eindhoven, The Netherlands, November 2--6, 2009. Proceedings 2. Springer, 39--56.
[13]
Joao F Ferreira, Alexandra Mendes, Alcino Cunha, Carlos Baquero, and Paulo Silva. 2011. Logic training through algorithmic problem solving. In Tools for Teaching Logic: Third International Congress, TICTTL 2011, Salamanca, Spain, June 1--4, 2011. Proceedings. Springer, 62--69.
[14]
Emily First and Yuriy Brun. 2022. Diversity-Driven Automated Formal Verification. In ICSE (22--27). Pittsburgh, PA, USA, 749--761.
[15]
Emily First, Yuriy Brun, and Arjun Guha. 2020. TacTok: Semantics-Aware Proof Synthesis. PACMPL OOPSLA 4 (November 2020), 231:1--231:31.
[16]
Emily First, Markus Rabe, Talia Ringer, and Yuriy Brun. 2023. Baldur: Whole-Proof Generation and Repair with Large Language Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (, San Francisco, CA, USA,) (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1229--1241.
[17]
Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, and Michael Lyu. 2023. Code Structure-Guided Transformer for Source Code Summarization. ACM Trans. Softw. Eng. Methodol. 32, 1, Article 23 (feb 2023), 32 pages.
[18]
Daniel Huang, Prafulla Dhariwal, Dawn Song, and Ilya Sutskever. 2019. GamePad: A Learning Environment for Theorem Proving. In ICLR. https://openreview.net/forum?id=r1xwKoR9Y7
[19]
Albert Jiang, Konrad Czechowski, Mateja Jamnik, Piotr Milos, Szymon Tworkowski, Wenda Li, and Yuhuai Tony Wu. 2022. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. In NeurIPS.
[20]
Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023. Inferfix: End-to-end program repair with llms. arXiv preprint arXiv:2303.07263 (2023).
[21]
Junaed Younus Khan and Gias Uddin. 2023. Automatic Code Documentation Generation Using GPT-3. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE '22). Association for Computing Machinery, New York, NY, USA, Article 174, 6 pages.
[22]
K Rustan M Leino. 2010. Dafny: An automatic program verifier for functional correctness. In International conference on logic for programming artificial intelligence and reasoning. Springer, 348--370.
[23]
K Rustan M Leino. 2023. Program Proofs. MIT Press.
[24]
K Rustan M Leino and Nadia Polikarpova. 2013. Verified calculations. In Working Conference on Verified Software: Theories, Tools, and Experiments. Springer, 170--190.
[25]
Nancy Leveson. 2020. Are you sure your software will not kill anyone? Commun. ACM 63, 2 (2020), 25--28.
[26]
Alexandra Mendes, Roland Backhouse, and Joao F Ferreira. 2014. Structure editing of handwritten mathematics: Improving the computer support for the calculational method. In Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces. 139--148.
[27]
Alexandra Mendes and Joao F Ferreira. 2018. Towards Verified Handwritten Calculational Proofs: (Short Paper). In Interactive Theorem Proving: 9th International Conference, ITP 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 9--12, 2018, Proceedings 9. Springer, 432--440.
[28]
Maciej Mikula, Szymon Antoniak, Szymon Tworkowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miloś, and Yuhuai Wu. 2023. Magnushammer: A Transformer-based Approach to Premise Selection. arXiv:2303.04488
[29]
Nhan Nguyen and Sarah Nadi. 2022. An empirical evaluation of GitHub copilot's code suggestions. In Proceedings of the 19th International Conference on Mining Software Repositories. 1--5.
[30]
Julian Aron Prenner, Hlib Babii, and Romain Robbes. 2022. Can OpenAI's Codex Fix Bugs? An Evaluation on QuixBugs. In Proceedings of the Third International Workshop on Automated Program Repair (Pittsburgh, Pennsylvania) (APR '22). Association for Computing Machinery, New York, NY, USA, 69--75.
[31]
Tom Reichel, R Henderson, Andrew Touchet, Andrew Gardner, and Talia Ringer. 2023. Proof repair infrastructure for supervised models: Building a large proof repair dataset. In 14th International Conference on Interactive Theorem Proving (ITP 2023). Schloss-Dagstuhl-Leibniz Zentrum für Informatik.
[32]
Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner. 2020. Generating correctness proofs with neural networks. In Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. 1--10.
[33]
Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, and Talia Ringer. 2023. Passport: Improving Automated Formal Verification Using Identifiers. ACM TOPLAS 45, 2, Article 12 (June 2023), 12:1--12:30 pages.
[34]
William Schultz, Ian Dardik, and Stavros Tripakis. 2022. Plain and Simple Inductive Invariant Inference for Distributed Protocols in TLA+. In 2022 Formal Methods in Computer-Aided Design (FMCAD). IEEE, 273--283.
[35]
Aishwarya Sivaraman, Alex Sanchez-Stern, Bretton Chen, Sorin Lerner, and Todd Millstein. 2022. Data-driven lemma synthesis for interactive proofs. Proceedings of the ACM on Programming Languages 6, OOPSLA2 (2022), 505--531.
[36]
Chuyue Sun, Ying Sheng, Oded Padon, and Clark Barrett. 2023. Clover: Closed-Loop Verifiable Code Generation. arXiv preprint arXiv:2310.17807 (2023).
[37]
Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin. 2017. Automated lemma synthesis in symbolic-heap separation logic. Proceedings of the ACM on Programming Languages 2, POPL (2017), 1--29.
[38]
Amitayush Thakur, Yeming Wen, and Swarat Chaudhuri. 2023. A Language-Agent Approach to Formal Theorem-Proving. arXiv:2310.04353
[39]
Ruyun Wang, Hanwen Zhang, Guoliang Lu, Lei Lyu, and Chen Lyu. 2020. Fret: Functional Reinforced Transformer With BERT for Code Summarization. IEEE Access 8 (2020), 135591--135604.
[40]
Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated Program Repair in the Era of Large Pre-Trained Language Models. In Proceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE '23). IEEE Press, 1482--1494.
[41]
Chunqiu Steven Xia and Lingming Zhang. 2022. Less Training, More Repairing Please: Revisiting Automated Program Repair via Zero-Shot Learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (, Singapore, Singapore,) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 959--971.
[42]
Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E. Hassan, and Shanping Li. 2018. Measuring Program Comprehension: A Large-Scale Field Study with Professionals. IEEE Transactions on Software Engineering 44, 10 (2018), 951--976.
[43]
Kaiyu Yang and Jia Deng. 2019. Learning to prove theorems via interacting with proof assistants. In ICML. 6984--6994.
[44]
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar. 2023. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In Neural Information Processing Systems (NeurIPS).
[45]
Weikun Yang, Grigory Fedyukovich, and Aarti Gupta. 2019. Lemma synthesis for automating induction over algebraic data types. In Principles and Practice of Constraint Programming: 25th International Conference, CP 2019, Stamford, CT, USA, September 30--October 4, 2019, Proceedings 25. Springer, 600--617.

Cited By

View all
  • (2024)Specify What? Enhancing Neural Specification Synthesis by Symbolic MethodsIntegrated Formal Methods10.1007/978-3-031-76554-4_19(307-325)Online publication date: 13-Nov-2024
  • (2024)Towards Integrating Copiloting and Formal MethodsLeveraging Applications of Formal Methods, Verification and Validation. Specification and Verification10.1007/978-3-031-75380-0_9(144-158)Online publication date: 30-Oct-2024

Index Terms

  1. Leveraging Large Language Models to Boost Dafny’s Developers Productivity

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      FormaliSE '24: Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE)
      April 2024
      154 pages
      ISBN:9798400705892
      DOI:10.1145/3644033
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 June 2024

      Check for updates

      Author Tags

      1. verification-aware languages
      2. dafny
      3. large language models
      4. generative AI
      5. software productivity
      6. software verification
      7. lemma inference
      8. proof inference
      9. automated program repair

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      FormaliSE '24
      Sponsor:

      Upcoming Conference

      ICSE 2025

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)188
      • Downloads (Last 6 weeks)39
      Reflects downloads up to 29 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Specify What? Enhancing Neural Specification Synthesis by Symbolic MethodsIntegrated Formal Methods10.1007/978-3-031-76554-4_19(307-325)Online publication date: 13-Nov-2024
      • (2024)Towards Integrating Copiloting and Formal MethodsLeveraging Applications of Formal Methods, Verification and Validation. Specification and Verification10.1007/978-3-031-75380-0_9(144-158)Online publication date: 30-Oct-2024

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media