research-article

Open access

Leveraging Large Language Models to Boost Dafny’s Developers Productivity

Authors:

Álvaro F. Silva,

Alexandra Mendes,

João F. FerreiraAuthors Info & Claims

FormaliSE '24: Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE)

Pages 138 - 142

https://doi.org/10.1145/3644033.3644374

Published: 06 June 2024 Publication History

Abstract

This research idea paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers. Although the use of verification-aware languages, such as Dafny, has increased considerably in the last decade, these are still not widely adopted. Often the cost of using such languages is too high, due to the level of expertise required from the developers and challenges that they often face when trying to prove a program correct. Even though Dafny automates a lot of the verification process, sometimes there are steps that are too complex for Dafny to perform on its own. One such case is that of missing lemmas, i.e. Dafny is unable to prove a result without being given further help in the form of a theorem that can assist it in the proof of the step.

In this paper, we describe preliminary work on using LLMs to assist developers by generating suggestions for relevant lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that cannot be proved automatically, we attempt to provide accompanying calculational proofs. We also discuss ideas for future work by describing a research agenda on using LLMs to increase the adoption of verification-aware languages in general, by increasing developers productivity and by reducing the level of expertise required for crafting formal specifications and proving program properties.

References

[1]

Alexandre Abreu, Nuno Macedo, and Alexandra Mendes. 2023. Exploring Automatic Specification Repair in Dafny Programs. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW). 105--112.

[2]

Wolfgang Ahrendt, Dilian Gurov, Moa Johansson, and Philipp Rümmer. 2022. TriCo---Triple Co-piloting of Implementation, Specification and Tests. In International Symposium on Leveraging Applications of Formal Methods. Springer, 174--187.

[3]

Open AI. 2023. GPT-4 Technical report. arXiv preprint arXiv:2303.08774 (2023).

[4]

Lasse Blaauwbroek, Josef Urban, and Herman Geuvers. 2020. The Tactician. In Intelligent Computer Mathematics. 271--277.

[5]

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023).

[6]

Franck Cassez. 2021. Verification of the incremental Merkle tree algorithm with Dafny. In 24th International Symposium on Formal Methods (FM) (LNCS, Vol. 13047). Springer, 445--462.

Digital Library

[7]

Franck Cassez, Joanne Fuller, Milad K Ghale, David J Pearce, and Horacio MA Quiles. 2023. Formal and Executable Semantics of the Ethereum Virtual Machine in Dafny. In International Symposium on Formal Methods. Springer, 571--583.

[8]

Franck Cassez, Joanne Fuller, and Horacio Mijail Antón Quiles. 2022. Deductive verification of smart contracts with Dafny. In International Conference on Formal Methods for Industrial Critical Systems. Springer, 50--66.

Digital Library

[9]

Michael D Ernst, Jeff H Perkins, Philip J Guo, Stephen McCamant, Carlos Pacheco, Matthew S Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69, 1--3 (2007), 35--45.

Digital Library

[10]

Marie Farrell, Conor Reynolds, and Rosemary Monahan. 2021. Using dafny to solve the VerifyThis 2021 challenges. In Proceedings of the 23rd ACM International Workshop on Formal Techniques for Java-like Programs. 32--38.

Digital Library

[11]

João F Ferreira and Alexandra Mendes. 2009. Students' feedback on teaching mathematics through the calculational method. In 2009 39th IEEE Frontiers in Education Conference. IEEE, 1--6.

[12]

Joao F Ferreira, Alexandra Mendes, Roland Backhouse, and Luis S Barbosa. 2009. Which mathematics for the information society?. In Teaching Formal Methods: Second International Conference, TFM 2009, Eindhoven, The Netherlands, November 2--6, 2009. Proceedings 2. Springer, 39--56.

Digital Library

[13]

Joao F Ferreira, Alexandra Mendes, Alcino Cunha, Carlos Baquero, and Paulo Silva. 2011. Logic training through algorithmic problem solving. In Tools for Teaching Logic: Third International Congress, TICTTL 2011, Salamanca, Spain, June 1--4, 2011. Proceedings. Springer, 62--69.

[14]

Emily First and Yuriy Brun. 2022. Diversity-Driven Automated Formal Verification. In ICSE (22--27). Pittsburgh, PA, USA, 749--761.

Digital Library

[15]

Emily First, Yuriy Brun, and Arjun Guha. 2020. TacTok: Semantics-Aware Proof Synthesis. PACMPL OOPSLA 4 (November 2020), 231:1--231:31.

Digital Library

[16]

Emily First, Markus Rabe, Talia Ringer, and Yuriy Brun. 2023. Baldur: Whole-Proof Generation and Repair with Large Language Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (, San Francisco, CA, USA,) (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1229--1241.

Digital Library

[17]

Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, and Michael Lyu. 2023. Code Structure-Guided Transformer for Source Code Summarization. ACM Trans. Softw. Eng. Methodol. 32, 1, Article 23 (feb 2023), 32 pages.

Digital Library

[18]

Daniel Huang, Prafulla Dhariwal, Dawn Song, and Ilya Sutskever. 2019. GamePad: A Learning Environment for Theorem Proving. In ICLR. https://openreview.net/forum?id=r1xwKoR9Y7

[19]

Albert Jiang, Konrad Czechowski, Mateja Jamnik, Piotr Milos, Szymon Tworkowski, Wenda Li, and Yuhuai Tony Wu. 2022. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. In NeurIPS.

[20]

Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023. Inferfix: End-to-end program repair with llms. arXiv preprint arXiv:2303.07263 (2023).

[21]

Junaed Younus Khan and Gias Uddin. 2023. Automatic Code Documentation Generation Using GPT-3. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE '22). Association for Computing Machinery, New York, NY, USA, Article 174, 6 pages.

Digital Library

[22]

K Rustan M Leino. 2010. Dafny: An automatic program verifier for functional correctness. In International conference on logic for programming artificial intelligence and reasoning. Springer, 348--370.

Digital Library

[23]

K Rustan M Leino. 2023. Program Proofs. MIT Press.

[24]

K Rustan M Leino and Nadia Polikarpova. 2013. Verified calculations. In Working Conference on Verified Software: Theories, Tools, and Experiments. Springer, 170--190.

[25]

Nancy Leveson. 2020. Are you sure your software will not kill anyone? Commun. ACM 63, 2 (2020), 25--28.

Digital Library

[26]

Alexandra Mendes, Roland Backhouse, and Joao F Ferreira. 2014. Structure editing of handwritten mathematics: Improving the computer support for the calculational method. In Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces. 139--148.

Digital Library

[27]

Alexandra Mendes and Joao F Ferreira. 2018. Towards Verified Handwritten Calculational Proofs: (Short Paper). In Interactive Theorem Proving: 9th International Conference, ITP 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 9--12, 2018, Proceedings 9. Springer, 432--440.

[28]

Maciej Mikula, Szymon Antoniak, Szymon Tworkowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miloś, and Yuhuai Wu. 2023. Magnushammer: A Transformer-based Approach to Premise Selection. arXiv:2303.04488

[29]

Nhan Nguyen and Sarah Nadi. 2022. An empirical evaluation of GitHub copilot's code suggestions. In Proceedings of the 19th International Conference on Mining Software Repositories. 1--5.

Digital Library

[30]

Julian Aron Prenner, Hlib Babii, and Romain Robbes. 2022. Can OpenAI's Codex Fix Bugs? An Evaluation on QuixBugs. In Proceedings of the Third International Workshop on Automated Program Repair (Pittsburgh, Pennsylvania) (APR '22). Association for Computing Machinery, New York, NY, USA, 69--75.

Digital Library

[31]

Tom Reichel, R Henderson, Andrew Touchet, Andrew Gardner, and Talia Ringer. 2023. Proof repair infrastructure for supervised models: Building a large proof repair dataset. In 14th International Conference on Interactive Theorem Proving (ITP 2023). Schloss-Dagstuhl-Leibniz Zentrum für Informatik.

[32]

Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner. 2020. Generating correctness proofs with neural networks. In Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. 1--10.

Digital Library

[33]

Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, and Talia Ringer. 2023. Passport: Improving Automated Formal Verification Using Identifiers. ACM TOPLAS 45, 2, Article 12 (June 2023), 12:1--12:30 pages.

Digital Library

[34]

William Schultz, Ian Dardik, and Stavros Tripakis. 2022. Plain and Simple Inductive Invariant Inference for Distributed Protocols in TLA+. In 2022 Formal Methods in Computer-Aided Design (FMCAD). IEEE, 273--283.

[35]

Aishwarya Sivaraman, Alex Sanchez-Stern, Bretton Chen, Sorin Lerner, and Todd Millstein. 2022. Data-driven lemma synthesis for interactive proofs. Proceedings of the ACM on Programming Languages 6, OOPSLA2 (2022), 505--531.

Digital Library

[36]

Chuyue Sun, Ying Sheng, Oded Padon, and Clark Barrett. 2023. Clover: Closed-Loop Verifiable Code Generation. arXiv preprint arXiv:2310.17807 (2023).

[37]

Quang-Trung Ta, Ton Chanh Le, Siau-Cheng Khoo, and Wei-Ngan Chin. 2017. Automated lemma synthesis in symbolic-heap separation logic. Proceedings of the ACM on Programming Languages 2, POPL (2017), 1--29.

[38]

Amitayush Thakur, Yeming Wen, and Swarat Chaudhuri. 2023. A Language-Agent Approach to Formal Theorem-Proving. arXiv:2310.04353

[39]

Ruyun Wang, Hanwen Zhang, Guoliang Lu, Lei Lyu, and Chen Lyu. 2020. Fret: Functional Reinforced Transformer With BERT for Code Summarization. IEEE Access 8 (2020), 135591--135604.

[40]

Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated Program Repair in the Era of Large Pre-Trained Language Models. In Proceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE '23). IEEE Press, 1482--1494.

Digital Library

[41]

Chunqiu Steven Xia and Lingming Zhang. 2022. Less Training, More Repairing Please: Revisiting Automated Program Repair via Zero-Shot Learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (, Singapore, Singapore,) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 959--971.

Digital Library

[42]

Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E. Hassan, and Shanping Li. 2018. Measuring Program Comprehension: A Large-Scale Field Study with Professionals. IEEE Transactions on Software Engineering 44, 10 (2018), 951--976.

Digital Library

[43]

Kaiyu Yang and Jia Deng. 2019. Learning to prove theorems via interacting with proof assistants. In ICML. 6984--6994.

[44]

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar. 2023. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In Neural Information Processing Systems (NeurIPS).

[45]

Weikun Yang, Grigory Fedyukovich, and Aarti Gupta. 2019. Lemma synthesis for automating induction over algebraic data types. In Principles and Practice of Constraint Programming: 25th International Conference, CP 2019, Stamford, CT, USA, September 30--October 4, 2019, Proceedings 25. Springer, 600--617.

Digital Library

Cited By

Granberry GAhrendt WJohansson M(2024)Specify What? Enhancing Neural Specification Synthesis by Symbolic MethodsIntegrated Formal Methods10.1007/978-3-031-76554-4_19(307-325)Online publication date: 13-Nov-2024
https://doi.org/10.1007/978-3-031-76554-4_19
Granberry GAhrendt WJohansson M(2024)Towards Integrating Copiloting and Formal MethodsLeveraging Applications of Formal Methods, Verification and Validation. Specification and Verification10.1007/978-3-031-75380-0_9(144-158)Online publication date: 30-Oct-2024
https://doi.org/10.1007/978-3-031-75380-0_9

Index Terms

Leveraging Large Language Models to Boost Dafny’s Developers Productivity
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
  2. Software notations and tools

Recommendations

Developing verified programs with Dafny
HILT '12

Reasoning about programs is a fundamental skill that every software engineer needs. This tutorial provides participants an opportunity to get hands-on experience with Dafny, a tool that can help develop this skill.

Dafny is a programming language and ...
Towards progressive program verification in Dafny
SBLP '18: Proceedings of the XXII Brazilian Symposium on Programming Languages

Program verification is a tool for the development of software that is free from defects and satisfies its functional specification. It suffers from two issues that have already been addressed in the field of type systems. First, it has a rigid focus on ...
Developing verified programs with Dafny
HILT '12: Proceedings of the 2012 ACM conference on High integrity language technology

Reasoning about programs is a fundamental skill that every software engineer needs. This tutorial provides participants an opportunity to get hands-on experience with Dafny, a tool that can help develop this skill.

Dafny is a programming language and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

FormaliSE '24: Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE)

April 2024

154 pages

ISBN:9798400705892

DOI:10.1145/3644033

Chair:
Nico Plat,
General Chair:
Stefania Gnesi
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", Italy
,
Program Chairs:
Carlo A. Furia,
Antónia Lopes
University of Lisbon, Portugal

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2024

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Fundação para a Ciência e a Tecnologia

Conference

FormaliSE '24

Sponsor:

SIGSOFT

FormaliSE '24: 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE)

April 14 - 15, 2024

Lisbon, Portugal

Upcoming Conference

ICSE 2025

2025 IEEE/ACM 46th International Conference on Software Engineering

April 26 - May 3, 2025

Ottawa , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
188
Total Downloads

Downloads (Last 12 months)188
Downloads (Last 6 weeks)39

Reflects downloads up to 29 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Granberry GAhrendt WJohansson M(2024)Specify What? Enhancing Neural Specification Synthesis by Symbolic MethodsIntegrated Formal Methods10.1007/978-3-031-76554-4_19(307-325)Online publication date: 13-Nov-2024
https://doi.org/10.1007/978-3-031-76554-4_19
Granberry GAhrendt WJohansson M(2024)Towards Integrating Copiloting and Formal MethodsLeveraging Applications of Formal Methods, Verification and Validation. Specification and Verification10.1007/978-3-031-75380-0_9(144-158)Online publication date: 30-Oct-2024
https://doi.org/10.1007/978-3-031-75380-0_9

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents