Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Verdi: a framework for implementing and formally verifying distributed systems

Published: 03 June 2015 Publication History

Abstract

Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi's utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.

References

[1]
Amazon. Summary of the Amazon EC2 and Amazon RDS service disruption in the US East Region, Apr. 2011. http://aws.amazon.com/ message/65648/.
[2]
S. Bishop, M. Fairbairn, M. Norrish, P. Sewell, M. Smith, and K. Wansbrough. Engineering with logic: HOL specification and symbolic-evaluation testing for TCP implementations. In Proceedings of the 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 55–66, Charleston, SC, Jan. 2006.
[3]
T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the 26th ACM SIGACTSIGOPS Symposium on Principles of Distributed Computing (PODC), pages 398–407, Portland, OR, Aug. 2007.
[4]
A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 234–245, San Jose, CA, June 2011.
[5]
R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics with The Nuprl Proof Development System. Prentice Hall, 1986.
[6]
CoreOS. etcd: A highly-available key value store for shared configuration and service discovery, 2014.
[7]
https://github.com/coreos/etcd.
[8]
S. J. Garland and N. Lynch. Using I/O automata for developing distributed systems. In Foundations of Component-based Systems. Cambridge University Press, 2000.
[9]
D. Geels, G. Altekar, P. Maniatis, T. Roscoe, and I. Stoica. Friday: Global comprehension for distributed replay. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 285–298, Cambridge, MA, Apr. 2007.
[10]
A. Guha, M. Reitblatt, and N. Foster. Machine-verified network controllers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 483–494, Seattle, WA, June 2013.
[11]
Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, P. Bodik, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In Proceedings of the HotOS XIV, Santa Ana Pueblo, NM, May 2013.
[12]
M. Hayden. The Ensemble System. PhD thesis, Cornell University, 1998.
[13]
M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, 1990.
[14]
J. J. Hickey, N. Lynch, and R. van Renesse. Specifications and proofs for Ensemble layers. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 119–133, York, UK, Mar. 2009.
[15]
High Scalability. The updated big list of articles on the Amazon outage, May 2011. http://highscalability.com/blog/2011/5/2/theupdated-big-list-of-articles-on-the-amazon-outage.html.
[16]
D. Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, Feb. 2012.
[17]
C. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: Language support for building distributed systems. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 179–188, San Diego, CA, June 2007.
[18]
C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 243–256, Cambridge, MA, Apr. 2007.
[19]
G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, M. Norrish, R. Kolanski, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 207–220, Big Sky, MT, Oct. 2009.
[20]
L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional, July 2002.
[21]
L. Lamport. Thinking for programmers, Apr. 2014. http://channel9.msdn.com/Events/Build/2014/3-642.
[22]
V. Le, M. Afshari, and Z. Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 216–226, Edinburgh, UK, June 2014.
[23]
X. Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, July 2009.
[24]
X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, K. Birman, and R. L. Constable. Building reliable, high-performance communication systems from components. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), pages 80–92, Kiawah Island, SC, Dec. 1999.
[25]
X. Liu, Z. Guo, X. Wang, F. Chen, X. Lian, J. Tang, M. Wu, M. F. Kaashoek, and Z. Zhang. D 3 S: Debugging deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 423–437, San Francisco, CA, Apr. 2008.
[26]
N. A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996. ISBN 1558603484.
[27]
A. Nanevski, G. Morrisett, A. Shinnar, P. Govereau, and L. Birkedal. Ynot: Dependent types for imperative programs. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP), Victoria, British Columbia, Canada, Sept. 2008.
[28]
C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff. Use of formal methods at Amazon Web Services, Sept. 2014. http://research.microsoft.com/enus/um/people/lamport/tla/formal-methods-amazon.pdf.
[29]
NYTimes. Amazon’s trouble raises cloud computing doubts, Apr. 2011. http://www.nytimes.com/2011/04/23/technology/23cloud.html.
[30]
D. Ongaro. Consensus: Bridging Theory and Practice. PhD thesis, Stanford University, Aug. 2014.
[31]
D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, pages 305–319, Philadelphia, PA, June 2014.
[32]
J. L. Peterson. Petri nets. ACM Computing Surveys, pages 223–252, Sept. 1977.
[33]
V. Rahli. Interfacing with proof assistants for domain specific programming using EventML. In Proceedings of the 10th International Workshop On User Interfaces for Theorem Provers, Bremen, Germany, July 2012.
[34]
D. Ricketts, V. Robert, D. Jang, Z. Tatlock, and S. Lerner. Automating formal proofs for reactive systems. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 452–462, Edinburgh, UK, June 2014.
[35]
T. Ridge. Verifying distributed systems: The operational approach. In Proceedings of the 36th ACM Symposium on Principles of Programming Languages (POPL), pages 429–440, Savannah, GA, Jan. 2009.
[36]
D. Sangiorgi and D. Walker. PI-Calculus: A Theory of Mobile Processes. Cambridge University Press, New York, NY, USA, 2001. ISBN 0521781779.
[37]
N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 395–406, Atlanta, GA, June 2014.
[38]
Verdi. Verdi, 2014. https://github.com/uwplse/verdi.
[39]
M. Yabandeh, N. Kneževi´c, D. Kosti´c, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 229–244, San Francisco, CA, Apr. 2008.
[40]
J. Yang and C. Hawblitzel. Safe to the last instruction: Automated verification of a type-safe operating system. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 99–110, Toronto, Canada, June 2010.
[41]
J. Yang, T. Chen, M. Wu, Z. Xu, X. Liu, H. Lin, M. Yang, F. Long, L. Zhang, and L. Zhou. M O D IST : Transparent model checking of unmodified distributed systems. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI), pages 213–228, Boston, MA, Apr. 2009.
[42]
X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, San Jose, CA, June 2011.
[43]
D. Yuan, Y. Luo, X. Zhuang, G. R. Rodrigues, X. Zhao, Y. Zhang, P. U. Jain, and M. Stumm. Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 249–265, Broomfield, CO, Oct. 2014.
[44]
P. Zave. Using lightweight modeling to understand Chord. ACM SIGCOMM Computer Communication Review, 42(2):49–57, Apr. 2012.

Cited By

View all
  • (2025)VeriRT: An End-to-End Verification Framework for Real-Time Distributed SystemsProceedings of the ACM on Programming Languages10.1145/37048979:POPL(1812-1839)Online publication date: 9-Jan-2025
  • (2024)VConMC: Enabling Consistency Verification for Distributed Systems Using Implementation-Level Model Checkers and Consistency OraclesElectronics10.3390/electronics1306115313:6(1153)Online publication date: 21-Mar-2024
  • (2024)Proof Automation with Large Language ModelsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695521(1509-1520)Online publication date: 27-Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 50, Issue 6
PLDI '15
June 2015
630 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2813885
  • Editor:
  • Andy Gill
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2015
    630 pages
    ISBN:9781450334686
    DOI:10.1145/2737924
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2015
Published in SIGPLAN Volume 50, Issue 6

Check for updates

Author Tags

  1. Coq
  2. Formal verification
  3. Verdi
  4. distributed systems
  5. proof assistants

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)616
  • Downloads (Last 6 weeks)71
Reflects downloads up to 25 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)VeriRT: An End-to-End Verification Framework for Real-Time Distributed SystemsProceedings of the ACM on Programming Languages10.1145/37048979:POPL(1812-1839)Online publication date: 9-Jan-2025
  • (2024)VConMC: Enabling Consistency Verification for Distributed Systems Using Implementation-Level Model Checkers and Consistency OraclesElectronics10.3390/electronics1306115313:6(1153)Online publication date: 21-Mar-2024
  • (2024)Proof Automation with Large Language ModelsProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695521(1509-1520)Online publication date: 27-Oct-2024
  • (2024)Foundational Integration Verification of a Cryptographic ServerProceedings of the ACM on Programming Languages10.1145/36564468:PLDI(1704-1729)Online publication date: 20-Jun-2024
  • (2024)LiDO: Linearizable Byzantine Distributed Objects with Refinement-Based Liveness ProofsProceedings of the ACM on Programming Languages10.1145/36564238:PLDI(1140-1164)Online publication date: 20-Jun-2024
  • (2024)The Development of a TLA$$^{+}$$ Verified Correctness Raft Consensus ProtocolWeb and Big Data10.1007/978-981-97-7244-5_40(459-469)Online publication date: 28-Aug-2024
  • (2023)Specification and Runtime Checking of Derecho, A Protocol for Fast Replication for Cloud ServicesProceedings of the 5th workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems10.1145/3584684.3597275(1-10)Online publication date: 19-Jun-2023
  • (2023)$\pi_{\mathbf{RA}}$: A $\pi\text{-calculus}$ for Verifying Protocols that Use Remote Attestation2023 IEEE 36th Computer Security Foundations Symposium (CSF)10.1109/CSF57540.2023.00019(537-551)Online publication date: Jul-2023
  • (2023)Protocol Conformance with Choreographic PlusCalTheoretical Aspects of Software Engineering10.1007/978-3-031-35257-7_8(126-145)Online publication date: 27-Jun-2023
  • (2023)Efficient linearizability checking for actor‐based systemsSoftware: Practice and Experience10.1002/spe.325153:11(2163-2199)Online publication date: 22-Aug-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media