research-article

A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms

Authors:

Marijana Lazić,

Josef WidderAuthors Info & Claims

POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages

Pages 719 - 734

https://doi.org/10.1145/3009837.3009860

Published: 01 January 2017 Publication History

Abstract

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards.

We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.

References

[1]

P. A. Abdulla, A. Bouajjani, and B. Jonsson. On-the-fly analysis of systems with unbounded, lossy FIFO channels. In CAV, LNCS, pages 305–318, 1998.

Digital Library

[2]

F. Alberti, S. Ghilardi, and E. Pagani. Counting constraints in flat array fragments. In IJCAR, volume 9706 of LNCS, pages 65–81, 2016.

Digital Library

[3]

K. Apt and D. Kozen. Limits for automatic verification of finite-state concurrent systems. IPL, 15:307–309, 1986.

Digital Library

[4]

M. F. Atig, A. Bouajjani, M. Emmi, and A. Lal. Detecting fair nontermination in multithreaded programs. In CAV, pages 210–226, 2012.

Digital Library

[5]

C. Baier and J.-P. Katoen. Principles of model checking. MIT Press, 2008.

Digital Library

[6]

S. Bardin, A. Finkel, J. Leroux, and L. Petrucci. Fast: acceleration from theory to practice. STTT, 10(5):401–424, 2008.

Digital Library

[7]

M. Biely, P. Delgado, Z. Milosevic, and A. Schiper. Distal: a framework for implementing fault-tolerant distributed algorithms. In DSN, pages 1–8, 2013.

Digital Library

[8]

A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. Electronic Notes in Theoretical Computer Science, 66(2): 160–177, 2002.

[9]

R. Bloem, S. Jacobs, A. Khalimov, I. Konnov, S. Rubin, H. Veith, and J. Widder. Decidability of Parameterized Verification. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers, 2015.

Digital Library

[10]

A. Bouajjani, P. Habermehl, and T. Vojnar. Abstract regular model checking. In CAV, LNCS, pages 372–386, 2004.

[11]

G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. J. ACM, 32(4):824–840, 1985.

Digital Library

[12]

F. V. Brasileiro, F. Greve, A. Mostéfaoui, and M. Raynal. Consensus in one communication step. In PaCT, volume 2127 of LNCS, pages 42–50, 2001.

Digital Library

[13]

E. R. Canfield and S. G. Williamson. A loop-free algorithm for generating the linear extensions of a poset. Order, 12(1):57–75, 1995.

[14]

T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225–267, 1996.

Digital Library

[15]

B. Charron-Bost and S. Merz. Formal verification of a consensus algorithm in the heard-of model. IJSI, 3(2–3):273–303, 2009.

[16]

K. Chaudhuri, D. Doligez, L. Lamport, and S. Merz. Verifying safety properties with the TLA+ proof system. In IJCAR, volume 6173 of LNCS, pages 142–148, 2010.

Digital Library

[17]

E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.

Digital Library

[18]

E. Clarke, M. Talupur, and H. Veith. Proving Ptolemy right: the environment abstraction framework for model checking concurrent systems. In TACAS’08/ETAPS’08, pages 33–47. Springer, 2008.

Digital Library

[19]

E. Cohen and L. Lamport. Reduction in TLA. In CONCUR, volume 1466 of LNCS, pages 317–331, 1998.

Digital Library

[20]

L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, volume 1579 of LNCS, pages 337–340. 2008.

Digital Library

[21]

D. Dobre and N. Suri. One-step consensus with zero-degradation. In DSN, pages 137–146, 2006.

Digital Library

[22]

T. W. Doeppner. Parallel program correctness through refinement. In POPL, pages 155–169, 1977.

Digital Library

[23]

C. Drăgoi, T. A. Henzinger, and D. Zufferey. PSync: a partially synchronous language for fault-tolerant distributed algorithms. In POPL, pages 400–415, 2016.

Digital Library

[24]

C. Drăgoi, T. A. Henzinger, H. Veith, J. Widder, and D. Zufferey. A logic-based framework for verifying consensus algorithms. In VMCAI, volume 8318 of LNCS, pages 161–181, 2014.

Digital Library

[25]

T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, pages 2–15, 2009.

Digital Library

[26]

E. Emerson and K. Namjoshi. Reasoning about rings. In POPL, pages 85–94, 1995.

Digital Library

[27]

E. A. Emerson and V. Kahlon. Model checking guarded protocols. In LICS, pages 361–370. IEEE, 2003.

[28]

J. Esparza, A. Finkel, and R. Mayr. On the verification of broadcast protocols. In LICS, pages 352–359. IEEE Computer Society, 1999.

Digital Library

[29]

K. Etessami, M. Y. Vardi, and T. Wilke. First-order logic with two variables and unary temporal logic. Inf. Comput., 179(2):279–295, 2002.

Digital Library

[30]

Y. Fang, N. Piterman, A. Pnueli, and L. D. Zuck. Liveness with invisible ranking. STTT, 8(3):261–279, 2006.

Digital Library

[31]

A. Farzan, Z. Kincaid, and A. Podelski. Proving liveness of parameterized programs. In LICS, pages 185–196, 2016.

Digital Library

[32]

M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, 1985.

Digital Library

[33]

D. Fisman, O. Kupferman, and Y. Lustig. On verifying fault tolerance of distributed protocols. In TACAS, volume 4963 of LNCS, pages 315–331. Springer, 2008.

Digital Library

[34]

C. Flanagan, S. N. Freund, and S. Qadeer. Exploiting purity for atomicity. IEEE Trans. Softw. Eng., 31(4):275–291, 2005.

Digital Library

[35]

S. M. German and A. P. Sistla. Reasoning about systems with many processes. J. ACM, 39:675–735, 1992.

Digital Library

[36]

A. Gmeiner, I. Konnov, U. Schmid, H. Veith, and J. Widder. Tutorial on parameterized model checking of fault-tolerant distributed algorithms. In Formal Methods for Executable Software Models, LNCS, pages 122–171. Springer, 2014.

Digital Library

[37]

R. Guerraoui. Non-blocking atomic commit in asynchronous distributed systems with failure detectors. Distributed Computing, 15 (1):17–25, 2002.

Digital Library

[38]

C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. T. V. Setty, and B. Zill. Ironfleet: proving practical distributed systems correct. In SOSP, pages 1–17, 2015.

Digital Library

[39]

G. Holzmann. The SPIN Model Checker. Addison-Wesley, 2003.

Digital Library

[40]

A. John, I. Konnov, U. Schmid, H. Veith, and J. Widder. Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In FMCAD, pages 201–209, 2013.

[41]

C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: language support for building distributed systems. In ACM SIGPLAN PLDI, pages 179–188, 2007.

Digital Library

[42]

I. Konnov, H. Veith, and J. Widder. SMT and POR beat counter abstraction: Parameterized model checking of threshold-based distributed algorithms. In CAV (Part I), volume 9206 of LNCS, pages 85–102, 2015.

[43]

I. Konnov, M. Lazi´c, H. Veith, and J. Widder. A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms. CoRR, abs/1608.05327, 2016.

Digital Library

[44]

I. Konnov, H. Veith, and J. Widder. On the completeness of bounded model checking for threshold-based distributed algorithms: Reachability. Information and Computation, 2016. Accepted manuscript available online: 10-MAR-2016.

Digital Library

[45]

I. Konnov, H. Veith, and J. Widder. What you always wanted to know about model checking of fault-tolerant distributed algorithms. In PSI 2015, Revised Selected Papers, volume 9609 of LNCS, pages 6–21. Springer, 2016.

[46]

D. Kroening, J. Ouaknine, O. Strichman, T. Wahl, and J. Worrell. Linear completeness thresholds for bounded model checking. In CAV, volume 6806 of LNCS, pages 557–572, 2011.

Digital Library

[47]

L. Lamport and F. B. Schneider. Pretending atomicity. Technical Report 44, SRC, 1989.

[48]

M. Lesani, C. J. Bell, and A. Chlipala. Chapar: certified causally consistent distributed key-value stores. In POPL, pages 357–370, 2016.

Digital Library

[49]

P. Lincoln and J. Rushby. A formally verified algorithm for interactive consistency under a hybrid fault model. In FTCS, pages 402–411, 1993.

[50]

R. J. Lipton. Reduction: A method of proving properties of parallel programs. Commun. ACM, 18(12):717–721, 1975.

Digital Library

[51]

B. D. Lubachevsky. An approach to automating the verification of compact parallel coordination programs. I. Acta Informatica, 21(2): 125–169, 1984.

Digital Library

[52]

A. Mostéfaoui, E. Mourgaya, P. R. Parvédy, and M. Raynal. Evaluating the condition-based approach to solve consensus. In DSN, pages 541– 550, 2003.

[53]

Netflix. 5 lessons we have learned using AWS. 2010.

[54]

retrieved on Nov. 7, 2016. http://techblog.netflix.com/2010/ 12/5-lessons-weve-learned-using-aws.html.

[55]

D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In USENIX ATC, pages 305–320, 2014.

Digital Library

[56]

O. Padon, K. L. McMillan, A. Panda, M. Sagiv, and S. Shoham. Ivy: safety verification by interactive generalization. In PLDI, pages 614– 630, 2016.

Digital Library

[57]

M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, 1980.

Digital Library

[58]

S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran. Making fast consensus generally faster. In DSN, pages 156–167, 2016.

[59]

A. Pnueli and E. Shahar. Liveness and acceleration in parameterized verification. In CAV, LNCS, pages 328–343, 2000.

Digital Library

[60]

A. Pnueli, J. Xu, and L. Zuck. Liveness with (0,1,∞)counter abstraction. In CAV, volume 2404 of LNCS, pages 93–111. 2002.

Digital Library

[61]

V. Rahli, D. Guaspari, M. Bickford, and R. L. Constable. Formal specification, verification, and implementation of fault-tolerant systems using EventML. ECEASST, 72, 2015.

[62]

M. Raynal. A case study of agreement problems in distributed systems: Non-blocking atomic commitment. In HASE, pages 209–214, 1997.

Digital Library

[63]

V. Schuppan and A. Biere. Liveness checking as safety checking for infinite state spaces. Electronic Notes in Theoretical Computer Science, 149(1):79–96, 2006.

Digital Library

[64]

Y. J. Song and R. van Renesse. Bosco: One-step Byzantine asynchronous consensus. In DISC, volume 5218 of LNCS, pages 438–450, 2008.

Digital Library

[65]

T. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comp., 2:80–94, 1987.

Digital Library

[66]

TLA. TLA+ toolbox. http://research.microsoft.com/en-us/ um/people/lamport/tla/tools.html.

[67]

M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In LICS, pages 322–331, 1986.

[68]

K. von Gleissenthall, N. Bjørner, and A. Rybalchenko. Cardinalities and universal quantifiers for verifying parameterized systems. In PLDI, pages 599–613, 2016.

Digital Library

[69]

J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In PLDI, pages 357–368, 2015.

Digital Library

Cited By

Gao SZhan BWu ZZhang L(2024)Verifying Randomized Consensus Protocols with Common Coins2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00047(403-415)Online publication date: 24-Jun-2024
https://doi.org/10.1109/DSN58291.2024.00047
Baumeister TEichler PJacobs SSakr MVölp M(2024)Parameterized Verification of Round-Based Distributed Algorithms via Extended Threshold AutomataFormal Methods10.1007/978-3-031-71162-6_33(638-657)Online publication date: 11-Sep-2024
https://doi.org/10.1007/978-3-031-71162-6_33
Tamir OTaube MMcMillan KShoham SHowell JGueta GSagiv M(2023)Counterexample Driven Quantifier Instantiations with Applications to Distributed ProtocolsProceedings of the ACM on Programming Languages10.1145/36228647:OOPSLA2(1878-1904)Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622864
Show More Cited By

Index Terms

A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms

Recommendations

Ivy: safety verification by interactive generalization
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Despite several decades of research, the problem of formal verification of infinite-state systems has resisted effective automation. We describe a system --- Ivy --- for interactively verifying safety of infinite-state systems. Ivy's key principle is ...
A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms
POPL '17

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design ...
On the completeness of bounded model checking for threshold-based distributed algorithms

Counter abstraction is a powerful tool for parameterized model checking, if the number of local states of the concurrent processes is relatively small. In recent work, we introduced parametric interval counter abstraction that allowed us to verify the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages

January 2017

901 pages

ISBN:9781450346603

DOI:10.1145/3009837

General Chair:
Giuseppe Castagna
Paris Diderot University, France / CNRS, France
,
Program Chair:
Andrew D. Gordon
Microsoft Research, UK / University of Edinburgh, UK

ACM SIGPLAN Notices Volume 52, Issue 1
POPL '17
January 2017
901 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3093333
Editor:
Matthew Fluet
Issue’s Table of Contents

Copyright © 2017 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

POPL '17

Sponsor:

SIGPLAN

POPL '17: The 44th Annual ACM SIGPLAN Symposium on Principles of Programming Languages

January 15 - 21, 2017

Paris, France

Acceptance Rates

Overall Acceptance Rate 824 of 4,130 submissions, 20%

Upcoming Conference

POPL '25

Sponsor:
sigplan

The 52nd Annual ACM SIGPLAN Symposium on Principles of Programming Languages

January 19 - 25, 2025

Denver , CO , USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

41
Total Citations
View Citations
513
Total Downloads

Downloads (Last 12 months)31
Downloads (Last 6 weeks)2

Reflects downloads up to 28 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Gao SZhan BWu ZZhang L(2024)Verifying Randomized Consensus Protocols with Common Coins2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00047(403-415)Online publication date: 24-Jun-2024
https://doi.org/10.1109/DSN58291.2024.00047
Baumeister TEichler PJacobs SSakr MVölp M(2024)Parameterized Verification of Round-Based Distributed Algorithms via Extended Threshold AutomataFormal Methods10.1007/978-3-031-71162-6_33(638-657)Online publication date: 11-Sep-2024
https://doi.org/10.1007/978-3-031-71162-6_33
Tamir OTaube MMcMillan KShoham SHowell JGueta GSagiv M(2023)Counterexample Driven Quantifier Instantiations with Applications to Distributed ProtocolsProceedings of the ACM on Programming Languages10.1145/36228647:OOPSLA2(1878-1904)Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622864
Thomas BSankur O(2023)PyLTA: A Verification Tool for Parameterized Distributed AlgorithmsTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-031-30820-8_4(28-35)Online publication date: 22-Apr-2023
https://dl.acm.org/doi/10.1007/978-3-031-30820-8_4
Li JHu KZhu JBodeveix JYe Y(2022)Formal Modelling of PBFT Consensus Algorithm in Event-BWireless Communications & Mobile Computing10.1155/2022/44679172022Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1155/2022/4467917
Feldman YSagiv MShoham SWilcox J(2022)Property-directed reachability as abstract interpretation in the monotone theoryProceedings of the ACM on Programming Languages10.1145/34986766:POPL(1-31)Online publication date: 12-Jan-2022
https://dl.acm.org/doi/10.1145/3498676
Viering MHu REugster PZiarek L(2021)A multiparty session typing discipline for fault-tolerant event-driven distributed programmingProceedings of the ACM on Programming Languages10.1145/34855015:OOPSLA(1-30)Online publication date: 15-Oct-2021
https://dl.acm.org/doi/10.1145/3485501
Moloodi HFaghih FBonakdarpour B(2021)Parameterized Distributed Synthesis of Fault-Tolerance Using Counter Abstraction2021 40th International Symposium on Reliable Distributed Systems (SRDS)10.1109/SRDS53918.2021.00016(67-77)Online publication date: Sep-2021
https://doi.org/10.1109/SRDS53918.2021.00016
Crain TNatoli CGramoli V(2021)Red Belly: A Secure, Fair and Scalable Open Blockchain2021 IEEE Symposium on Security and Privacy (SP)10.1109/SP40001.2021.00087(466-483)Online publication date: May-2021
https://doi.org/10.1109/SP40001.2021.00087
Griffin JLesani MShadab NYin X(2020)TLC: temporal logic of distributed componentsProceedings of the ACM on Programming Languages10.1145/34090054:ICFP(1-30)Online publication date: 3-Aug-2020
https://dl.acm.org/doi/10.1145/3409005
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents