Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3009837.3009860acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
research-article

A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms

Published: 01 January 2017 Publication History

Abstract

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards.
We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.

References

[1]
P. A. Abdulla, A. Bouajjani, and B. Jonsson. On-the-fly analysis of systems with unbounded, lossy FIFO channels. In CAV, LNCS, pages 305–318, 1998.
[2]
F. Alberti, S. Ghilardi, and E. Pagani. Counting constraints in flat array fragments. In IJCAR, volume 9706 of LNCS, pages 65–81, 2016.
[3]
K. Apt and D. Kozen. Limits for automatic verification of finite-state concurrent systems. IPL, 15:307–309, 1986.
[4]
M. F. Atig, A. Bouajjani, M. Emmi, and A. Lal. Detecting fair nontermination in multithreaded programs. In CAV, pages 210–226, 2012.
[5]
C. Baier and J.-P. Katoen. Principles of model checking. MIT Press, 2008.
[6]
S. Bardin, A. Finkel, J. Leroux, and L. Petrucci. Fast: acceleration from theory to practice. STTT, 10(5):401–424, 2008.
[7]
M. Biely, P. Delgado, Z. Milosevic, and A. Schiper. Distal: a framework for implementing fault-tolerant distributed algorithms. In DSN, pages 1–8, 2013.
[8]
A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. Electronic Notes in Theoretical Computer Science, 66(2): 160–177, 2002.
[9]
R. Bloem, S. Jacobs, A. Khalimov, I. Konnov, S. Rubin, H. Veith, and J. Widder. Decidability of Parameterized Verification. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers, 2015.
[10]
A. Bouajjani, P. Habermehl, and T. Vojnar. Abstract regular model checking. In CAV, LNCS, pages 372–386, 2004.
[11]
G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. J. ACM, 32(4):824–840, 1985.
[12]
F. V. Brasileiro, F. Greve, A. Mostéfaoui, and M. Raynal. Consensus in one communication step. In PaCT, volume 2127 of LNCS, pages 42–50, 2001.
[13]
E. R. Canfield and S. G. Williamson. A loop-free algorithm for generating the linear extensions of a poset. Order, 12(1):57–75, 1995.
[14]
T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225–267, 1996.
[15]
B. Charron-Bost and S. Merz. Formal verification of a consensus algorithm in the heard-of model. IJSI, 3(2–3):273–303, 2009.
[16]
K. Chaudhuri, D. Doligez, L. Lamport, and S. Merz. Verifying safety properties with the TLA+ proof system. In IJCAR, volume 6173 of LNCS, pages 142–148, 2010.
[17]
E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.
[18]
E. Clarke, M. Talupur, and H. Veith. Proving Ptolemy right: the environment abstraction framework for model checking concurrent systems. In TACAS’08/ETAPS’08, pages 33–47. Springer, 2008.
[19]
E. Cohen and L. Lamport. Reduction in TLA. In CONCUR, volume 1466 of LNCS, pages 317–331, 1998.
[20]
L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, volume 1579 of LNCS, pages 337–340. 2008.
[21]
D. Dobre and N. Suri. One-step consensus with zero-degradation. In DSN, pages 137–146, 2006.
[22]
T. W. Doeppner. Parallel program correctness through refinement. In POPL, pages 155–169, 1977.
[23]
C. Drăgoi, T. A. Henzinger, and D. Zufferey. PSync: a partially synchronous language for fault-tolerant distributed algorithms. In POPL, pages 400–415, 2016.
[24]
C. Drăgoi, T. A. Henzinger, H. Veith, J. Widder, and D. Zufferey. A logic-based framework for verifying consensus algorithms. In VMCAI, volume 8318 of LNCS, pages 161–181, 2014.
[25]
T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, pages 2–15, 2009.
[26]
E. Emerson and K. Namjoshi. Reasoning about rings. In POPL, pages 85–94, 1995.
[27]
E. A. Emerson and V. Kahlon. Model checking guarded protocols. In LICS, pages 361–370. IEEE, 2003.
[28]
J. Esparza, A. Finkel, and R. Mayr. On the verification of broadcast protocols. In LICS, pages 352–359. IEEE Computer Society, 1999.
[29]
K. Etessami, M. Y. Vardi, and T. Wilke. First-order logic with two variables and unary temporal logic. Inf. Comput., 179(2):279–295, 2002.
[30]
Y. Fang, N. Piterman, A. Pnueli, and L. D. Zuck. Liveness with invisible ranking. STTT, 8(3):261–279, 2006.
[31]
A. Farzan, Z. Kincaid, and A. Podelski. Proving liveness of parameterized programs. In LICS, pages 185–196, 2016.
[32]
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, 1985.
[33]
D. Fisman, O. Kupferman, and Y. Lustig. On verifying fault tolerance of distributed protocols. In TACAS, volume 4963 of LNCS, pages 315–331. Springer, 2008.
[34]
C. Flanagan, S. N. Freund, and S. Qadeer. Exploiting purity for atomicity. IEEE Trans. Softw. Eng., 31(4):275–291, 2005.
[35]
S. M. German and A. P. Sistla. Reasoning about systems with many processes. J. ACM, 39:675–735, 1992.
[36]
A. Gmeiner, I. Konnov, U. Schmid, H. Veith, and J. Widder. Tutorial on parameterized model checking of fault-tolerant distributed algorithms. In Formal Methods for Executable Software Models, LNCS, pages 122–171. Springer, 2014.
[37]
R. Guerraoui. Non-blocking atomic commit in asynchronous distributed systems with failure detectors. Distributed Computing, 15 (1):17–25, 2002.
[38]
C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. T. V. Setty, and B. Zill. Ironfleet: proving practical distributed systems correct. In SOSP, pages 1–17, 2015.
[39]
G. Holzmann. The SPIN Model Checker. Addison-Wesley, 2003.
[40]
A. John, I. Konnov, U. Schmid, H. Veith, and J. Widder. Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In FMCAD, pages 201–209, 2013.
[41]
C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: language support for building distributed systems. In ACM SIGPLAN PLDI, pages 179–188, 2007.
[42]
I. Konnov, H. Veith, and J. Widder. SMT and POR beat counter abstraction: Parameterized model checking of threshold-based distributed algorithms. In CAV (Part I), volume 9206 of LNCS, pages 85–102, 2015.
[43]
I. Konnov, M. Lazi´c, H. Veith, and J. Widder. A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms. CoRR, abs/1608.05327, 2016.
[44]
I. Konnov, H. Veith, and J. Widder. On the completeness of bounded model checking for threshold-based distributed algorithms: Reachability. Information and Computation, 2016. Accepted manuscript available online: 10-MAR-2016.
[45]
I. Konnov, H. Veith, and J. Widder. What you always wanted to know about model checking of fault-tolerant distributed algorithms. In PSI 2015, Revised Selected Papers, volume 9609 of LNCS, pages 6–21. Springer, 2016.
[46]
D. Kroening, J. Ouaknine, O. Strichman, T. Wahl, and J. Worrell. Linear completeness thresholds for bounded model checking. In CAV, volume 6806 of LNCS, pages 557–572, 2011.
[47]
L. Lamport and F. B. Schneider. Pretending atomicity. Technical Report 44, SRC, 1989.
[48]
M. Lesani, C. J. Bell, and A. Chlipala. Chapar: certified causally consistent distributed key-value stores. In POPL, pages 357–370, 2016.
[49]
P. Lincoln and J. Rushby. A formally verified algorithm for interactive consistency under a hybrid fault model. In FTCS, pages 402–411, 1993.
[50]
R. J. Lipton. Reduction: A method of proving properties of parallel programs. Commun. ACM, 18(12):717–721, 1975.
[51]
B. D. Lubachevsky. An approach to automating the verification of compact parallel coordination programs. I. Acta Informatica, 21(2): 125–169, 1984.
[52]
A. Mostéfaoui, E. Mourgaya, P. R. Parvédy, and M. Raynal. Evaluating the condition-based approach to solve consensus. In DSN, pages 541– 550, 2003.
[53]
Netflix. 5 lessons we have learned using AWS. 2010.
[54]
retrieved on Nov. 7, 2016. http://techblog.netflix.com/2010/ 12/5-lessons-weve-learned-using-aws.html.
[55]
D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In USENIX ATC, pages 305–320, 2014.
[56]
O. Padon, K. L. McMillan, A. Panda, M. Sagiv, and S. Shoham. Ivy: safety verification by interactive generalization. In PLDI, pages 614– 630, 2016.
[57]
M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, 1980.
[58]
S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran. Making fast consensus generally faster. In DSN, pages 156–167, 2016.
[59]
A. Pnueli and E. Shahar. Liveness and acceleration in parameterized verification. In CAV, LNCS, pages 328–343, 2000.
[60]
A. Pnueli, J. Xu, and L. Zuck. Liveness with (0,1,∞)counter abstraction. In CAV, volume 2404 of LNCS, pages 93–111. 2002.
[61]
V. Rahli, D. Guaspari, M. Bickford, and R. L. Constable. Formal specification, verification, and implementation of fault-tolerant systems using EventML. ECEASST, 72, 2015.
[62]
M. Raynal. A case study of agreement problems in distributed systems: Non-blocking atomic commitment. In HASE, pages 209–214, 1997.
[63]
V. Schuppan and A. Biere. Liveness checking as safety checking for infinite state spaces. Electronic Notes in Theoretical Computer Science, 149(1):79–96, 2006.
[64]
Y. J. Song and R. van Renesse. Bosco: One-step Byzantine asynchronous consensus. In DISC, volume 5218 of LNCS, pages 438–450, 2008.
[65]
T. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comp., 2:80–94, 1987.
[66]
TLA. TLA+ toolbox. http://research.microsoft.com/en-us/ um/people/lamport/tla/tools.html.
[67]
M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In LICS, pages 322–331, 1986.
[68]
K. von Gleissenthall, N. Bjørner, and A. Rybalchenko. Cardinalities and universal quantifiers for verifying parameterized systems. In PLDI, pages 599–613, 2016.
[69]
J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In PLDI, pages 357–368, 2015.

Cited By

View all
  • (2024)Verifying Randomized Consensus Protocols with Common Coins2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00047(403-415)Online publication date: 24-Jun-2024
  • (2024)Parameterized Verification of Round-Based Distributed Algorithms via Extended Threshold AutomataFormal Methods10.1007/978-3-031-71162-6_33(638-657)Online publication date: 11-Sep-2024
  • (2023)Counterexample Driven Quantifier Instantiations with Applications to Distributed ProtocolsProceedings of the ACM on Programming Languages10.1145/36228647:OOPSLA2(1878-1904)Online publication date: 16-Oct-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages
January 2017
901 pages
ISBN:9781450346603
DOI:10.1145/3009837
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Byzantine faults
  2. Parameterized model checking
  3. fault- tolerant distributed algorithms
  4. reliable broadcast

Qualifiers

  • Research-article

Funding Sources

Conference

POPL '17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 824 of 4,130 submissions, 20%

Upcoming Conference

POPL '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)2
Reflects downloads up to 28 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Verifying Randomized Consensus Protocols with Common Coins2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00047(403-415)Online publication date: 24-Jun-2024
  • (2024)Parameterized Verification of Round-Based Distributed Algorithms via Extended Threshold AutomataFormal Methods10.1007/978-3-031-71162-6_33(638-657)Online publication date: 11-Sep-2024
  • (2023)Counterexample Driven Quantifier Instantiations with Applications to Distributed ProtocolsProceedings of the ACM on Programming Languages10.1145/36228647:OOPSLA2(1878-1904)Online publication date: 16-Oct-2023
  • (2023)PyLTA: A Verification Tool for Parameterized Distributed AlgorithmsTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-031-30820-8_4(28-35)Online publication date: 22-Apr-2023
  • (2022)Formal Modelling of PBFT Consensus Algorithm in Event-BWireless Communications & Mobile Computing10.1155/2022/44679172022Online publication date: 1-Jan-2022
  • (2022)Property-directed reachability as abstract interpretation in the monotone theoryProceedings of the ACM on Programming Languages10.1145/34986766:POPL(1-31)Online publication date: 12-Jan-2022
  • (2021)A multiparty session typing discipline for fault-tolerant event-driven distributed programmingProceedings of the ACM on Programming Languages10.1145/34855015:OOPSLA(1-30)Online publication date: 15-Oct-2021
  • (2021)Parameterized Distributed Synthesis of Fault-Tolerance Using Counter Abstraction2021 40th International Symposium on Reliable Distributed Systems (SRDS)10.1109/SRDS53918.2021.00016(67-77)Online publication date: Sep-2021
  • (2021)Red Belly: A Secure, Fair and Scalable Open Blockchain2021 IEEE Symposium on Security and Privacy (SP)10.1109/SP40001.2021.00087(466-483)Online publication date: May-2021
  • (2020)TLC: temporal logic of distributed componentsProceedings of the ACM on Programming Languages10.1145/34090054:ICFP(1-30)Online publication date: 3-Aug-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media