Article

Fast fault-tolerant agreement algorithms

Authors:

Carole Delporte-Gallet,

Hugues Fauconnier,

Stephanie Lorraine Horn,

Sam TouegAuthors Info & Claims

PODC '05: Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing

Pages 169 - 178

https://doi.org/10.1145/1073814.1073847

Published: 17 July 2005 Publication History

Abstract

In the synchronous round-based model, a process crash is dirty if it occurs exactly while a process is sending messages in a round, and this causes the process to send to some, but not all, of the intended recipients for the given round. Dirty crashes are possible; however, they are unlikely to occur, since the time spent sending messages is usually very small compared to the maximum message delay (i.e., compared to the duration of a round). In this paper, we investigate how fast one can solve some agreement problems, namely consensus and terminating reliable broadcast (TRB), when the number of dirty crashes that occur is small. In particular, we describe some algorithms for the uniform and non-uniform versions of these problems, and provide some matching lower bounds. All our uniform algorithms are strictly better than conventional early-stopping algorithms, in the sense that they never take more rounds to decide or halt, and they take fewer rounds when the number of dirty crashes is small.

References

[1]

M. K. Aguilera and S. Toueg. A simple bivalency proof that t-resilient consensus requires t+1 rounds. Information Processing Letters, 71(3--4):155--158, August 1999.

[2]

P. Berman, J. A. Garay, and K. J. Perry. Optimal early stopping in distributed consensus (extended abstract). In WDAG'92: Proceedings of the 6th International Workshop on Distributed Algorithms, pages 221--237. Springer, 1992.

Digital Library

[3]

T. D. Chandra and S. Toueg. Time and message efficient reliable broadcasts. In WDAG '90: Proceedings of the 4th International Workshop on Distributed Algorithms, pages 289--303. Springer, 1990.

Digital Library

[4]

B. Charron-Bost and A. Schiper. Uniform consensus is harder than consensus. J. Algorithms, 51(1):15--37, 2004.

Digital Library

[5]

D. Dolev, R. Reischuk, and H. R. Strong. Early stopping in byzantine agreement. J. ACM, 37(4):720--741, 1990.

Digital Library

[6]

C. Dwork and Y. Moses. Knowledge and common knowledge in a byzantine environment: Crash failures. Information and Computation, 88(2):156--186, Oct 1990.

Digital Library

[7]

F. Le Fessant. The complexity of early-deciding in unreliable synchronous networks. Technical Report TR-2003-23, Microsoft Research, 2003.

[8]

F. Fich and E. Ruppert. Hundreds of impossibility results for distributed computing. Distributed Computing, 16(2-3):121--163, 2003.

Digital Library

[9]

M. J. Fischer and N. A. Lynch. A lower bound for the time to assure interactive consistency. Information Processing Letters, 14(4):183--186, June 1982.

[10]

M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374--382, 1985.

Digital Library

[11]

V. Hadzilacos. A lower bound for byzantine agreement with fail-stop processors. Technical Report 21-83, Department of Computer Science, Harvard University, July 1983.

[12]

V. Hadzilacos and S. Toueg. A modular approach to the specification and implementation of fault-tolerant broadcasts. Technical Report TR94-1425, Department of Computer Science, Cornell University, May 1994.

[13]

I. Keidar and S. Rajsbaum. A simple proof of the uniform consensus synchronous lower bound. Information Processing Letters, 85(1):47--52, January 2003.

Digital Library

[14]

L. Lamport and M. Fischer. Byzantine generals and transaction commit protocols. Technical Report 62, SRI International and Yale University, April 1982.

[15]

N. A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., 1996.

Digital Library

[16]

Y. Moses and S. Rajsbaum. A layered analysis of consensus. SIAM J. Comput., 31(4):989--1021, 2002.

Digital Library

[17]

A. Mostefaoui, S. Rajsbaum, and M. Raynal. Using conditions to expedite consensus in synchronous distributed systems. In DISC, volume 2848 of Lecture Notes in Computer Science, pages 249--263. Springer, 2003.

[18]

Ph. Raïpin Parvédy and M. Raynal. Optimal early stopping uniform consensus in synchronous systems with process omission failures. In SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, pages 302--310. ACM Press, 2004.

Digital Library

[19]

M.-C. Roçu. Early-stopping terminating reliable broadcast protocol for general-omission failures. In PODC '96: Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, page 209. ACM Press, 1996.

Digital Library

[20]

J. Cao, X. B. Wang and Y. M. Teo. An approach to achieve message efficient early-stopping uniform consensus protocols. In ISPAN '04: Proceedings of International Symposium on Parallel Architectures, Algorithms, and Networks. IEEE Computer Society Press, May 2004.

Cited By

Widder JBiely MGridling GWeiss BBlanquart J(2012)Consensus in the presence of mortal Byzantine faulty processesDistributed Computing10.1007/s00446-011-0147-324:6(299-321)Online publication date: 1-Jan-2012
https://dl.acm.org/doi/10.1007/s00446-011-0147-3
Wang XTeo YCao J(2008)Message and time efficient consensus protocols for synchronous distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2007.08.00868:5(641-654)Online publication date: 1-May-2008
https://dl.acm.org/doi/10.1016/j.jpdc.2007.08.008
Widder JGridling GWeiss BBlanquart J(2007)Synchronous Consensus with Mortal ByzantinesProceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks10.1109/DSN.2007.91(102-112)Online publication date: 25-Jun-2007
https://dl.acm.org/doi/10.1109/DSN.2007.91
Show More Cited By

Index Terms

Fast fault-tolerant agreement algorithms
1. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
    2. Software system structures
      1. Distributed systems organizing principles
        Organizing principles for web applications
2. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Uniform Agreement Despite Process Omission Failures
IPDPS '03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing

A process fails by omission if it "forgets" to send or receive messages. Considering omission failures is crucial for distributed systems, as such failures model both crash failures and incorrect behavior of process input/output buffers (such as buffer ...
Early-deciding consensus is expensive
PODC '13: Proceedings of the 2013 ACM symposium on Principles of distributed computing

In consensus, the n nodes of a distributed system seek to take a consistent decision on some output, despite up to t of them crashing or even failing maliciously, i.e., behaving "Byzantine''. It is known that it is impossible to guarantee that ...
Structured derivation of semi-synchronous algorithms
DISC'11: Proceedings of the 25th international conference on Distributed computing

The semi-synchronous model is an important middle ground between the synchronous and the asynchronous models of distributed computing. In this model, processes can detect (timeout) when other processes fail. However, since detection is done by timing ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PODC '05: Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing

July 2005

364 pages

ISBN:1581139942

DOI:10.1145/1073814

General Chair:
Marcos Aguilera
HP Labs
,
Program Chair:
James Aspnes
Yale University

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 July 2005

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

PODC05

Sponsor:

PODC05: ACM Symposium on Principles of Distributed Computing

July 17 - 20, 2005

NV, Las Vegas, USA

Acceptance Rates

Overall Acceptance Rate 740 of 2,477 submissions, 30%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
548
Total Downloads

Downloads (Last 12 months)29
Downloads (Last 6 weeks)2

Reflects downloads up to 21 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Widder JBiely MGridling GWeiss BBlanquart J(2012)Consensus in the presence of mortal Byzantine faulty processesDistributed Computing10.1007/s00446-011-0147-324:6(299-321)Online publication date: 1-Jan-2012
https://dl.acm.org/doi/10.1007/s00446-011-0147-3
Wang XTeo YCao J(2008)Message and time efficient consensus protocols for synchronous distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2007.08.00868:5(641-654)Online publication date: 1-May-2008
https://dl.acm.org/doi/10.1016/j.jpdc.2007.08.008
Widder JGridling GWeiss BBlanquart J(2007)Synchronous Consensus with Mortal ByzantinesProceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks10.1109/DSN.2007.91(102-112)Online publication date: 25-Jun-2007
https://dl.acm.org/doi/10.1109/DSN.2007.91
Paquette MPelc A(2006)Optimal decision strategies in Byzantine environmentsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2005.10.01166:3(419-427)Online publication date: 1-Mar-2006
https://dl.acm.org/doi/10.1016/j.jpdc.2005.10.011

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents