Abstract
This paper presents a self-stabilizing failure detector, asynchronous consensus and replicated state-machine algorithm suite, the components of which can be started in an arbitrary state and converge to act as a virtual state-machine.
Self-stabilizing algorithms can cope with transient faults. Transient faults can alter the system state to an arbitrary state and hence, cause a temporary violation of the safety property of the consensus. New requirements for consensus that fit the on-going nature of self-stabilizing algorithms are presented. The wait-free consensus (and the replicated state-machine) algorithm presented is a classic combination of a failure detector and a (memory bounded) rotating coordinator consensus that satisfy both eventual safety and eventual liveness.
Several new techniques and paradigms are introduced. The bounded memory failure detector abstracts away synchronization assumptions using bounded heartbeat counters combined with a balance-unbalance mechanism. The practically infinite paradigm is introduced in the scope of self-stabilization, where an execution of, say, 264 sequential steps is regarded as (practically) infinite. Finally, we present the first self-stabilizing wait-free reset mechanism that ensures eventual safety and can be used in other scopes.
Partially supported by the Israeli Ministry of Science, the Lynn and William Frankel Center for Computer Sciences, and the Rita Altura Trust Chair in Computer Sciences.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abraham, U.: Self-stabilizing timestamps. Theoretical Computer Science 308(1-3), 449–515 (2003)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: On implementing omega with weak reliability and synchrony assumptions. In: Proc. of the 22nd ACM symposium on Principles of distributed computing, pp. 306–314 (July 2003)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Communication-efficient leader election and consensus with limited link synchrony. In: Proc. of the 23rd ACM symposium on Principles of distributed computing, pp. 328–337 (July 2004)
Arora, A., Kulkarni, S.S., Demirbas, M.: Resettable vector clocks. In: Proc. of the 19th ACM Symposium on Principles of Distributed Computing, pp. 269–278 (August 2000)
Attiya, H., Welch, J.L.: Distributed Computing: Fundamentals, Simulations, and Advanced Topics, 2nd edn. John Wiley and Sons, Inc., Chichester (2004)
Beauquier, J., Kekkonen-Moneta, S.: Fault-tolerance and self-stabilizing: Impossibility results and solutions using self-stabilizing failure detectors. International Journal of Systems Science 28(11), 1177–1187 (1997)
Birman, K.P.: Replication and Fault-Tolerance in the ISIS System. In: ACM Symposium on Operating Systems Principles (SOSP), pp. 79–86 (1985)
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. Journal of the ACM 43(2), 225–267 (1996)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. In: Proc. of the 11th ACM Symposium on Principles of Distributed Computing, pp. 147–158 (August 1992)
Chor, B., Israeli, A., Li, M.: On processor coordination using asynchronous hardware. In: Proc. of the 6th ACM Symposium on Principles of Distributed Computing, pp. 86–97 (August 1987)
Davidovitch, L., Dolev, S., Rajsbaum, S.: Consensus continue? Stability of multi-valued continuous consensus! In: Proc. of the Workshop on Geometry and Topology in Concurrency and Distributed Computing, pp. 21–24 (October 2004)
Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: Failure Detection Lower Bounds on Registers and Consensus. In: Proc. of the 16th International Conference on Distributed Computing, pp. 237–251 (October 2002)
Dijkstra, E.W.: Self-Stabilizing Systems in spite of Distributed Control. Communications of the ACM 1(11), 643–644 (1974)
Dolev, S.: Self-stabilization. MIT Press, Cambridge (2000)
Dolev, S., Israeli, A., Moran, S.: Self-Stabilization of Dynamic Systems Assuming only Read/Write Atomicity. In: Proc. of the 9th ACM Symposium on Principles of Distributed Computing, pp. 103–117 (August 1990)
Dolev, S., Gilbert, S., Lahiani, L., Lynch, N., Nolte, T.: Virtual Stationary Automata for Mobile Networks. In: Proc. of the 2005 International Conference On Principles Of Distributed Systems (OPODIS) (2005)
Dolev, S., Gilbert, S., Lynch, N.A., Schiller, E.M., Shvartsman, A., Welch, J.L.: Virtual Mobile Nodes for Mobile Ad Hoc Networks. In: Guerraoui, R. (ed.) DISC 2004. LNCS, vol. 3274, pp. 230–244. Springer, Heidelberg (2004)
Dolev, S., Kat, R.I., Schiller, E.M.: When Consensus Meets Self-Stabilization: Self-Stabilizing Failure-Detector, Consensus and Replicated State-Machine. Computer Science, Ben-Gurion University of the Negev, TR #06-05 (May 2006)
Dolev, S., Rajsbaum, S.: Stability of long-lived consensus. Journal of Computer and System Sciences, 26–45 (August 2003)
Dolev, S., Welch, J.L.: Wait-free clock synchronization. Algorithmica 18, 486–511 (1997)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. Journal of the ACM 32(2), 374–382 (1985)
Freiling, F.C., Guerraoui, R., Kouznetsov, P.: The Failure Detector Abstraction. Department for Mathematics and Computer Science, University of Mannheim, TR-2006-003 (2006)
Freiling, F.C., Völzer, H.: Illustrating the impossibility of crash-tolerant consensus in asynchronous systems. ACM SIGOPS Operating Systems Review 40(2), 105–109 (2006)
Fox, A., Patterson, D.: Self-Repairing Computers. Scientific American (June 2003)
Gafni, E., Lamport, L.: Disk Paxos. Distributed Computing 16(1), 1–20 (2003)
Herlihy, M.: Wait-Free Synchronization. ACM Transactions on Programming Languages and Systems 13(1), 124–149 (1991)
Hutle, M., Widder, J.: Self-Stabilizing Failure Detector Algorithms. In: Parallel and Distributed Computing and Networks Conference, pp. 485–490 (February 2005)
Kulkarni, S.S., Arora, A.: Multitolerance in Distributed Reset. Chicago Journal of Theoretical Computer, CJTCS-1998-4 (1998)
Lamport, L.: The part-time parliament. ACM Transactions on Computer Systems 16(2), 133–169 (1998)
Lamport, L.: Time, Clocks, and Ordering of Events in a Distributed System. Communication of the ACM 21(7), 558–565 (1978)
Lo, W., Hadzilacos, V.: Using Failure Detectors to Solve Consensus in Asynchronous Shared-Memory Systems. In: Tel, G., Vitányi, P.M.B. (eds.) WDAG 1994. LNCS, vol. 857, pp. 280–295. Springer, Heidelberg (1994)
Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. Advances in Computing Research 4, 163–183 (1987)
Lynch, N.: Distributed Algorithms. Morgan Kaufmann Publishers, San Francisco (1996)
Malkhi, D., Oprea, F., Zhou, L.: Omega Meets Paxos: Leader Election and Stability Without Eventual Timely Links. In: Fraigniaud, P. (ed.) DISC 2005. LNCS, vol. 3724, pp. 199–213. Springer, Heidelberg (2005)
Raynal, M.: A Short Introduction to Failure Detectors for Asynchronous Distributed Systems. SIGACT News column 36(1) (March 2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dolev, S., Kat, R.I., Schiller, E.M. (2006). When Consensus Meets Self-stabilization. In: Shvartsman, M.M.A.A. (eds) Principles of Distributed Systems. OPODIS 2006. Lecture Notes in Computer Science, vol 4305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945529_5
Download citation
DOI: https://doi.org/10.1007/11945529_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49990-9
Online ISBN: 978-3-540-49991-6
eBook Packages: Computer ScienceComputer Science (R0)