Unreliable failure detectors for reliable distributed systems

TD Chandra, S Toueg - Journal of the ACM (JACM), 1996 - dl.acm.org
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties—completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
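The abstract characterises failure detectors by two properties, completeness and accuracy. The sketch below is a rough illustration that checks two of the paper's properties over a finite, hand-made trace of suspicion lists:

- **Strong completeness:** every crashed process is eventually permanently suspected by every correct process.
- **Eventual weak accuracy:** eventually some correct process is never suspected by any correct process.

Together these two properties define the class ◇S. The trace format and all function names are hypothetical. Also, "eventually" and "permanently" refer to infinite runs, so on a finite trace the checker can only report the earliest step from which a property holds through the end of the recorded trace.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical trace format (an assumption for illustration, not from the paper):
# trace[t][p] is the set of processes that correct process p suspects at step t.
Trace = List[Dict[str, Set[str]]]
Step = Dict[str, Set[str]]


def earliest_stable(trace: Trace, holds: Callable[[Step], bool]) -> Optional[int]:
    """Earliest step t such that holds(...) is true at every step from t to the
    end of the finite trace; None if it fails at the last step (or trace is empty)."""
    t = len(trace)
    while t > 0 and holds(trace[t - 1]):
        t -= 1
    return t if t < len(trace) else None


def strong_completeness(trace: Trace, crashed: Set[str], correct: Set[str]) -> Optional[int]:
    # Every crashed process is suspected by every correct process, from some step on.
    return earliest_stable(trace, lambda step: all(crashed <= step[p] for p in correct))


def eventual_weak_accuracy(trace: Trace, correct: Set[str]) -> Optional[int]:
    # Some single correct process q is, from some step on, suspected by no correct process.
    times = [
        earliest_stable(trace, lambda step, q=q: all(q not in step[p] for p in correct))
        for q in sorted(correct)
    ]
    found = [t for t in times if t is not None]
    # A result of 0 would mean (plain) weak accuracy: q is never suspected at all.
    return min(found) if found else None


# Hypothetical run: p3 has crashed; p1 and p2 are correct.
crashed = {"p3"}
correct = {"p1", "p2"}
trace: Trace = [
    {"p1": set(), "p2": {"p1"}},        # step 0: p2 wrongly suspects p1
    {"p1": {"p2"}, "p2": {"p3"}},       # step 1: p1 wrongly suspects p2
    {"p1": {"p3"}, "p2": {"p3"}},       # step 2
    {"p1": {"p3"}, "p2": {"p1", "p3"}}, # step 3: p2 suspects p1 again
]

print("strong completeness from step", strong_completeness(trace, crashed, correct))  # 2
print("eventual weak accuracy from step", eventual_weak_accuracy(trace, correct))     # 2
```

In this sample trace, p2 still wrongly suspects the correct process p1 at the last step. ◇S permits this kind of mistake. Eventual weak accuracy still holds because p2 itself is no longer suspected by any correct process from step 2 onward.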