We provide a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication faults using point-set topology. More specifically, we non-trivially extend the approach introduced by Alpern and Schneider in 1985, by introducing novel fault-aware topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. Consensus is solvable in a given model if and only if the sets of admissible executions leading to different decision values are disconnected in these topologies. By applying our approach to a wide range of applications, we provide a topological explanation of a number of existing algorithms and impossibility results and develop several new ones, including a general equivalence of the strong and weak validity conditions.
1 Introduction
We provide a complete characterization of the solvability of deterministic non-uniform and uniform consensus in distributed systems with benign process and/or communication failures, using point-set topology as introduced in the Dijkstra Prize-winning paper by Alpern and Schneider [4]. Our results hence precisely delimit the consensus solvability/impossibility border in very different distributed systems such as dynamic networks [27] controlled by a message adversary [2], synchronous distributed systems with processes that may crash or commit send and/or receive omission failures [40], or purely asynchronous systems with crash failures [19], for example. Whereas we will primarily focus on message-passing architectures in our examples, our topological approach also covers shared-memory systems [34].
Deterministic consensus, where every process starts with some initial input value picked from a finite set \(\mathcal {V}\) and has to irrevocably compute a common output value, is arguably the most well-studied problem in distributed computing. Both impossibility results and consensus algorithms are known for virtually all distributed computing models that have been proposed so far. However, they have been obtained primarily on a case-by-case basis, using classic combinatorial analysis techniques [18]. Whereas there are also some generic characterizations [31, 33], i.e., ones that can be applied to different models of computation, we are not aware of any approach that allows one to precisely characterize the consensus solvability/impossibility border for arbitrary distributed systems with benign process and communication failures.
In this paper, we provide such a characterization based on point-set topology, as introduced by Alpern and Schneider [4]. Regarding topological methods in distributed computing, one has to distinguish point-set topology, which considers the space of infinite executions of a distributed algorithm, from combinatorial topology, which studies the topology of reachable states of prefixes of admissible executions using simplicial complexes. Figure 1 illustrates the objects studied in combinatorial topology vs. point-set topology. As of today, combinatorial topology has been developed into a quite widely applicable tool for the analysis of distributed systems [24]. A celebrated result in this area is the Asynchronous Computability Theorem (ACT) [21, 25], for example, which characterizes solvable tasks in wait-free asynchronous shared memory systems with crashes.
By contrast, point-set topology has only rarely been used in distributed computing. The primary objects are the infinite executions of a distributed algorithm [4]. By defining a suitable metric between two infinite executions \(\gamma\) and \(\delta\), each considered as the corresponding infinite sequence of global states of the algorithm in the respective execution, they can be viewed as elements of a topological space. For example, according to the common-prefix metric \(d_{\max }(\gamma ,\delta)\), the executions \(\gamma\) and \(\delta\) are close if the common prefix where no process can distinguish them is long. A celebrated general result [4] is that closed and dense sets in the resulting space precisely characterize safety and liveness properties, respectively.
Prior to our paper, however, point-set topology has only occasionally been used for establishing impossibility results. We are only aware of some early work by one of the authors of this paper on a generic topological impossibility proof for consensus in compact models [37], and a topological study of the strongly dependent decision problem [9]. Lubitch and Moran [31] introduced a construction for schedulers, which leads to limit-closed submodels of classic non-closed distributed computing models (like asynchronous systems consisting of \(|\Pi |=n\) processes, of which up to \(t\lt n-1\) may crash). In a similar spirit, Kuznetsov et al. [28] showed, in the setting of combinatorial topology, how to reason about non-closed models by considering equivalent affine tasks that are closed. Gafni et al. [20] tried to extend the ACT to also cover some non-compact shared memory models that way. A similar purpose is served by defining layerings, as introduced by Moses and Rajsbaum [33]. Whereas such constructions of closed submodels greatly simplify impossibility proofs, they do not lead to a precise characterization of consensus solvability in non-closed models: We are not aware of any proof that there is an equivalent closed submodel for every non-closed model.
Fig. 1. The objects studied in combinatorial topology vs. point-set topology.
Contributions. Building on our PODC’19 paper [38] devoted to consensus in dynamic networks under message adversaries [2], the present paper provides a complete topological characterization of both the non-uniform and uniform deterministic consensus solvability/impossibility border for general distributed systems with benign process and/or communication faults. To achieve this, we had to add several new topological ideas to the setting of Alpern and Schneider [4], as detailed below, which not only allowed us to deal with both closed and non-closed models, but also provided us with a topological explanation of bivalence [19] and bipotence [33] impossibility proofs. In more detail:
(i)
We introduce a simple generic system model for full-information protocols that covers all distributed system models with benign faults we are aware of. We define new topologies on the execution space of general distributed algorithms in this model, which allow us to reason about sequences of local views of (correct) processes, rather than about global configuration sequences. The p-view topology is defined by a distance function \(d_p(\gamma ,\delta)\) based on the common prefix of p’s local views in the executions \(\gamma\) and \(\delta\). The uniform and non-uniform minimum topologies are induced by distance functions based on the view of the last (correct) process to notice a difference between two executions. In the appendix, we introduce process-time graphs [8] as a succinct alternative to configuration sequences in executions, and show that they are equivalent w.r.t. our topological reasoning. This is accomplished by instantiating our generic system model as an “operational” system model, based on the widely applicable modeling framework introduced by Moses and Rajsbaum [33].
(ii)
We show that consensus can be modeled as a continuous decision function \(\Delta\) in our topologies, which maps an admissible execution to its unique decision value. This allows us to prove that consensus is solvable if and only if all the decision sets, i.e., the pre-images \(\Sigma _v=\Delta ^{-1}(v)\) for every decision value \(v \in \mathcal {V}\), are disconnected from each other. We also provide universal uniform and non-uniform consensus algorithms, which rely on this separation.
(iii)
We provide an alternative characterization of uniform and non-uniform consensus solvability based on the broadcastability of the decision sets and their connected components. It applies for the usual situation where every vector of values from \(\mathcal {V}\) is an allowed assignment of the input values of the processes (which is not the case for condition-based consensus [34], however). Interestingly, our respective results imply that solving consensus with weak validity and consensus with strong validity is equivalent in any model with benign faults. Moreover, we provide a characterization of consensus solvability based on the limits of two infinite sequences of admissible executions, taken from different decision sets. Consensus is impossible already if there is a single pair of such limits with distance 0; these limits actually coincide with the forever bivalent/bipotent executions constructed in previous proofs [19, 33].
(iv)
We apply our topological approach to different distributed computing models. This way, we provide a topological explanation of well-known classic results like bivalence proofs and consensus solvability/impossibility in synchronous systems with general omission faults. Despite the fact that consensus has been thoroughly studied in virtually any conceivable distributed computing model, we also provide some new results: We provide a new necessary and sufficient condition for solving condition-based consensus with strong validity in asynchronous shared-memory systems [34], comprehensively characterize consensus solvability in dynamic networks with both compact and non-compact message adversaries [2], and give a novel consensus algorithm that does not rely on an implementation of the \(\Omega\) failure detector for systems with an eventually timely f-source [3, 26].
Paper organization. Section 2 discusses related work. In Section 3, we define the elements of the space that will be endowed with our new topologies in Section 4. Section 5 introduces the consensus problem in topological terms and provides our abstract characterization result for uniform consensus (Theorem 5.2) and non-uniform consensus (Theorem 5.3), which also provide universal algorithms. Alternative characterizations based on limit exclusion and broadcastability are provided in Sections 6 and 7, respectively. Our topological characterizations are complemented by Section 8, which is devoted to applications. Some conclusions in Section 9 round off our paper. In Appendix A, we introduce process-time graphs and an operationalization of our generic system model for some classic distributed computing models.
2 Related Work
Besides the few point-set topology papers [4, 9, 37] and the closed model constructions [20, 28, 31, 33] already mentioned in Section 1, there is abundant literature on consensus algorithms and impossibility proofs.
Regarding combinatorial topology, it is worth mentioning that our study of the indistinguishability relation of prefixes of executions is closely connected to connectivity properties of the r-round protocol complex. However, in non-limit-closed models, we need to go beyond a uniformly bounded prefix length. This is in sharp contrast to the models usually considered in combinatorial topology [6, 11], which are limit-closed (typically, wait-free asynchronous).
A celebrated paper on the impossibility of consensus in asynchronous systems with crash failures is by Fischer et al. [19], who also introduced the bivalence proof technique. This impossibility can be avoided by means of unreliable failure detectors [12] or condition-based approaches restricting the allowed inputs [34]. Consensus in synchronous systems with Byzantine-faulty processes has been introduced by Lamport et al. [30]. The seminal works by Dolev et al. [15] and Dwork et al. [16] on partially synchronous systems introduced important abstractions like eventual stabilization and eventually bounded message delays, and provided a characterization of consensus solvability under various combinations of synchrony and failure models. Consensus in systems with weak timely links and crash failures was considered in [3, 26]. Algorithms for consensus in systems with general omission process faults were provided by Perry and Toueg [40].
Perhaps one of the earliest characterizations of consensus solvability in synchronous distributed systems prone to communication errors is the seminal work by Santoro and Widmayer [44], where it was shown that consensus is impossible if up to \(n-1\) messages may be lost in each round. This classic result was refined by several authors [13, 45] and, more recently, by Coulouma et al. [14], where a property of an equivalence relation on the sets of communication graphs was found that captures exactly the source of consensus impossibility. Following Afek and Gafni [2], such distributed systems are nowadays known as dynamic networks, where the per-round directed communication graphs are controlled by a message adversary. Whereas the paper by Coulouma et al. [14] and follow-up work [46] studied oblivious message adversaries, where the communication graphs are picked arbitrarily from a set of candidate graphs, more recent papers [10, 49] studied eventually stabilizing message adversaries, which guarantee that some rounds with “good” communication graphs will eventually be generated. Note that oblivious message adversaries are limit-closed, which is not the case for message adversaries like the eventually stabilizing ones. Raynal and Stainer explored the relation between message adversaries and failure detectors [42].
The first characterization of consensus solvability under general message adversaries was provided by Fevat and Godard [17], albeit only for systems that consist of two processes. A bivalence argument was used there to show that certain communication patterns, namely, a “fair” or a special pair of “unfair” communication patterns (see Definition 6.6 for more information), must be excluded by the message adversary for consensus to become solvable.
3 Generic System Model
We consider distributed message passing or shared memory systems made up of a set of n deterministic processes \(\Pi\) with unique identifiers, taken from \([n]=\lbrace 1,\ldots ,n\rbrace\) for simplicity. We denote individual processes by letters p, q, and so on.
For our characterization of consensus solvability, we restrict our attention to full-information executions, in which processes continuously relay all the information they gathered to all other processes, and eventually apply some local decision function. The exchanged information includes the process’s initial value, but also, more importantly, a record of all events (message receptions, shared memory readings, object invocations, ...) witnessed by the process. Our general system model is hence applicable whenever no constraints are placed on the size of the local memory and the size of values to be communicated (e.g., message/shared-register size). In particular, it is applicable to classical synchronous and asynchronous message-passing and shared-memory models, with benign process and communication faults. In Appendix A, we will also provide a topologically equivalent “operationalization” of our generic system model built on top of process-time graphs [8], based on the modeling framework introduced by Moses and Rajsbaum [33].
Formally, a (full-information) execution is a sequence of (full-information) configurations. For every process \(p\in \Pi\), there is an equivalence relation \(\sim _p\) on the set \(\mathcal {C}\) of configurations—the p-indistinguishability relation—indicating whether process p can locally distinguish two configurations, i.e., if it has the same view \(V_p(C)=V_p(D)\) in C and D. In this case we write \(C\sim _p D\). Note that two configurations that are indistinguishable for all processes need not be equal. In fact, configurations usually include some state of the communication media that is not accessible to any process.
In addition to the indistinguishability relations, we assume the existence of a function \(Ob:\mathcal {C}\rightarrow 2^\Pi\) that specifies the set of obedient processes in a given configuration. Obedient processes must follow the algorithm and satisfy the (consensus) specification; usually, \(Ob(C)\) is the set of non-faulty processes. Again, this information is usually not accessible to the processes. We make the restriction that disobedient processes cannot recover and become obedient again, i.e., that \(Ob(C) \supseteq Ob(C^{\prime })\) if \(C^{\prime }\) is reachable from C. We extend the obedience function to the set \(\Sigma \subseteq \mathcal {C}^\omega\) of admissible executions in a given model by setting \(Ob:\Sigma \rightarrow 2^\Pi\), \(Ob(\gamma) = \bigcap _{t\ge 0} Ob(C^t)\) where \(\gamma = (C^t)_{t\ge 0}\). Here, \(t \in \mathbb {N}_0 = \mathbb {N}\cup \lbrace 0\rbrace\) denotes a notion of global time that is not accessible to the processes. Consequently, a process is obedient in an execution if it is obedient in all of its configurations. We further make the restriction that there is at least one obedient process in every execution, i.e., that \(Ob(\gamma)\ne \emptyset\) for all \(\gamma \in \Sigma\). Moreover, we assume that \(Ob(C)=\Pi\) for every initial configuration, in order to make input value assignments (see below) well-defined for all processes.
We also assume that every process has the possibility to weakly count the steps it has taken. Formally, we assume the existence of weak clock functions \(\chi _p:\mathcal {C}\rightarrow \mathbb {N}_0\) such that for every execution \(\delta = (D^t)_{t\ge 0}\in \Sigma\) and every configuration \(C\in \mathcal {C}\), the relation \(C\sim _p D^t\) implies \(t\ge \chi _p(C)\). Additionally, we assume that \(\chi _p(D^t)\rightarrow \infty\) as \(t\rightarrow \infty\) for every execution \(\delta \in \Sigma\) and every obedient process \(p\in Ob(\delta)\). \(\chi _p\) hence ensures that a configuration \(D^t\) where p has some specific view \(V_p(D^t)=V_p(C)\) cannot occur before time \(t=\chi _p(C)\) in any execution \(\delta\). Our weak clock functions hence allow us to model lockstep synchronous rounds by choosing \(\chi _p(D^t)=t\) for any execution \(\delta = (D^t)_{t\ge 0}\in \Sigma\), but are also suitable for modeling non-lockstep, even asynchronous, executions (see Appendix A.2).
For the discussion of decision problems, we need to introduce the notion of input values, which will also be called initial values in the sequel. Since we limit ourselves to the consensus problem, we need not distinguish between the sets of input values and output values. We thus just assume the existence of a finite set \(\mathcal {V}\) of potential input values, and require that the potential output values are also in \(\mathcal {V}\). Furthermore, the initial configuration \(I=I(\gamma)\) of any execution \(\gamma\) is assumed to contain an input value \(I_p \in \mathcal {V}\) for every process \(p\in \Pi\). This information is locally accessible to the processes, i.e., each process can access its own initial value (and those it has heard from). We assume that there is a unique initial configuration for every input-value assignment of the processes.
A decision algorithm is a collection of functions \(\Delta _p:\mathcal {C}\rightarrow \mathcal {V}\cup \lbrace \perp \rbrace\) such that \(\Delta _p(C) = \Delta _p(D)\) if \(C \sim _p D\) and \(\Delta _p(C^{\prime }) = \Delta _p(C)\) if \(C^{\prime }\) is reachable from C and \(\Delta _p(C)\ne \perp\), where \(\perp \not\in \mathcal {V}\) represents the fact that p has not decided yet. That is, decisions depend on local information only and are irrevocable. Every process p thus has at most one decision value in an execution. We can extend the decision function to executions by setting \(\Delta _p:\Sigma \rightarrow \mathcal {V}\cup \lbrace \perp \rbrace\), \(\Delta _p(\gamma) = \lim _{t\rightarrow \infty } \Delta _p(C^t)\) where \(\gamma = (C^t)_{t\ge 0}\). We say that p has decided value \(v\ne \perp\) in configuration C or execution \(\gamma\) if \(\Delta _p(C)=v\) or \(\Delta _p(\gamma)=v\), respectively.
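To make these definitions concrete, the following minimal Python sketch (our own simplification, not part of the formal model) represents a configuration by a map from processes to local views, uses a hypothetical local decision rule that depends only on p’s view, and extends it to an execution as the eventually constant non-\(\perp\) value, checking irrevocability along the way.

```python
# Minimal sketch (our own simplification): a configuration is represented by a
# dict mapping each process to its local view, and the local rule delta_p
# depends only on p's view, so C ~_p D yields the same output. BOT plays the
# role of "undecided" (the value ⊥ in the text).
BOT = None

def delta_p(p, view):
    """Hypothetical local rule (purely for illustration): decide the smallest
    initial value recorded in the view once at least two rounds have elapsed."""
    initial_values, rounds = view
    return min(initial_values) if rounds >= 2 else BOT

def decision_in_execution(p, execution):
    """Delta_p extended to an execution: the eventually constant non-BOT value.
    Any change after a decision (including reverting to BOT) is a violation."""
    decided = BOT
    for config in execution:               # execution = sequence of configurations
        d = delta_p(p, config[p])
        if decided is not BOT and d != decided:
            raise ValueError("irrevocability violated")
        if d is not BOT:
            decided = d
    return decided

# Toy usage: process 1's view is (initial values heard of, rounds elapsed).
prefix = [{1: ((0, 1), 0)}, {1: ((0, 1), 1)}, {1: ((0, 1), 2)}, {1: ((0, 1), 3)}]
print(decision_in_execution(1, prefix))    # 0
```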
We will consider both non-uniform and uniform consensus with either weak or strong validity as our decision tasks, which are defined as follows:
Note that we will primarily focus on consensus with weak validity; unless otherwise noted, this is what we mean by the term consensus.
By Termination, Agreement, and the fact that every execution has at least one obedient process, for every consensus algorithm, we can define the consensus decision function \(\Delta : \Sigma \rightarrow \mathcal {V}\) by setting \(\Delta (\gamma) = \Delta _p(\gamma)\) where p is any process that is obedient in execution \(\gamma\), i.e., \(p\in Ob(\gamma)\). Recall that the initial value of process p in the execution \(\gamma\) is denoted \(I_p(\gamma)\) or just \(I_p\) if \(\gamma\) is clear from the context.
To illustrate the difference between uniform and non-uniform consensus, as well as to motivate the two topologies serving to characterize their solvability, consider the example of two synchronous non-communicating processes. The set of processes is \(\Pi = \lbrace 1,2\rbrace\) and the set of possible values is \(\mathcal {V}= \lbrace 0,1\rbrace\). Processes proceed in lock-step synchronous rounds, but do not communicate. Thus, the only information a process has access to is its own initial value and the current time. The set of executions \(\Sigma\) and the obedience function Ob are defined such that one of the processes eventually becomes disobedient in every execution, but not both processes. In this model, it is trivial to solve non-uniform consensus by immediately deciding on one’s own initial value, but uniform consensus is impossible.
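The following Python sketch encodes this toy model (the representation of an execution by the initial values, the identity of the eventually disobedient process, and the time at which it becomes disobedient is our own): a process’s view consists only of its own initial value and the current time, and the trivial algorithm decides its own initial value immediately, which solves non-uniform consensus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Execution:
    """Toy encoding of the non-communicating two-process model: initial values
    of processes 1 and 2, plus which process becomes disobedient at which time."""
    init: tuple          # (I_1, I_2) with values in {0, 1}
    disobedient: int     # 1 or 2 (exactly one process eventually misbehaves)
    from_time: int       # time at which that process becomes disobedient

def view(execution, p, t):
    # A process only ever sees its own initial value and the current time.
    return (execution.init[p - 1], t)

def obedient(execution, t):
    # Ob(C^t): the disobedient process drops out from time `from_time` on.
    if t >= execution.from_time:
        return {1, 2} - {execution.disobedient}
    return {1, 2}

def trivial_decision(execution, p):
    # The trivial algorithm: decide immediately on one's own initial value.
    # This solves non-uniform consensus, since the other process is
    # guaranteed to become disobedient eventually.
    return execution.init[p - 1]

gamma = Execution(init=(0, 1), disobedient=1, from_time=3)
print(view(gamma, 2, 5), obedient(gamma, 5), trivial_decision(gamma, 2))
# (1, 5) {2} 1
```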
4 Topological Structure of Full-Information Executions
In this section, we will endow the various sets introduced in Section 3 with suitable topologies. We first recall briefly the basic topological notions that are needed for our exposition. For a more thorough introduction, however, the reader is advised to refer to a textbook [36].
A topology on a set X is a family \(\mathcal {T}\) of subsets of X such that \(\emptyset \in \mathcal {T}\), \(X \in \mathcal {T}\), and \(\mathcal {T}\) contains all arbitrary unions as well as all finite intersections of its members. We call X endowed with \(\mathcal {T}\), often written as \((X, \mathcal {T})\), a topological space and the members of \(\mathcal {T}\) open sets. The complement of an open set is called closed and sets that are both open and closed, such as \(\emptyset\) and X itself, are called clopen. A topological space is disconnected if it contains a nontrivial clopen set, which means that it can be partitioned into two nonempty disjoint open sets. It is connected if it is not disconnected.
A function from space X to space Y is continuous if the pre-image of every open set in Y is open in X. Given a space \((X, \mathcal {T})\), \(Y \subseteq X\) is called a subspace of X if Y is equipped with the subspace topology \(\lbrace Y \cap U \mid U \in \mathcal {T}\rbrace\). Given \(A \subseteq X\), the closure of A is the intersection of all closed sets containing A. For a space X, if \(A \subseteq X\), we call x a limit point of A if it belongs to the closure of \(A \setminus \lbrace x \rbrace\). It can be shown that the closure of A is the union of A with all limit points of A. Space X is called compact if every family of open sets that covers X contains a finite sub-family that covers X.
If X is a nonempty set, then we call any function \(d:X\times X\rightarrow \mathbb {R}_+\) a distance function on X. Define \(\mathcal {T}_d \subseteq 2^X\) by setting \(U\in \mathcal {T}_d\) if and only if for all \(x\in U\) there exists some \(\varepsilon \gt 0\) such that \(B_\varepsilon (x) = \lbrace y \in X \mid d(x,y) \lt \varepsilon \rbrace \subseteq U\).
Many topological spaces are defined by metrics, i.e., symmetric, positive definite distance functions for which the triangle inequality \(d(x,y) \le d(x,z)+d(z,y)\) holds for any \(x,y,z \in X\). For a distance function to define a (potentially non-metrizable) topology though, no additional assumptions are necessary:
We will henceforth refer to \(\mathcal {T}_d\) as the topology induced by d.
An execution is a sequence of configurations, i.e., an element of the product space \(\mathcal {C}^\omega\). Since our primary objects of study are executions, we will endow this space with a topology as follows: The product topology, which is a distinguished topology on any product space \(\Pi _{\iota \in I} X_\iota\) of topological spaces, is defined as the coarsest topology such that all projection maps \(\pi _i:\Pi _{\iota \in I}X_\iota \rightarrow X_i\) (where \(\pi _i\) extracts the i-th element of the sequence) are continuous. Recall that a topology \(\mathcal {T}^{\prime }\) is coarser than a topology \(\mathcal {T}\) for the same space if every open set \(U \in \mathcal {T}^{\prime }\) is also open in \(\mathcal {T}\).
It turns out that the product topology on the space \(\mathcal {C}^\omega\) is induced by a distance function, whose form is known in a special case that covers our needs:
4.1 Process-view Distance Function for Executions
In previous work on point-set topology in distributed computing [4, 37], the set of configurations \(\mathcal {C}\) of some fixed algorithm \(\mathcal {A}\) was endowed with the discrete topology, where every subset \(U \subseteq \mathcal {C}\) is open. The discrete topology is induced by the discrete metric \(d_{\max }(C,D)=1\) if \(C\ne D\) and 0 otherwise (for configurations \(C,D\in \mathcal {C}\)). Moreover, \(\mathcal {C}^{\omega }\) was endowed with the corresponding product topology, which is induced by the common-prefix metric
\begin{equation*} d_{\max }(\gamma ,\delta) = \sup _{t\ge 0} \; 2^{-t}\, d_{\max }(C^t,D^t) , \end{equation*}
where \(\gamma = (C^t)_{t\ge 0}\) and \(\delta = (D^t)_{t\ge 0}\), according to Lemma 4.2. Since \(d_{\max }\) on configurations only takes the values 0 and 1, this equals \(2^{-t_0}\), where \(t_0\) is the first index at which the two configuration sequences differ (and 0 if they never differ). Informally, \(d_{\max }(\gamma ,\delta)\) decreases with the length of the common prefix where no process can distinguish \(\gamma\) and \(\delta\).
By contrast, we define the p-view distance function\(d_p\) on the set \(\mathcal {C}\) of configurations for every process \(p\in \Pi\) by
\begin{equation*} d_p(C,D) = {\left\lbrace \begin{array}{ll}0 & \text{if } C \sim _p D \text{ and } p\in Ob(C)\cap Ob(D)\text{, or } C=D\\ 1 & \text{otherwise} \hspace{5.0pt}. \end{array}\right.} \end{equation*}
Extending this distance function from configurations to executions, we define the p-view distance function by
\begin{equation*} d_p(\gamma ,\delta) = \sup _{t\ge 0} \; 2^{-t}\, d_p(C^t,D^t) , \end{equation*}
where \(\gamma = (C^t)_{t\ge 0}\) and \(\delta = (D^t)_{t\ge 0}\).
4.2 Uniform Topology for Executions
The uniform minimum topology (abbreviated uniform topology) on the set \(\Sigma\) of executions is induced by the distance function
\begin{equation*} d_{\mathrm{u}}(\gamma ,\delta) = \min _{p\in \Pi } d_p(\gamma ,\delta) . \end{equation*}
Note that \(d_{\mathrm{u}}\) does not necessarily satisfy the triangle inequality (nor definiteness): There may be executions with \(d_{p}(\beta ,\gamma)=0\) and \(d_{q}(\gamma ,\delta)=0\) but \(d_{r}(\beta ,\delta)\gt 0\) for all \(r\in \Pi\). Hence, the topology on \(\mathcal {C}^{\omega }\) induced by \(d_{\mathrm{u}}\) lacks many of the convenient (separation) properties of metric spaces, but will turn out to be sufficient for the characterization of the solvability of uniform consensus (see Theorem 5.2).
The next lemma shows that the decision function of an algorithm that solves uniform consensus is always continuous with respect to the uniform topology.
For an illustration in our non-communicating two-process example, denote by \(\gamma ^{(T)}\) the execution in which process 1 has initial value 0, process 2 has initial value 1, and process 1 becomes disobedient at time T. Similarly, denote by \(\delta ^{(U)}\) the execution with the same initial values and in which process 2 becomes disobedient at time U. Since there is no means of communication between the two processes, by Validity, each obedient process necessarily has to eventually decide on its own initial value, i.e., \(\Delta (\gamma ^{(T)}) = 1\) and \(\Delta (\delta ^{(U)}) = 0\). The uniform distance between these executions is equal to \(d_{\mathrm{u}}(\gamma ^{(T)}, \delta ^{(U)}) = 2^{-\max \lbrace T,U\rbrace }\). Thus, every \(\varepsilon\)-neighborhood of \(\gamma ^{(T)}\) contains execution \(\delta ^{(U)}\) if U is chosen large enough to ensure \(2^{-U} \lt \varepsilon\). The set of 1-deciding executions is thus not open in the uniform topology. But this means that the algorithm’s decision function \(\Delta\) cannot be continuous. Lemma 4.4 hence implies that there is no uniform consensus algorithm in the non-communicating two-process model (which is also confirmed by the more realistic application example in Section 8.2).
4.3 Non-uniform Topology for Executions
Whereas the p-view distance function given in Section 4.1 is also adequate for non-uniform consensus, this is not the case for the uniform distance function as defined in Section 4.2. The appropriate non-uniform minimum topology (abbreviated non-uniform topology) on the set \(\Sigma\) of executions is induced by the distance function
\begin{equation*} d_{\mathrm{nu}}(\gamma ,\delta) = \min _{p\in Ob(\gamma)\cap Ob(\delta)} d_p(\gamma ,\delta) , \end{equation*}
with the convention that the minimum over an empty set is 1.
Like for \(d_{\mathrm{u}}\), neither definiteness nor the triangle inequality need to be satisfied by \(d_{\mathrm{nu}}\). The resulting non-uniform topology is finer than the uniform topology, however, since the minimum is taken over the smaller set \(Ob(\gamma)\cap Ob(\delta)\subseteq \Pi\), which means that \(d_{\mathrm{u}}(\gamma ,\delta) \le d_{\mathrm{nu}}(\gamma ,\delta)\). In particular, this implies that every decision function that is continuous with respect to the uniform topology is also continuous with respect to the non-uniform topology. Of course, this also follows from Lemma 4.4 and the fact that every uniform consensus algorithm also solves non-uniform consensus.
The following Lemma 4.5 is the analog of Lemma 4.4:
For an illustration in the non-communicating two-process example used in Section 4.2, note that the trivial algorithm in which every process immediately decides on its own initial value satisfies \(\Delta (\gamma ^{(T)}) = 1\) and \(\Delta (\delta ^{(U)}) = 0\). The algorithm does solve non-uniform consensus, since it is guaranteed that one of the processes eventually becomes disobedient. In contrast to the uniform distance function, the non-uniform distance function satisfies \(d_{\mathrm{nu}}(\gamma ^{(T)}, \delta ^{(U)}) = 1\) since \(Ob(\gamma ^{(T)}) \cap Ob(\delta ^{(U)}) = \emptyset\). This means that the minimum distance between any 0-deciding and any 1-deciding execution is at least 1. It is hence possible to separate the two sets of executions by sets that are open in the non-uniform topology, so non-uniform consensus is solvable here, according to the considerations in the following section. Again, this is confirmed by the more realistic application example in Section 8.2.
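The following self-contained Python sketch (again using our own encoding of the toy model, evaluated on finite prefixes) computes the distance functions of Sections 4.1–4.3 and reproduces both values used above, \(d_{\mathrm{u}}(\gamma ^{(T)}, \delta ^{(U)}) = 2^{-\max \lbrace T,U\rbrace }\) and \(d_{\mathrm{nu}}(\gamma ^{(T)}, \delta ^{(U)}) = 1\).

```python
# Self-contained sketch of d_p, d_u, d_nu for the non-communicating two-process
# model; an execution is encoded as (initial values, disobedient process, time).
def view(ex, p, t):
    init, _, _ = ex
    return (init[p - 1], t)            # own initial value and current time only

def obedient_at(ex, p, t):
    _, disobedient, from_time = ex
    return not (p == disobedient and t >= from_time)

def obedient(ex, p, horizon):
    return all(obedient_at(ex, p, t) for t in range(horizon))

def d_p(p, gamma, delta, horizon):
    """2^{-t0}, where t0 is the first time p distinguishes the executions or is
    disobedient in one of them (0 if this never happens up to the horizon)."""
    for t in range(horizon):
        same_view = view(gamma, p, t) == view(delta, p, t)
        both_obedient = obedient_at(gamma, p, t) and obedient_at(delta, p, t)
        if not (same_view and both_obedient):
            return 2.0 ** (-t)
    return 0.0

def d_u(gamma, delta, horizon):
    return min(d_p(p, gamma, delta, horizon) for p in (1, 2))

def d_nu(gamma, delta, horizon):
    common = [p for p in (1, 2)
              if obedient(gamma, p, horizon) and obedient(delta, p, horizon)]
    return min((d_p(p, gamma, delta, horizon) for p in common), default=1.0)

T, U, H = 3, 5, 100
gamma_T = ((0, 1), 1, T)   # process 1 disobedient from time T
delta_U = ((0, 1), 2, U)   # process 2 disobedient from time U
print(d_u(gamma_T, delta_U, H))    # 2^{-max(T,U)} = 2^{-5}
print(d_nu(gamma_T, delta_U, H))   # 1.0 (no common obedient process)
```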
5 General Consensus Characterization for Full-Information Executions
In this section, we will provide our main topological conditions for uniform and non-uniform consensus solvability.
If \(\Sigma\) has only finitely many connected components, i.e., only finitely many maximal connected sets, then every connected component is necessarily clopen. Consequently, these characterizations give rise to the following meta-procedure for determining whether consensus is solvable and constructing an algorithm if it is. It requires knowledge of the connected components of the space \(\Sigma\) of admissible executions with respect to the appropriate topology (an illustrative sketch in code follows the list):
(1)
Initially, start with an empty set \(\Sigma _v\) for every value \(v\in \mathcal {V}\).
(2)
Add to \(\Sigma _v\) the connected components of \(\Sigma\) that contain an execution with a v-valent initial configuration.
(3)
Add any remaining connected component of \(\Sigma\) to an arbitrarily chosen set \(\Sigma _v\).
(4)
If the sets \(\Sigma _v\) are pairwise disjoint, then consensus is solvable. In this case, the sets \(\Sigma _v\) determine a consensus algorithm via the universal algorithm given in the proofs of Theorems 5.2 and 5.3. If the \(\Sigma _v\) are not pairwise disjoint, then consensus is not solvable.
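For illustration, here is a hedged Python sketch of this meta-procedure on a finite abstraction: each connected component is represented merely by the set of values v for which it contains an execution with a v-valent initial configuration (this data structure is ours; in reality, the components are sets of infinite executions, obtained, e.g., via the labeling algorithms of Section 8).

```python
def assign_decision_sets(components, values):
    """Meta-procedure sketch: `components` maps a component id to the set of
    values v such that the component contains an execution with a v-valent
    initial configuration; `values` is the list of all values in V. Returns the
    decision sets Sigma_v as sets of component ids, or None if consensus is
    unsolvable (some component would end up in two different decision sets)."""
    sigma = {v: set() for v in values}           # step (1): empty decision sets
    for comp, valent_values in components.items():
        for v in valent_values:                  # step (2): add v-valent comps
            sigma[v].add(comp)
    leftover = [c for c, vv in components.items() if not vv]
    for comp in leftover:                        # step (3): arbitrary choice
        sigma[values[0]].add(comp)
    for v in values:                             # step (4): pairwise disjoint?
        for w in values:
            if v != w and sigma[v] & sigma[w]:
                return None                      # not solvable
    return sigma

# Toy usage: component 'A' contains 0- and 1-valent executions -> unsolvable.
print(assign_decision_sets({'A': {0, 1}}, [0, 1]))                    # None
print(assign_decision_sets({'A': {0}, 'B': {1}, 'C': set()}, [0, 1]))
```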
Obviously, our solution algorithms need to know the decision sets \(\Sigma _v\), \(v\in \mathcal {V}\). As they usually contain uncountably many infinite executions, the question arises of how to obtain them in practice. In Section 8, we will provide several instances of labeling algorithms, which can be used here. They are based on labeling prefixes of executions, so they can in principle even be computed incrementally by the processes on the fly during the executions.
6 Limit-based Consensus Characterization
It is possible to shed some additional light on our general consensus characterization by considering limit points. In particular, Theorem 6.4 will show that consensus is impossible if and only if certain limit points in the appropriate topologies are admissible.
Before we state our general results, we illustrate the underlying idea in a slightly restricted setting, namely when the underlying space \(\Sigma\) of configuration sequences is contained in a compact set \(K\subseteq \mathcal {C}^{\omega }\). Whereas one cannot assume this in general, it can be safely assumed in settings where the operationalization of our system model based on process-time graphs, as described in Appendix A, applies: Since the set of all process-time graphs \(\mathcal {PT}^{\omega }\) turns out to be compact and the transition function \(\hat{\tau }: \mathcal {PT}^{\omega }\rightarrow \mathcal {C}^{\omega }\) is continuous, according to Lemma A.2, we can consider the compact set \(K=\hat{\tau }(\mathcal {PT}^{\omega })\) instead of \(\mathcal {C}^{\omega }\). In this case, it is not difficult to show that \(d_p(A,B)=0\) if and only if there is a sequence of executions \(\alpha _k\) in A and a sequence of executions \(\beta _k\) in B such that both sequences converge to the same limit with respect to \(d_p\).
This distance-based characterization allows us to distinguish three cases that cause \(d_p(A,B)=0\): (i) If \(\hat{\alpha } \in A\cap B \ne \emptyset\), one can choose the sequences defined by \(\alpha _k=\beta _k=\hat{\alpha }=\hat{\beta }\), \(k\ge 1\). (ii) If \(A\cap B = \emptyset\) and \(\hat{\alpha }=\hat{\beta }\), there is a “fair” execution [17] as the common limit. (iii) If \(A\cap B = \emptyset\) and \(\hat{\alpha }\ne \hat{\beta }\), there is a pair of “unfair” executions [17] acting as limits, which have distance 0 (and are hence also common limits w.r.t. the distance function \(d_p\)). We note, however, that due to the non-definiteness of the pseudometric \(d_p\) (recall Lemma 4.3) and the resulting non-uniqueness of limits in the p-view topology, case (iii) actually comprises two instances of case (ii). Corollary 6.7 below will reveal that consensus is solvable if and only if no decision set \(\Sigma _v\) contains any fair or unfair execution w.r.t. any \(\Sigma _w\), \(v\ne w\).
Unfortunately, generalizing the above distance-based characterization from p-view topologies to the uniform and non-uniform topologies is not possible: Although every infinite sequence \((\alpha ^t)\) that converges w.r.t. \(d_{\mathrm{u}}\) (Section 4.2) resp. \(d_{\mathrm{nu}}\) (Section 4.3) also contains a convergent subsequence w.r.t. some (obedient) \(d_p\) by the pigeonhole principle, one might obtain a different \(d_{p^{\prime }}\) for the convergent subsequence of \((\beta ^t)\). In this case, not even \(d_p(\hat{\alpha },\hat{\beta })=0\) or \(d_{p^{\prime }}(\hat{\alpha },\hat{\beta })=0\) would guarantee \(d_{\mathrm{u}}(A,B)=0\) resp. \(d_{\mathrm{nu}}(A,B)=0\), as the triangle inequality does not hold in these topologies.
On the other hand, \(d_{\mathrm{u}}(A,B)=0\) resp. \(d_{\mathrm{nu}}(A,B)=0\) is trivially guaranteed if it is the case that \(\hat{\alpha } \in B\) or \(\hat{\beta } \in A\): If, say, \(\hat{\alpha } \in B\), one can choose the constant sequence \((\beta ^t)=(\hat{\alpha })\in B^\omega\), which obviously converges to \(\hat{\alpha }\) in any p-view topology, including the particular \(d_p\) obtained for the convergent subsequence of \((\alpha ^t)\rightarrow \hat{\alpha }\) by the abovementioned pigeonhole argument. Consequently, \(d_p(A,B)=0\) and hence also \(d_{\mathrm{u}}(A,B)=0\) resp. \(d_{\mathrm{nu}}(A,B)=0\). This implies the following “if-part” of our distance-based characterization, which even holds for non-compact \(\mathcal {C}^{\omega }\):
In order to obtain the general limit-based consensus characterization stated in Theorem 6.4 below, we will not use set distances directly, however, but rather the following Separation Lemma 6.3 from [36]:
Applying Lemma 6.3 to the findings of Theorem 5.2 resp. Theorem 5.3, the following general consensus characterization can be proved:
Note that Theorem 6.4 immediately implies the following properties of the distances of the decision sets in the case consensus is solvable in a model:
Our characterization Theorem 6.4 can also be expressed via the exclusion of fair/unfair executions [17]:
An illustration of our limit-based characterizations is provided by Figure 4. Note carefully that, in the uniform case, a fair/unfair execution \(\rho\) where some process p becomes disobedient in round t implies that the same happens in all \(\alpha \in B_{2^{-t}}(\rho) \cap \Sigma _v\) and \(\beta \in B_{2^{-t}}(\rho) \cap \Sigma _w\). On the other hand, if p does not become disobedient in \(\rho\), it may still be the case that p becomes disobedient in every \(\alpha _k\) in the sequence converging to \(\rho\), at some time \(t_k\) with \(\lim _{k\rightarrow \infty } t_k=\infty\). In the non-uniform case, neither of these possibilities exists: p cannot be disobedient in the limit \(\rho\), and any \(\alpha _k\) where p is not obedient is also excluded as its distance to any other sequence is 1.
7 Consensus Characterization in Terms of Broadcastability
We will now develop another characterization of consensus solvability, which rests on the broadcastability of the decision sets \(\Sigma _v \subseteq \Sigma\) and their connected components \(\Sigma _{\gamma } \subseteq \Sigma _v\). It will explain topologically why the existence of a broadcaster is mandatory for solving the “standard version” of consensus, where any assignment of inputs from \(\mathcal {V}\) is permitted. We start with some definitions needed for formalizing this condition:
The independent arbitrary input assignment condition stated in Definition 7.2 ensures that, for every execution \(\gamma\) with initial value assignment \(I(\gamma)\), there is an isomorphic execution \(\delta\) w.r.t. the HO sets of all processes that starts from an arbitrary other initial value assignment \(I(\delta)\).
In the main results of this section (Theorem 7.12 resp. Theorem 7.13), we will not only provide a necessary and sufficient condition for solving this variant of uniform resp. non-uniform consensus based on broadcastability, but also establish the general equivalence of weak validity (V) and strong validity (SV) (recall Definition 3.1). For binary consensus, i.e., \(|\mathcal {V}|=2\), this is a well-known fact [7, Example 5.1]; for larger input sets, it was, to the best of our knowledge, not known before.
Since the concise but quite technical proofs of Theorems 7.12 and 7.13 somewhat obfuscate the actual cause of this equivalence (and the way we actually discovered it), we first provide an alternative explanation based on the broadcastability of connected components in the following Section 7.1, which also allows us to establish some basic results needed in Section 8.4.
7.1 Broadcastability of Connected Components
Lemma 7.4 below reveals that if consensus (with weak validity) is solvable under independent arbitrary input assignments, then every connected component of \(\Sigma\) needs to be broadcastable.
In addition, Lemma 7.6 below reveals that any connected broadcastable set has a diameter strictly smaller than 1.
To emphasize the key role of the consequences of Corollary 7.7 for the equivalence of weak validity (V) and strong validity (SV), where in (SV) the consensus decision value must be the initial value of some process, we first observe that the transition from (V) to (SV) in our Theorem 5.2 resp. Theorem 5.3 just requires the replacement of condition 2., i.e., “If execution \(\gamma \in \Sigma\) is v-valent, then \(\gamma \in \Sigma _v\)”, by “If execution \(\gamma \in \Sigma _v\), then there is a process p with initial value \(I_p(\gamma)=v\)”. This change would result in strong versions of our theorems, since the above modification is in fact transparent for the proofs of Theorems 5.2 and 5.3. Note also that both versions are equivalent for v-valent executions. Similarly, to obtain a strong version of our meta-procedure, step (3) “Add any remaining connected component of \(\Sigma\) to an arbitrarily chosen set \(\Sigma _v\)” must be replaced by “Add every remaining connected component \(\Sigma _\gamma \subseteq \Sigma\), where execution \(\gamma \in \Sigma _\gamma\) is arbitrary, to any set \(\Sigma _v\), where v is the initial value \(I_b(\gamma)=v\) of a process b that is a broadcaster in every execution \(\gamma ^{\prime }\in \Sigma _\gamma\)”.
The crucial role of Corollary 7.7 is that it makes this modification always possible also in the case of multi-valued consensus (in the case of binary consensus, it is obvious), as it reveals that if weak consensus is solvable, then every connected component \(\Sigma _\gamma\) must have at least one common broadcaster \(b=b(\gamma ^{\prime })=b(\Sigma _\gamma)\) that has the same initial value \(I_b(\gamma ^{\prime })=I_b(\gamma)= I_b(\Sigma _\gamma)\) in all executions \(\gamma ^{\prime }\in \Sigma _\gamma\). Consequently, if decision sets resp. a meta-procedure exists that allows to solve consensus with weak validity according to Theorems 5.2 and 5.3, one can always reshuffle the connected components to form strong decision sets, which use the initial value of some broadcaster for assigning a connected component to a decision set:
Note that strong decision sets need not be unique, as some connected component \(\Sigma _\gamma\) might have several broadcasters, any of which could be used for determining its decision value v. The canonical choice to make it uniquely defined is to take as \(b(\Sigma _\gamma)\) the lexically smallest among all broadcasters in \(\Sigma _\gamma\). In the rest of our paper, all strong decision sets will be canonical.
Since the canonical strong decision sets that can be formed via the abovementioned reshuffling are easily shown to satisfy the strong versions of Theorems 5.2 and 5.3, one obtains the broadcast-based characterization of consensus stated in Theorem 7.9. Rather than proving it by formalizing the reasoning sketched above, however, we will rely on the general equivalence results Theorem 7.12 resp. Theorem 7.13 developed in the following Section 7.2. This way, the somewhat tedious and non-constructive reshuffling of connected components involved in the direct proof can be replaced by an explicit construction of the canonical strong decision sets, which utilizes binary consensus.
7.2 General Broadcast-based Characterization
We will now provide our general broadcast-based characterization for uniform and non-uniform consensus with arbitrary and independent input assignments according to Definition 7.2. In a nutshell, it uses a reduction to (the solvability of) binary consensus, where weak and strong validity are trivially equivalent, for explicitly constructing the canonical strong broadcaster decision sets.
Let \(\hat{\Sigma } \subseteq \Sigma\) denote the set of admissible executions of a multi-valued consensus algorithm starting from a single initial value assignment \(\hat{I}:\Pi \rightarrow \mathcal {V}\) (any will do, the choice is arbitrary).
With this result, we can prove the following equivalences Theorem 7.12 resp. Theorem 7.13 for uniform and non-uniform consensus:
We conclude this section by pointing out that the practical utility of the equivalence of consensus with weak and strong validity established in Theorems 7.12 and 7.13 is somewhat limited: Since the solution algorithms depend on the a priori knowledge of the decision sets, they do not give a clue on how to develop a strong consensus algorithm from a weak consensus algorithm in a given model. In fact, determining and agreeing upon a broadcaster in executions that are not v-valent is a very hard problem.
8 Applications
In this section, we will apply our topological characterizations of consensus solvability to several different examples. Apart from providing a topological explanation of bivalence proofs (Section 8.1) and folklore results for synchronous consensus under general omission faults (Section 8.2), we will provide a novel characterization of condition-based asynchronous consensus [34] with strong validity (Section 8.3), a complete characterization of consensus solvability for dynamic networks with both closed (Section 8.4) and non-closed (Section 8.5) message adversaries, and a consensus algorithm for asynchronous systems with weak timely links that does not rely on an implementation of the \(\Omega\) failure detector (Section 8.6).
8.1 Bivalence-based Impossibilities
Our topological results shed some new light on the now standard technique of bivalence-based impossibility proofs introduced in the celebrated FLP paper [19], which have been generalized [33] and used in many different contexts: Our results reveal that the forever bivalent executions constructed inductively in bivalence proofs [10, 44, 45, 49] are just the common limit of two infinite sequences of executions \(\alpha _0,\alpha _1,\ldots\) in the 0-decision set \(\Sigma _0\) and \(\beta _0,\beta _1,\ldots\) in the 1-decision set \(\Sigma _1\).
More specifically, what is common to these proofs is that one shows that, for any consensus algorithm, there is an admissible forever bivalent execution \(\gamma\). This is usually done inductively, by showing that there is a bivalent initial configuration and that, given a bivalent configuration \(C^{t-1}\) at the end of round \(t-1\), there is a 1-round extension leading to a bivalent configuration \(C^t\) at the end of round t. By definition, bivalence of \(C^t\) means that there are two admissible executions \(\alpha _t\) with decision value 0 and \(\beta _t\) with decision value 1 starting out from \(C^t\), i.e., having a common prefix that leads to \(C^t\). Consequently, their distance satisfies \(d_{\mathrm{nu}}(\alpha _t,\gamma) \lt 2^{-t}\) and \(d_{\mathrm{nu}}(\beta _t,\gamma) \lt 2^{-t}\). But then closedness of \(\Sigma _0\) and \(\Sigma _1\) implies that \(\gamma \in \Sigma _0\cap \Sigma _1\), a contradiction to their disjointness.
By construction, the \((t-1)\)-prefixes of \(\alpha _t\) and \(\alpha _{t-1}\) are the same for all t, which implies that they converge to a limit \(\hat{\alpha }\) (and analogously for \(\hat{\beta }\)), see Figure 4 for an illustration. Therefore, these executions match Definition 6.6, and Corollary 6.7 implies that the stipulated consensus algorithm cannot be correct. A specific example is the lossy-link impossibility [44], i.e., the impossibility of consensus under an oblivious message adversary for \(n=2\) that may choose any graph out of the set \(\lbrace \leftarrow ,\leftrightarrow ,\rightarrow \rbrace\), and the impossibility of solving consensus with vertex-stable source components with insufficient stability interval [10, 49]. In the case of the oblivious lossy-link message adversary using the reduced set \(\lbrace \leftarrow ,\rightarrow \rbrace\) considered by Coulouma et al. [14], consensus is solvable and there is no forever bivalent execution. Indeed, there exists a consensus algorithm where all configurations reached after the first round are already univalent, see Section 8.4.
8.2 Consensus in Synchronous Systems with General Omission Process Faults
As a more elaborate example of a setting where the solvability of non-uniform and uniform consensus may differ (and which also covers the simple running example used in Section 4), we consider synchronous systems with up to f general omission process faults [40]. For \(n\ge f+1\), non-uniform consensus can be solved in \(f+1\) rounds, whereas solving uniform consensus requires \(n \ge 2f+1\).
The impossibility proof of uniform consensus for \(n \le 2f\) uses a standard partitioning argument, splitting \(\Pi\) into a set P of processes with \(\vert P\vert =f\) and Q with \(\vert Q\vert =n-f\le f\). One considers an admissible execution \(\alpha _0\) where all processes \(p\in \Pi\) start with \(I_p=0\), the ones in P are correct, and the ones in Q are initially mute; the decision value of the processes in P must be 0 by validity. Similarly, \(\alpha _1\) starts from \(I_p=1\), all processes in Q are correct and the ones in P are initially mute; the decision value is hence 1. For another execution \(\alpha\), where the processes in Q are correct and the ones in P are general omission faulty, in the sense that every \(p\in P\) neither sends messages to nor receives messages from Q, one observes \(\alpha \sim _{p}\alpha _0\), i.e., \(d_p(\alpha , \alpha _0) \lt 2^{-t}\) for all \(t\ge 0\) and all \(p\in P\). Similarly, \(\alpha \sim _{q}\alpha _1\) for every \(q\in Q\). Hence, p and q decide on different values in \(\alpha\).
Topologically, this is equivalent to \(d_{\mathrm{u}}(\alpha ,\alpha _0)=0\) as well as \(d_{\mathrm{u}}(\alpha ,\alpha _1)=0\), which implies \(\alpha \in \Sigma _0\) as well as \(\alpha \in \Sigma _1\). Consequently, \(\Sigma _0\) and \(\Sigma _1\) cannot be disjoint, as needed for uniform consensus solvability. Clearly, for \(n\ge 2f+1\), this argument is no longer applicable. And indeed, algorithms like the one proposed by Parvedy and Raynal [39] can be used for solving uniform consensus.
If one revisits the topological equivalent of the above partitioning argument for \(n \le 2f\) in the non-uniform case, it turns out that still \(d_{\mathrm{nu}}(\alpha ,\alpha _0)=0\), but \(d_{\mathrm{nu}}(\alpha ,\alpha _1)=1\) as all processes in Q are faulty. Consequently, \(\alpha \not\in \Sigma _1\). So \(\Sigma _0\) and \(\Sigma _1\) could partition the space of admissible executions. And indeed, non-uniform consensus can be solved in \(f+1\) rounds here. In order to demonstrate this by means of our Theorem 5.3, we will sketch how the required decision sets \(\Sigma _v\) can be constructed. We will do so by means of a simple labeling algorithm, which assigns a decision value \(v \in \mathcal {V}\) to every admissible execution \(\gamma\). Note that synchronous systems are particularly easy to model in our setting, since we can use the number of rounds as our global time t.
Clearly, every process that omits to send its state in some round to a (still) correct process is revealed to every other (still) correct process at the next round at the latest. This implies that every correct process p seen by some correct process q by the end of the \((f+1)\)-round prefix \(\gamma |_{f+1}\) in the admissible execution \(\gamma\) has also been seen by every other correct process during \(\gamma |_{f+1}\), since one would need a chain of \(f+1\) different faulty processes for propagating p’s state to q otherwise. Thus, p must have managed to broadcast its initial value \(I_p(\gamma)\) to all correct processes during \(\gamma |_{f+1}\).
Consequently, if \(\gamma |_{f+1} \sim \rho |_{f+1}\), where \(\sim\) denotes the transitive closure (over all processes \(p\in \Pi\)) of the indistinguishability relation \(\sim _p\) for prefixes, they must have the same set of broadcasters. Our labeling algorithm hence just assigns to \(\gamma\) the initial value \(I_p\) of the, say, lexically smallest broadcaster p in \(\gamma |_{f+1}\). The resulting decision sets are trivially open since, for every \(\gamma \in \Sigma _v\), we have \(B_{2^{-(f+1)}}(\gamma) \subseteq \Sigma _v\) as well. The generic non-uniform consensus algorithm from Theorem 5.3 resp. Theorem 7.12 can hence be used for solving weak resp. strong consensus.
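The following Python sketch illustrates this labeling algorithm on an \((f+1)\)-round prefix, using a simple reception-based encoding of who hears from whom in each round (the encoding is our own simplification): it propagates full information along the prefix, determines which processes have broadcast their initial value to all correct processes, and labels the prefix with the initial value of the lexically smallest such broadcaster.

```python
def label_execution(init, correct, rounds):
    """Labeling sketch for the synchronous general-omission model.
    `init[p]` is p's initial value, `correct` is the set of correct processes,
    and `rounds` is a list (one entry per round of the (f+1)-round prefix) of
    dicts mapping each receiver q to the set of senders it hears from."""
    procs = set(init)
    heard_of = {q: {q} for q in procs}           # full-information propagation
    for receptions in rounds:
        new = {q: set(heard_of[q]) for q in procs}
        for q in procs:
            for sender in receptions.get(q, set()):
                new[q] |= heard_of[sender]       # q learns everything sender knew
        heard_of = new
    # Broadcasters: processes whose initial value reached every correct process
    # (in admissible executions of this model, at least one always exists).
    broadcasters = [p for p in sorted(procs)
                    if all(p in heard_of[q] for q in correct)]
    return init[broadcasters[0]]                 # lexically smallest broadcaster

# Toy usage with n = 3, f = 1, hence a 2-round prefix: process 3 omits sending.
init = {1: 'a', 2: 'b', 3: 'c'}
rounds = [{1: {2}, 2: {1}, 3: {1, 2}},           # round 1: nobody hears from 3
          {1: {2}, 2: {1}, 3: {1, 2}}]           # round 2: same
print(label_execution(init, correct={1, 2}, rounds=rounds))   # 'a'
```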
8.3 Asynchronous Condition-based Consensus
As an example of asynchronous consensus in shared-memory systems, we consider the condition-based approach presented by Mostefaoui et al. [34]. In order to circumvent the FLP impossibility [19] of consensus in the presence of process crashes, the authors considered restrictions of the vectors of allowed initial values \(I(\gamma)\) for the admissible executions \(\gamma \in \Sigma\) of the n processes in the system. To ensure compatibility with the notation used in the original paper [34], we will write \(I[1],\ldots ,I[n]\) instead of \(I_1,\ldots ,I_n\) for the initial value assignment of a given admissible execution in this section. For a set \(C \subseteq \mathcal {V}^n\) of allowed input vectors (called a condition) that is a priori known to all processes, the authors asked for properties C must satisfy such that uniform consensus can be solved in the presence of up to f crashes. Note carefully that this is an instance of consensus where the arbitrary input assumption does not apply, albeit the independent input assumption (recall Definition 7.2) is needed.
Two such properties were identified in [34]: (i) the more practical f-acceptability property, which consists of “elements” that can be directly utilized in a generic solution algorithm, and (ii) the more abstract f-legality condition. Moreover, two different variants of consensus were considered: (a) non-safe consensus, which only needs to terminate when the initial values are indeed from C, and (b) safe consensus, where the processes must also terminate for arbitrary inputs in well-behaved (in particular, fault-free) executions. Interestingly, it turned out that (i) and (ii), as well as (a) and (b), are equivalent, and that either variant of consensus can be solved in the presence of up to f crashes if and only if C is f-legal or/and f-acceptable [34, Theorem 5.7].
The generic nonsafe solution algorithm for an f-acceptable condition C is extremely simple: It only uses one round, where process \(p_i\) first writes its initial value \(I[i]\) into its entry \(V[i]\) of a snapshot object V that is initialized to \(V[*]=\bot\), and then performs snapshot reads that provide its current local view \(V_i\) until it finds at least \(n-f\) non-\(\bot\) entries in \(V_i\). The latter condition terminates the round, at the end of which \(p_i\) uses the “elements” making up f-acceptability for computing the decision value from its final view \(V_i\). Note that a \(\bot\) entry in \(V_i[j]\) can be due to a crash of \(p_j\) or just a consequence of the fact that \(p_j\) has just been slow compared to the at least \(n-f\) other processes that managed to provide non-\(\bot\) entries. To make this algorithm compatible with our setting, where all executions are infinite, we just add infinitely many empty rounds (where no process changes its state or reads/writes V). Moreover, we consider all processes to be obedient and just make at most f of them very slow when needed, which allows us to directly use our uniform topology.
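The following Python sketch gives a sequentialized rendering of this one-round algorithm; the snapshot object, the asynchronous scheduling, and the f-acceptability “elements” are all abstracted away, so the parameter decide_from_view and the placeholder rule used below are our own illustrations rather than the interface of [34].

```python
BOT = None   # stands for the initial ⊥ entry of the snapshot object

def condition_based_round(inputs, n, f, arrival_order, decide_from_view):
    """Sequentialized sketch of the generic one-round algorithm: each process
    writes its input, repeatedly snapshots until it sees at least n - f
    non-BOT entries, and decides by applying `decide_from_view` to that view.
    `arrival_order` fixes when each write becomes visible; every process is
    simulated as eventually deciding (a crashed process would simply stop)."""
    pos = {p: k for k, p in enumerate(arrival_order)}
    decisions = {}
    for i in range(n):
        # p_i's terminating snapshot happens once the first
        # max(pos[i] + 1, n - f) writes (in arrival order) are visible.
        visible = set(arrival_order[:max(pos[i] + 1, n - f)])
        view = [inputs[p] if p in visible else BOT for p in range(n)]
        decisions[i] = decide_from_view(view)
    return decisions

# Placeholder decision rule: the largest visible value. It yields agreement
# whenever the largest input appears in more than f entries, since it then
# survives in every view with at most f entries masked out.
decide_max = lambda view: max(v for v in view if v is not BOT)

print(condition_based_round([2, 2, 0], n=3, f=1,
                            arrival_order=[1, 2, 0],
                            decide_from_view=decide_max))
# {0: 2, 1: 2, 2: 2}
```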
The definition of f-legality is based on an undirected graph \(H(C,f)\), whose vertices are the vectors in C and where there is an edge \((I1,I2)\) if and only if the Hamming distance between \(I1 \in C\) and \(I2 \in C\) is at most f. The graph \(H(C,f)\) can be expanded into a graph \(Gin(C,f)\) of all the views \(V_i\) possibly obtained by any process \(p_i\) in the above algorithm: For every \(I\in C\), \(Gin(C,f)\) contains all the vertices that are obtained by replacing up to f entries of I by \(\bot\). Two vertices \(J1,J2 \in Gin(C,f)\) are connected by an undirected edge if \(J1[i]\ne \bot \Rightarrow J1[i]=J2[i]\) for every \(1\le i \le n\), or vice versa. It is not difficult to see that \(I1, I2 \in H(C,f)\) are connected by an edge if and only if the same vertices \(I1, I2 \in Gin(C,f)\) are connected by a path.
A condition C is f-legal if, for each connected component \(G_1,\ldots ,G_x\) of \(Gin(C,f)\), all the vertices in the component have at least one input value v in common [34, Definition 5.2]. This property translates to the corresponding connected components \(H_1,\ldots ,H_x\) of \(H(C,f)\) as follows: all vertices in a component must have at least one input value v in common, and v must appear in at least \(f+1\) entries of every vertex. In fact, without the latter requirement, v would disappear from the view J in \(Gin(C,f)\) in which the at most f entries holding v in \(I \in H(C,f)\) are replaced by \(\bot\).
The setting for condition-based consensus in [34] differs from the one underlying our topological results in the previous sections in two aspects: (1) It uses a validity condition that is stronger than our strong validity (SV), as it does not allow processes to decide on the initial value of an initially dead process. (2) It does not allow arbitrary input assignments, which is a pivotal assumption in all our broadcasting-based characterizations in Section 7. And indeed, as it will turn out, we do not usually have a common broadcaster p in the connected components of a decision set \(\Sigma _v\) here.
In Theorem 8.1 below, we will characterize the solvability of condition-based consensus with strong validity (SV) using our topological approach. To model (SV), the original f-legality condition must be weakened to f-quasilegality: Rather than assuming that all input assignments I in a connected component \(G_i\) in \(Gin(C,f)\), i.e., the vertices also lying in the corresponding connected component \(H_i\) in \(H(C,f)\), must have a value v in common that appears in at least \(f+1\) entries in I, f-quasilegality only requires a common value v.
For our proof, we exploit the very simple structure of the set of admissible executions \(\Sigma\) of the generic condition-based consensus algorithm, and the close relation between \(\Sigma\) and \(Gin(C,f)\). In fact, \(Gin(C,f)\) is a graph on all the possible views of the processes (at the end of the first round) in any execution. More specifically, for the admissible execution \(\alpha =\alpha (I) \in \Sigma\) starting from the initial value assignment I, the configuration \(\alpha ^1=(J_1,\ldots , J_n)\) after round 1 satisfies \(J_j \in G_i \subseteq Gin(C,f)\) for every \(p_j \in \Pi\). Herein, \(G_i\) is the connected component in \(Gin(C,f)\) that contains I. This holds since every \(J_j\) is obtained from I by replacing at most f entries with \(\bot\), which makes \(J_j\) adjacent to I in \(Gin(C,f)\). Note carefully that every process can hence unambiguously identify the connected component \(G_i\) the current execution belongs to, as it only needs to check in which connected component its local view lies. Recall that it is assumed that every process knows C, and hence \(H(C,f)\) and \(Gin(C,f)\), a priori.
8.4 Dynamic Networks with Limit-closed Message Adversaries
In this section, we will consider consensus with independent and arbitrary input assignments in dynamic networks under message adversaries [2] that are limit-closed [47], in the sense that every convergent sequence of executions \(\alpha _0,\alpha _1,\ldots\) with \(\alpha _k \in \Sigma\) for every k has a limit \(\alpha \in \Sigma\). An illustration is shown in Figure 3, where the purple dots represent a sequence of executions \(\alpha _i\) taken from the connected component \(\Sigma _{\gamma _0}\), and \(\times\) marks the limit point \(\alpha\) at the boundary. The most prominent examples of limit-closed message adversaries are oblivious ones [14, 44, 46].
We recall that dynamic networks consist of a set of n lock-step synchronous fault-free processes, which execute a deterministic consensus algorithm where every process broadcasts its entire local state via message-passing in each of the communication-closed rounds \(1,2,\ldots\). A message adversary determines which process q receives the message broadcast by a process p in some round t, via the directed round-t communication graph \(\mathcal {G}^t\). Together with the initial configuration \(C^0\) of all the processes, the particular sequence of communication graphs \(\mathcal {G}^1,\mathcal {G}^2, \ldots\), which is called a communication pattern, uniquely determines an execution. For example, an oblivious message adversary is defined by a set \(\mathbf {D}\) of allowed communication graphs and picks every \(\mathcal {G}^t\) arbitrarily from this set.
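The following Python sketch illustrates how a communication pattern chosen by an oblivious message adversary drives such a full-information execution; the random choice from D and all names are illustrative only.

```python
import random

def run_execution(inputs, D, rounds, rng=random.Random(0)):
    """Each round, the oblivious adversary picks some graph from D (a list of
    directed edge sets); every process merges the states of its in-neighbours
    into its own full-information state."""
    n = len(inputs)
    states = [frozenset({(p, inputs[p])}) for p in range(n)]
    pattern = []
    for _ in range(rounds):
        G = rng.choice(D)                  # arbitrary (here: random) choice
        pattern.append(G)
        new_states = []
        for q in range(n):
            heard = set(states[q])         # a process always keeps its own state
            for p in range(n):
                if (p, q) in G:            # q receives p's round broadcast
                    heard |= states[p]
            new_states.append(frozenset(heard))
        states = new_states
    return pattern, states

# n = 2 with D containing only the two "one-directional" graphs:
D = [frozenset({(0, 1)}), frozenset({(1, 0)})]
pattern, states = run_execution([7, 9], D, rounds=3)
print(pattern)
print(states)
```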
Since all processes are obedient here, we will only consider the uniform topology in the sequel. The set of all process-time graphs \(\mathcal {PT}^{\omega }\) is compact and the transition function \(\hat{\tau }: \mathcal {PT}^{\omega }\rightarrow \mathcal {C}^{\omega }\) is continuous, according to Lemma A.2, so taking \(\hat{\tau }(\mathcal {PT}^{\omega })\) results in a set of configuration sequences that is indeed compact. Note that limit-closed message adversaries are hence sometimes referred to as compact message adversaries.
The following consensus characterization holds even for general message adversaries:
Fig. 3.
We will start our considerations for limit-closed message adversaries by exploring the structure of the strong decision sets (recall Definition 7.8) of correct consensus algorithms, see Figure 3 for an illustration.
Unfortunately, Corollary 8.3 does not allow us to infer some minimum distance \(d\gt 0\) for the connected components in general as well. It does hold true, however, if there are only finitely many connected components. The latter is ensured, in particular, when \(\Sigma\) is locally connected, in the sense that every open set \(U(\delta) \subseteq \Sigma\) containing \(\delta\) also contains some connected open set \(V(\delta)\): According to [36, Theorem 25.3], all connected components of \(\Sigma\) are also open in this case. Hence, \(\Sigma =\bigcup _{\gamma \in \Sigma } \Sigma _{\gamma }\) is an open covering of \(\Sigma\), and since \(\Sigma\) is compact, there is a finite sub-covering \(\Sigma =\Sigma _{\gamma ^1} \cup \dots \cup \Sigma _{\gamma ^m}\). Every \(\Sigma _{\gamma }\) must hence be equal to one of \(\Sigma _{\gamma ^1}, \ldots , \Sigma _{\gamma ^m}\), as connected components are either disjoint or identical.
Unfortunately, however, most limit-closed message adversaries do not guarantee local connectedness. In the case of oblivious message adversaries, in particular, it has been argued [22] that isolated “islands” are created in the evolution of the protocol complex, which further develop like the original protocol complex (which must not be connected for consensus to be solvable). This phenomenon may be viewed as the result of the “self-similarity” that is inherent in the communication patterns created by such message adversaries, which is not compatible with local connectedness.
In general, for limit-closed message adversaries that induce infinitely many connected components in \(\Sigma\), one cannot infer openness (and hence clopenness) of the connected components: Consider a decision set \(\Sigma _v\) that consists of infinitely many connected components. Whereas any connected component \(\Sigma _{\gamma }\) is closed, the set of all remaining connected components \(\Sigma _v\setminus \Sigma _{\gamma }\) need not be closed. It may hence be possible to pick a sequence of executions \((\alpha _k)\) from \(\Sigma _v\setminus \Sigma _{\gamma }\) that converges to a limit \(\alpha\), and a sequence \((\beta _k)\) from \(\Sigma _{\gamma }\) that converges to \(\beta \in \Sigma _{\gamma }\), satisfying \(d_{\mathrm{u}}(\alpha ,\beta)=0\). By Lemma 6.2, this implies \(d_{\mathrm{u}}(\Sigma _{\gamma },\Sigma _v\setminus \Sigma _{\gamma })=0\). It is important to note, however, that this can only happen for connected components in the same strong broadcaster decision set \(\Sigma _v^p\), as \(d_{\mathrm{u}}(\Sigma _v^p,\Sigma _w^q) \ge d \gt 0\) prohibits a common limit across different decision sets. Consequently, consensus solvability is not per se impaired by infinitely many connected components, as Corollary 8.2 has shown.
We will now make the characterization of Corollary 8.2 for limit-closed message adversaries more operational, by introducing the \(\varepsilon\)-approximation of connected components and strong broadcaster decision sets, typically for some \(\varepsilon = 2^{-t}\), \(t\ge 0\). Informally, it comprises the executions whose t-prefix is related, via the transitive closure of the per-process indistinguishability relations, to the t-prefix of some execution in the approximated set. Since the number of different possible t-prefixes is finite, it can be constructed iteratively using finitely many iterations:
Note carefully that \(\Sigma _{\gamma }^\varepsilon\) is generally different from (in fact, larger than) the covering of \(\Sigma _{\gamma }\) with \(\varepsilon\)-balls defined by \(\bigcup _{\delta \in \Sigma _\gamma } B_\varepsilon (\delta) \cap \Sigma\). Our \(\varepsilon\)-approximations satisfy the following properties (that actually hold for general message adversaries):
Obviously, properties (i) and (iii) of the \(\varepsilon\)-approximation of connected components also extend to arbitrary unions of those, and hence to strong broadcaster decision sets. In fact, for limit-closed message adversaries, provided \(\varepsilon\) is chosen sufficiently small, we get the following result:
Theorem 8.8 implies that if consensus is solvable, then, for every \(0\lt \varepsilon ^{\prime }\le \varepsilon\), the universal algorithm from Theorem 7.12 applied to the strong decision sets can be used for actually solving it. And indeed, the consensus algorithm given by Winkler et al. [47, Algorithm 1] can be viewed as an instantiation of this fact.
Moreover, Corollary 8.7 implies that checking the broadcastability of all the executions in \(\Sigma _v^{p,\varepsilon }\) can be done by checking the broadcastability of finite prefixes. More specifically, like the decision function \(\Delta\) of consensus, the function \(T(\alpha)\) that gives the round by which every process in \(\alpha \in \Sigma\) has the initial value \(I_p(\alpha)\) of the broadcaster p in its view is locally constant on a sufficiently small neighborhood, namely, \(B_{2^{-T(\alpha)}}(\alpha)\), and is hence continuous in any of our topologies. Since \(\Sigma _v^p=\Sigma _v^{p,\varepsilon }\) is compact, \(T(\alpha)\) is even uniformly continuous and attains its maximum \(\hat{T}\) on \(\Sigma _v^{p,\varepsilon }\). It hence suffices to check broadcastability in the t-prefixes of \(\Sigma _v^{p,\varepsilon }\) for \(t=\max \lbrace \lfloor \log _2(1/\varepsilon)\rfloor ,\hat{T}\rbrace\) in Theorem 8.8.
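The transitive closure of the per-process prefix indistinguishability relations, which underlies both the \(\varepsilon\)-approximations and the prefix-based checks just described, can be computed by a simple union-find pass over the finitely many t-prefixes. In the Python sketch below, a prefix is represented abstractly by the tuple of local views of all processes, so two prefixes are directly indistinguishable if they share the local view of some (obedient) process; this representation and all names are ours.

```python
from itertools import combinations

def transitive_indist_classes(prefixes):
    """prefixes: list of tuples (view_of_p1, ..., view_of_pn).
    Returns the equivalence classes of the transitive closure of
    per-process indistinguishability."""
    parent = list(range(len(prefixes)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(prefixes)), 2):
        # directly indistinguishable: some process has the same local view
        if any(v == w for v, w in zip(prefixes[i], prefixes[j])):
            parent[find(i)] = find(j)
    classes = {}
    for i in range(len(prefixes)):
        classes.setdefault(find(i), []).append(prefixes[i])
    return list(classes.values())

# Toy example with two processes: A and C are only transitively
# indistinguishable, via B, so all three end up in one class.
A = ("a1", "x")
B = ("a1", "y")
C = ("b1", "y")
print(transitive_indist_classes([A, B, C]))
```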
In [47], this has been translated into the following non-topological formulation (where \({\mathsf {MA}}\) corresponds to \(\Sigma\), \([\sigma |_r]\) is the set of r-prefixes of the executions in \(\Sigma _\sigma ^{2^{-r}}\) in the uniform topology, and \(\text{Ker}(x)\) is the set of broadcasters in the prefix x):
8.5 Dynamic Networks with Non-limit-closed Message Adversaries
In this section, we consider consensus with independent and arbitrary input assignments under message adversaries that are not limit-closed [17, 41, 49]. A simple example is a message adversary that guarantees that there is some finite round r where the communication graph \(\mathcal {G}^r\) is a clique. The communication pattern with \(r=\infty\), i.e., the limiting case \(r \rightarrow \infty\) (where the clique graph never occurs), is forbidden.
As already mentioned in Section 8.4, our consensus characterization Corollary 8.2 also applies here, as does the generic one in Theorem 7.12, of course. Moreover, they can be combined with our limit-based characterization Theorem 6.4 and Corollary 6.7.
What does not work here, however, are our \(\varepsilon\)-approximations according to Definition 8.4, and everything built on top of them: Even if \(\varepsilon\) is made arbitrarily small, Lemma 8.6 does not hold. An illustration is shown in Figure 4. It is apparent that adding a ball \(B_{\varepsilon }(\alpha)\) in the iterative construction of some \(\Sigma _{\gamma }^\varepsilon\), where \(d_{\mathrm{u}}(\alpha ,\rho)\lt \varepsilon\) for some forbidden limit sequence \(\rho\), inevitably lets the construction grow into some \(\Sigma _{\delta }^\varepsilon\) lying in a different strong broadcaster decision set. Whereas this could be avoided by adapting \(\varepsilon\) when coming close to \(\rho\), the resulting approximation would not provide any advantage over directly using our characterization Corollary 8.2.
Fig. 4.
These topological findings are of course in accordance with the results on non-limit-closed message adversaries we are aware of. In particular, the binary consensus algorithm for \(n=2\) by Fevat and Godard [17] assumes that the algorithm knows a fair execution or a pair of unfair executions according to Definition 6.6 a priori, which effectively partition the execution space into two connected components.3 Such a limit exclusion is also exploited in the counterexample to consensus task solvability for \(n=2\) via a decision map that is not continuous [20], which has been suggested by Godard and Perdereau [23]: It excludes just the unfair execution \(\alpha\) based on \(\lbrace \leftrightarrow ,\leftarrow ,\leftarrow ,\ldots \rbrace\), but not the unfair execution \(\beta\) caused by \(\lbrace \rightarrow ,\leftarrow ,\leftarrow ,\ldots \rbrace\), which satisfies \(d_p(\alpha ,\beta)=0\) for the right process p and hence makes consensus impossible.
The \((D+1)\)-VSRC message adversary \(\lozenge {\mathsf {{STABLE}}}_n(D+1)\) [49] generates executions that are based on single-rooted communication graphs in every round, with the additional guarantee that, eventually, a \(D+1\)-vertex-stable root component (\(D+1\)-VSRC) occurs. Herein, a root component is a strongly connected component without in-edges from outside the component, and an x-VSRC is a root component made up of the same set of processes in x consecutive rounds. \(D\le n-1\) is the dynamic diameter of a VSRC, which guarantees that all root members reach all processes. It has been proved [49] that consensus is impossible with \(\lozenge {\mathsf {{STABLE}}}_n(x)\) for \(x\le D\), whereas an algorithm exists for \(\lozenge {\mathsf {{STABLE}}}_n(D+1)\). Obviously, \(\lozenge {\mathsf {{STABLE}}}_n(D+1)\) effectively excludes all communication patterns without any \(D+1\)-VSRC. And indeed, the choice \(x=D+1\) renders the connected components of \(\Sigma\) broadcastable by definition, which is in accordance with Corollary 8.2.
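Root components can be checked mechanically: the Python sketch below computes the strongly connected components of a round graph (via Kosaraju's algorithm) and returns those without in-edges from outside, so that a graph is single-rooted iff exactly one such component exists. The dict-of-sets graph representation (every node must appear as a key) and all names are ours.

```python
def sccs(graph):
    """Kosaraju's algorithm; `graph` maps every node to its set of out-neighbours."""
    order, visited = [], set()
    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        order.append(v)                       # post-order finish time
    for v in graph:
        if v not in visited:
            dfs(v)
    rev = {v: set() for v in graph}
    for v, outs in graph.items():
        for w in outs:
            rev[w].add(v)
    components, assigned = [], set()
    for v in reversed(order):                 # decreasing finish time
        if v in assigned:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in assigned:
                continue
            assigned.add(u)
            comp.add(u)
            stack.extend(rev[u])
        components.append(comp)
    return components

def root_components(graph):
    """SCCs without any in-edge from outside the SCC."""
    return [c for c in sccs(graph)
            if not any(w in c for u, outs in graph.items() if u not in c for w in outs)]

# Example: 0 -> 1 -> 2, 2 -> 1. The only root component is {0}: single-rooted.
G = {0: {1}, 1: {2}, 2: {1}}
print(root_components(G))                     # [{0}]
```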
We also introduced and proved correct an explicit labeling algorithm for \(\lozenge {\mathsf {{STABLE}}}_n(n)\) in [48], which effectively operationalizes the universal consensus algorithm of Theorem 7.12: By assigning a (persistent) label \(\Delta (\sigma |_r)\) to the r-prefixes of \(\sigma \in \Sigma\), it effectively assigns a corresponding unique decision value \(v \in \mathcal {V}\) to \(\sigma\), which in turn specifies the strong decision set \(\Sigma _v\) containing \(\sigma\). In the language of [48], the requirement of every \(\Sigma _v\) being open (and closed) in Theorem 7.12 translates into a matching assumption on this labeling function as follows (herein, \({\mathsf {MA}}\) corresponds to \(\Sigma\), \(\sigma |_r\) denotes the r-round prefix of execution \(\sigma\), and \(\sim\) is the transitive closure over all processes p of the prefix indistinguishability relation \(\sim _p\)):
For \(\lozenge {\mathsf {{STABLE}}}_n(n)\), it has been proved [48, Theorem 12] that the given labeling algorithm satisfies this assumption for \(r=r_{stab}+4n\), where \(r_{stab}\) is the round where the (first) \(D+1\)-VSRC in \(\sigma\) starts. Consensus is hence solvable by a suitable instantiation of the universal consensus algorithm of Theorem 7.12.
8.6 Consensus in Systems with an Eventually Timely f-source
It is well-known [15] that consensus cannot be solved in distributed systems of \(n\ge 2f+1\) (partially) synchronous processes, up to f of which may crash, that are connected by reliable asynchronous communication links. For solving consensus, the system model has been strengthened by a weak timely link (WTL) assumption [3, 26]: there has to be at least one correct process p that eventually sends timely to a sufficiently large subset of the processes.
In previous work [3], at least one eventually timely f-source p was assumed: After some unknown initial period where all end-to-end message delays are arbitrary, every broadcast of p is received by a fixed subset \(P\subseteq \Pi\) with \(p\in P,|P|\ge f+1\) within some possibly unknown maximum end-to-end delay \(\Theta\). The authors showed how to build the \(\Omega\) failure detector in such a system, which, in conjunction with any \(\Omega\)-based consensus algorithm (like the one by Mostéfaoui and Raynal [35]), can be used to solve uniform consensus.
Their \(\Omega\) implementation lets every process broadcast a heartbeat message every \(\eta\) steps, which forms partially synchronized rounds, and maintains an accusation counter for every process q that counts the number of rounds in which q’s heartbeat was not received timely by more than f processes. This is done by letting every process that does not receive q’s broadcast within \(\Theta\) send an accusation message for q, and incrementing the accusation counter for q if more than f such accusation messages from different receivers came in. It is not difficult to see that the accusation counter of a process that crashes grows unboundedly, whereas the accusation counter of every timely f-source eventually stops being incremented. Since the accusation counters of all processes are exchanged and agreed-upon as well, choosing the process with the smallest accusation counter (with ties broken by process ids) is a legitimate choice for the output of \(\Omega\).
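The counting rule itself can be simulated centrally as follows. This Python sketch computes the accusation counters from a global delay table rather than from actual message exchange, with the thresholds exactly as described above; it is only an illustration of the counting and tie-breaking rule, not the distributed \(\Omega\) implementation of [3], and all names are ours.

```python
def accusation_counters(delays, crashed, n, f, Theta, rounds):
    """delays[q][s][t]: round by which s has q's round-t heartbeat (float('inf')
    if never); crashed[q]: round in which q crashes (float('inf') if correct).
    q's counter is incremented for every round t in which more than f surviving
    receivers missed the deadline t + Theta - 1."""
    counter = [0] * n
    for q in range(n):
        for t in range(1, rounds + 1):
            deadline = t + Theta - 1
            accusers = {s for s in range(n) if s != q
                        and crashed[s] > t + Theta        # s survives to accuse
                        and (crashed[q] <= t              # q sent no heartbeat(t)
                             or delays[q][s][t] > deadline)}
            if len(accusers) > f:
                counter[q] += 1
    return counter

def omega_output(counter):
    """Smallest accusation counter wins, with ties broken by process id."""
    return min(range(len(counter)), key=lambda q: (counter[q], q))
```

In the actual protocol, these counters are of course maintained, exchanged, and agreed upon in a distributed fashion; the sketch only mirrors the per-round counting rule and the final choice of the process with the smallest counter.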
This WTL model was further relaxed in [26], which allows the set \(P(k)\) of witnessing receivers of every eventually moving timely f-source to depend on the sending round k. The price to be paid for this relaxation is the need to incorporate the sender’s round number in the heartbeat and accusation messages.
In this subsection, we will use our Theorem 7.12 to prove topologically that consensus with strong validity and independent arbitrary input assignments can indeed be solved in the WTL model: We will give and prove correct an explicit labeling algorithm (Algorithm 1), which assigns a decision value \(v \in \mathcal {V}\) to every execution \(\sigma\), thereby specifying the decision set \(\Sigma _v\) containing \(\sigma\). Applying our universal algorithm to these decision sets hence allows consensus to be solved in this model. Obviously, unlike the existing algorithms, our algorithm does not rely on an implementation of \(\Omega\).
We assume a (slightly simplified) WTL model with synchronous processes and asynchronous links that are reliable and FIFO, with known \(\Theta\) for timely links. Whereas we will use the times \(t=0, 1, 2, \ldots\) at which our synchronous processes take their steps as global time, we note that we do not have communication-closed rounds here, i.e., we have to deal with general executions according to Definition A.1 in the appendix. In an admissible execution \(\sigma\), we denote by \(F(\sigma)\) the set of up to f processes that crash in \(\sigma\), and by \(C(\sigma)=Ob(\sigma)=\Pi \setminus F(\sigma)\) the set of correct processes. For an eventual timely f-source p, we will denote by \(r_{\text{stab},p}\) the stabilization round, by which it has already started to send timely: a message sent in round \(t \ge r_{\text{stab},p}\) is received by every \(q\in P(t)\) no later than in round \(t+\Theta -1\), and is hence present in q’s state at time \(t+\Theta -1\). Note carefully that this condition is automatically satisfied when q has crashed by that round. We again assume that the processes execute a full-information protocol, i.e., send their whole state in every round. For keeping the relation to the existing algorithms, we consider the state message sent by p in round t to be its \({\mathsf {heartbeat}}(t)\). Moreover, if the state of process q at time \(t+\Theta -1\) does not contain the reception of \({\mathsf {heartbeat}}(t)\) from process p, we will say that q broadcasts an accusation message \({\mathsf {accusation}}(p,t)\) for round t of p in round \(t+\Theta\) (which is of course just part of q’s state sent in this round). If q crashes before round \(t+\Theta\), it will never broadcast \({\mathsf {accusation}}(p,t)\). If q crashes exactly in round \(t+\Theta\), we can nevertheless assume that it either manages to eventually communicate \({\mathsf {accusation}}(p,t)\) to all correct processes in the system, or to none: In our full-information protocol, every process that receives \({\mathsf {accusation}}(p,t)\) will also forward this message to all other processes when it broadcasts its own state later on.
Note that a process q that crashes before time \(t+\Theta\) causes \({\mathsf {nottimelyrec}}_s^r(q,p,t) = {\mathsf {false}}\) for all r, and that p is appended to \({\mathsf {accusationcounter}}_s^r(p)\) for tie-breaking purposes only. For every eventually timely f-source p, the implicit forwarding of accusation messages ensures that \({\mathsf {accusationcounter}}_s^r(p)\) will eventually be the same at every correct process s in the limit \(r\rightarrow \infty\).
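For concreteness, the eventual-timeliness property itself can also be checked from such a global delay table, albeit only over a finite horizon. The finite horizon, the table representation, and all names in the Python sketch below are our own simplification; crashed receivers are counted as vacuously timely, as stated above.

```python
def timely_enough(p, t, delays, crashed, n, f, Theta):
    """Round-t broadcast of p reaches at least f + 1 processes (including p,
    and counting crashed receivers as vacuously timely) by round t + Theta - 1."""
    deadline = t + Theta - 1
    witnesses = {p}                              # p trivially has its own state
    for q in range(n):
        if q != p and (crashed[q] <= deadline or delays[p][q][t] <= deadline):
            witnesses.add(q)
    return len(witnesses) >= f + 1

def eventually_timely_f_source(p, delays, crashed, n, f, Theta, horizon):
    """True if some stabilization round r_stab <= horizon makes every later
    round-t broadcast of p timely (checked only up to the finite horizon)."""
    return any(all(timely_enough(p, t, delays, crashed, n, f, Theta)
                   for t in range(r_stab, horizon + 1))
               for r_stab in range(1, horizon + 1))
```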
We now define some predicates that require knowledge of the execution \(\sigma\). Whereas they cannot be computed locally by the processes in the execution, they can be used in the labeling algorithm.
Note that it may occur that another eventual timely f-source \(p^{\prime }\ne p_\sigma\) in \(\sigma\) has a smaller stabilization time \(r_{\text{stab},p^{\prime }}\lt r_{\text{stab},p_{\sigma }}\) than the dominant one, which happens if \(p^{\prime }\) causes more accusations in total before stabilization than \(p_\sigma\) does.
The following properties are almost immediate from the definitions:
The following lemma proves that two executions \(\sigma\) and \(\rho\) with indistinguishable prefixes \(\sigma |_r \sim _s \rho |_r\), i.e., \((\sigma |_r)^t \sim _s (\rho |_r)^t\) for \(0 \le t \le r\), cannot both satisfy \({\mathsf {oldenough}}(\sigma ,r)\) resp. \({\mathsf {oldenough}}(\rho ,r)\), and, hence, \({\mathsf {mature}}(\sigma ,r)\) resp. \({\mathsf {mature}}(\rho ,r)\), except when the dominant eventual timely f-source is the same in \(\sigma\) and \(\rho\):
Finally, we need the following technical lemmas:
With the abbreviation \(C(\sigma |_r)=\Pi \setminus {F(\sigma |_r)}\) for all non-faulty processes in \(\sigma |_r\), and \(\sigma |_r \sim _Q \rho |_r\) for \(\forall q\in Q:\; \sigma |_r \sim _q \rho |_r\), we define the short-hand notation \(\sigma |_r \sim _{\ge n-f}\rho |_r\) to express indistinguishability for a majority of (correct) processes, defined by \(\exists Q \subseteq C(\sigma |_r)\cap C(\rho |_r), |Q|\ge n-f\) such that \(\forall q\in Q: \; \sigma |_r \sim _{q} \rho |_r\).
The following lemma guarantees that prefixes that are indistinguishable only for strictly less than \(n-f\) processes are eventually distinguishable for all processes:
The following lemma finally shows that majority indistinguishability in conjunction with mature prefixes entails strong indistinguishability properties in earlier rounds:
With these preparations, we can define an explicit labeling algorithm (Algorithm 1) for the WTL model, i.e., an algorithm that computes a label \(\Delta (\sigma |_r)\) for every r-prefix \(\sigma |_r\) of an admissible execution \(\sigma\) in our WTL model. A label can either be \(\emptyset\) (still undefined) or else denote a single process p (which will turn out to be a broadcaster), and is persistent in \(\sigma\) in the sense that \(\Delta (\sigma |_r)=p \Rightarrow \Delta (\sigma |_{r+k})=p\) for every \(k\ge 0\). We can hence also uniquely assign a label \(\Delta (\sigma)\) to an infinite execution. For defining our decision sets, we will assign \(\sigma\) to \(\Sigma _{I_p}\), where \(I_p\) is the initial value of \(p=\Delta (\sigma)\) in \(\sigma\).
Informally, our labeling algorithm works as follows: If there is some unlabeled mature prefix \(\rho |_r\), it is labeled either (i) with the label of some already labeled but not yet mature \(\sigma |_r\) if the latter got its label early enough, namely, by the round \(r_0\) where \({\mathsf {oldenough}}(\rho ,r_0)={\mathsf {true}}\), or else (ii) with its dominant \(p_\rho\).
The following Theorem 8.18, in conjunction with Lemma 8.19, shows that Algorithm 1 computes labels that result in strong decision sets compatible with the requirements of Theorem 7.12. Strong consensus in the WTL model can hence be solved by means of our universal algorithm.
The following Lemma 8.19 finally confirms that a non-empty label p assigned to some prefix \(\sigma |_r\) is indeed a broadcaster:
9 Conclusions
We provided a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication failures using point-set topology. Consensus can only be solved when the space of admissible executions can be partitioned into disjoint decision sets that are both closed and open in our topologies. We also showed that this requires exclusion of certain (fair and unfair) limit sequences, which limit broadcastability and happen to coincide with the forever bivalent executions constructed in bivalence and bipotence proofs. The utility and wide applicability of our characterization was demonstrated by applying it to several different distributed computing models.
Part of our future work will be devoted to a generalization of our topological framework to other decision problems. Since the initial publication of our results, this generalized study has been started by Attiya et al. [5]. Another very interesting area of future research is to study the homology of non-compact message adversaries, i.e., a more detailed topological structure of the space of admissible executions.
Acknowledgments
We gratefully acknowledge the suggestions of the reviewers, which stimulated the inclusion of several additional results and pointed out many ways to improve our paper.
Footnotes
1
Informally, a model is limit-closed if the limit of a sequence of growing prefixes of admissible executions is admissible. Note that the wait-free asynchronous model is limit-closed.
2
We chose this simplistic illustrating example in order not to obfuscate the essentials. See Section 8 for more realistic examples.
3
Note that there are uncountably many choices for separating \(\Sigma _0\) and \(\Sigma _1\) here, however.
4
This failed state \(\bot _p\) is the only essential difference from the model of Moses and Rajsbaum [33], where faults are implicitly caused by a deviation from the protocol. This assumption makes sense for constructing “permutation layers”, for example, where it is not the environment that crashes a process at will, but rather the layer construction, which implies that some process takes only finitely many steps. Such a process just remains in the local state reached after its last computing step. In our setting, however, the fault state of all processes is solely controlled by the omniscient environment. Hence, we can safely use a failed state \(\bot _p\) to gain simplicity without losing expressive power.
5
A different, but equivalent, conceptual model would be to assume that the state of a process consists of a visible state and, in the case of message passing, message buffers that hold in-transit messages.
6
We note that both synchronous and asynchronous communication-closed rounds, as well as the executions \(\mathcal {C}^{\omega }\) defined in our generic system model in Section 3, are of course also sequences of consistent cuts.
7
Note that we slightly abuse the notation \(\mathcal {PT}^{\omega }\) here, which normally represents \(\mathcal {PT}\times \mathcal {PT}\times \dots\).
8
Tychonoff’s theorem states that any product of compact spaces is compact (with respect to the product topology).
A Process-Time Graphs
In the main body of our paper, we have formalized our topological results in terms of admissible executions in the generic system model introduced in Section 3. In this section, we will show that they also hold for a topological space consisting of other objects, namely, process-time graphs [8]. In a nutshell, a process-time graph describes the process scheduling and all communication occurring in a run, along with the set of initial values.
Since we consider deterministic algorithms only, a process-time graph corresponds to a unique execution (and vice versa). This equivalence, which results from a transition function that is continuous in all our topologies (see Lemma A.2), will eventually allow us to use our topological reasoning in either space alike.
In order to define process-time graphs as generically as possible, we will resort to an intermediate operational system model that is essentially equivalent to the very flexible general system model from Moses and Rajsbaum [33]. Crucially, it will also instantiate the weak clock functions \(\chi _p(C^t)\) stipulated in our generic model in Section 3, which must satisfy \(\chi _p(C^t)\le t\) in every admissible execution \((C^t)_{t\ge 0} \in \Sigma\). Since t represents some global notion of time here (called global real time in the sequel), ensuring this property is sometimes not trivial. More concretely, whereas t is inherently known at every process in the case of lock-step synchronous systems like dynamic networks under message adversaries [49], this is not the case for purely asynchronous systems [19].
A.1 Basic Operational System Model
Following Moses and Rajsbaum [33], we consider message passing or shared memory distributed systems made up of a set \(\Pi\) of \(n\ge 2\) processes. We stipulate a global discrete clock with values taken from \(\mathbb {N}_0=\mathbb {N}\cup \lbrace 0\rbrace\), which represents global real time in multiples of some arbitrary unit time. Depending on the particular distributed computing model, this global clock may or may not be accessible to the processes.
Processes are modeled as communicating state machines that encode a deterministic distributed algorithm (protocol) \(\mathcal {P}\). At every real time \(t \in \mathbb {N}_0\), process p is in some local state \(L_p^t \in \mathcal {L}_p \cup \lbrace \bot _p\rbrace\), where \(\bot _p\not\in \mathcal {L}_p\) is a special state representing that process p has failed.4 Local state transitions of p are caused by local actions taken from the set \({\mathsf {ACT}}_p\), which may be internal bookkeeping operations and/or the initiation of shared memory operations resp. of sending messages; their exact semantics may vary from model to model. Note that a single action may consist of finitely many non-zero time operations, which are initiated simultaneously but may complete at different times. The deterministic protocol \(\mathcal {P}_p: \mathcal {L}_p \rightarrow {\mathsf {ACT}}_p\), representing p’s part in \(\mathcal {P}\), is a function that specifies the local action p is ready to perform when in state \(L_p \in \mathcal {L}_p\). We do not restrict the actions p can perform when in state \(\bot _p\).
In addition, there is a non-deterministic state machine called the environment \(\epsilon\), which represents the adversary that is responsible for actions outside the sphere of control of the processes’ protocols. It controls things like the completion of shared memory operations initiated earlier resp. the delivery of previously sent messages, the occurrence of process and communication failures, and (optionally) the occurrence of external environment events that can be used for modeling oracle inputs like failure detectors [12]. Let \({\mathsf {act}}_\epsilon\) be the set of all possible combinations of such environment actions (also called events for conciseness later on). We assume that the environment keeps track of pending shared-memory operations resp. sent messages in its environment state \(L_\epsilon \in \mathcal {L}_\epsilon\). The environment is also in charge of process scheduling, i.e., determines when a process performs a state transition, which will be referred to as taking a step. Formally, we assume that the set \({\mathsf {ACT}}_\epsilon\) of all possible environment actions consists of all pairs \(({\mathsf {Sched}},e)\), made up of the set of processes \({\mathsf {Sched}}\subseteq \Pi\) that take a step and some \(e\in {\mathsf {act}}_\epsilon\) (both of which may be empty as well). The non-deterministic environment protocol \(\mathcal {P}_\epsilon \subseteq \mathcal {G}\times ({\mathsf {ACT}}_\epsilon \times \mathcal {L}_\epsilon)\) is an arbitrary relation that, given the current global state \(G\in \mathcal {G}\) (defined below, which also contains the current environment state \(L_\epsilon \in \mathcal {L}_\epsilon\)), chooses the next environment action \(E=({\mathsf {Sched}},e)\in {\mathsf {ACT}}_\epsilon\) and the successor environment state \(L_\epsilon ^{\prime } \in \mathcal {L}_\epsilon\). Note carefully that we assume that only E is actually chosen non-deterministically by \(\mathcal {P}_\epsilon\), whereas \(L_\epsilon ^{\prime }\) is determined by a transition function \(\tau _\epsilon : \mathcal {G}\times {\mathsf {ACT}}_\epsilon \rightarrow \mathcal {L}_\epsilon\) according to \(L_\epsilon ^{\prime }=\tau _\epsilon (G,E)\).
Finally, a global state of our system (simply called state) is an element of \(\mathcal {G}=\mathcal {L}_\epsilon \times \mathcal {L}_1 \times \cdots \times \mathcal {L}_n\). Given a global state \(G \in \mathcal {G}\), \(G_i\) denotes the local state of process i in G, and \(G_\epsilon\) denotes the state of the environment in G. Recall that it is \(G_\epsilon\) that keeps track of in-transit (i.e., just sent) messages, pending shared-memory operations and so on.5 We also write \(G=(G_\epsilon ,C)\), where the vector of the local states \(C=(C_1,\ldots ,C_n)=(G_1,\ldots ,G_n)\) of all the processes is called a configuration. Given C, the component \(C_i\) denotes the local state of process i in C, and the set of all possible configurations is denoted by \(\mathcal {C}\). Note carefully that there may be global states \(G\ne G^{\prime }\) whose corresponding configurations satisfy \(C = C^{\prime }\), e.g., in the case of different in-transit messages.
A joint action is a pair \((E,A)\), where \(E=({\mathsf {Sched}},e)\in {\mathsf {ACT}}_\epsilon\), and A is a vector with index set \({\mathsf {Sched}}\) such that \(A_p \in {\mathsf {ACT}}_p\) for \(p \in {\mathsf {Sched}}\). When a joint action \((E,A)\) is applied to a global state G where process p is in local state \(G_p\), then \(A_p=\mathcal {P}_p(G_p)\) is the action prescribed by p’s protocol. Note that some environment actions, like message receptions at process p, require \(p \in {\mathsf {Sched}}\), i.e., “wake-up” the process. For example, a joint action \((E,A)\) that causes p to send a message m to q and process r to receive a message \(m^{\prime }\) sent to it by process s earlier typically works as follows: (i) p is caused to take a step, where its protocol \(\mathcal {P}_p\) initiates the sending of m; (ii) the environment adds m to the send buffer of the communication channel from p to q (maintained in the environment state \(L_\epsilon\)); (iii) the environment moves \(m^{\prime }\) from the send buffer of the communication channel from s to r (maintained in the environment state \(L_\epsilon\)) to the receive buffer (maintained in the local state of r), and (iv) causes r to take a step. It follows that the local state \(L_r\) of process r reflects the content of message \(m^{\prime }\) immediately after the step scheduled along with the message reception.
With \({\mathsf {ACT}}\) denoting the set of all possible joint actions, the transition function \(\tau : \mathcal {G}\times {\mathsf {ACT}}\rightarrow \mathcal {G}\) describes the evolution of the global state G after application of the joint action \((E,A)\), which results in the successor state \(G^{\prime }= \tau (G,(E,A))\). A run of \(\mathcal {P}\) is an infinite sequence of global states \(G^0,G^1,G^2,\ldots\) generated by an infinite sequence of joint actions. In order to guarantee a stable global state at integer times, we assume for simplicity that the joint actions occur atomically and instantaneously at times \(0.5,1.5,2.5,\ldots\), i.e., that \(G^{t+1}=\tau (G^t,(E^{t.5},A^{t.5}))\). \(G^0\) is the initial state of the run, taken from the set of possible initial states \(\mathcal {G}^0\). Finally, \(\Psi\) denotes the subset of all admissible runs of our system. \(\Psi\) is typically used for enforcing liveness conditions like “every message sent to a correct process is eventually delivered” or “every correct process takes infinitely many steps”.
Unlike Moses and Rajsbaum [33], we handle process failures explicitly in the state of the processes, i.e., via the transition function: If some joint action \((E^{t.5},A^{t.5})\) contains \(E^{t.5}=({\mathsf {Sched}},e)\), where e requests some process p to fail, this will force \(G_p^{t+1}=\bot _p\) in the successor state \(G^{t+1}= \tau (G^t,(E^{t.5},A^{t.5}))\), irrespective of any other operations in e (like the delivery of a message) that would otherwise affect p. All process failures are persistent, that is, we require that all subsequent environment actions \(E^{t^{\prime }.5}\) for \(t^{\prime }\ge t\) also request p to fail. As a convention, we consider every \(E^{t^{\prime }.5}\) where p fails as p taking a step as well. Depending on the type of process failure, failing may cause p to stop its protocol-compliant internal computations, to drop all incoming messages, and/or to stop sending further messages. In the case of crash failures, for example, the process may send a subset of the outgoing messages demanded by \(\mathcal {P}_p\) in the very first failing step and does not perform any protocol-compliant actions in future steps. A send omission-faulty process does the same, except that it may send protocol-compliant messages to some processes also in future steps. A receive omission-faulty process may omit to process some of its received messages in every step where it fails, but sends protocol-compliant messages to every receiver. A general omission-faulty process combines the possible behaviors of send and receive omissions. Note that message loss can also be modeled in a different way in our setting: Rather than attributing an omission failure to the sender or receiver process, it can also be considered a communication failure caused by the environment. The involved sender process p resp. receiver process q continues to act according to its protocol in this case, i.e., does not enter the fault state \(\bot _p\) resp. \(\bot _q\) here.
Since we only consider deterministic protocols, a run \(G^0,G^1,G^2,\ldots\) is uniquely determined by the initial configuration \(C^0\) and the sequence \((L_\epsilon ^0,E^{0.5}), (L_\epsilon ^1,E^{1.5}), \ldots\) of tuples \((L_\epsilon ^t,E^{t.5})\) of environment states and environment actions for \(t\ge 0\). Let \(\mathcal {G}^{\omega }\) resp. \(\mathcal {C}^{\omega }\) be the set of all infinite runs resp. executions (configuration sequences), with \(\Psi \subseteq \mathcal {G}^{\omega }\) resp. \(\Sigma \subseteq \mathcal {C}^{\omega }\) denoting the set of admissible runs resp. executions that result from admissible environment action sequences \(E^{0.5},E^{1.5},\ldots\); after all, they may be required to satisfy liveness constraints like fairness that cannot be expressed via the transition function.
Our assumptions on the environment protocol, namely, \(L_\epsilon ^{t+1}=\tau _\epsilon (G^t,E^{t.5})\), actually imply that a run \(G^0,G^1,G^2,\ldots\), and thus also the corresponding execution \(C^0,C^1,C^2,\ldots\), is already uniquely determined by the initial state \(G^0=(L_\epsilon ^0,C^0)\) and the sequence of chosen environment actions \(E^{0.5},E^{1.5},\ldots\). Since \(L_\epsilon ^0\) is fixed and the environment actions abstract away almost all of the internal workings of the protocols and their complex internal states, it should be possible to uniquely describe the evolution of a run/execution just by means of the sequence \(E^{0.5},E^{1.5},\ldots\). In the following, we will show that this is indeed the case.
A.2 Implementing Global Time Satisfying the Weak Clock Property
Our topological framework crucially relies on the ability to distinguish/not distinguish two local states \(\alpha _p^t\) and \(\beta _p^t\) in two executions \(\alpha\) and \(\beta\) at global real time t. Clearly, this is easy for an omniscient observer who knows the corresponding global states and can thus verify that \(\alpha _p^t\) and \(\beta _p^t\) arise from the same global time t. Processes cannot do that in asynchronous systems, however, since t is not available to the processes and hence cannot be included in \(\alpha _p^t\) and \(\beta _p^t\). Consequently, two different sequences of environment actions (called events in the sequel for conciseness) \(E_\alpha ^{0.5},E_\alpha ^{1.5},\ldots ,E_\alpha ^{(t-1).5}\) and \(E_\beta ^{0.5},E_\beta ^{1.5},\ldots ,E_\beta ^{(t^{\prime }-1).5}\), applied to the same initial state, may produce the same state \(\alpha _p^t=\beta _p^{t^{\prime }}\). This happens when they are causal shuffles of each other, i.e., reorderings of the steps of the processes that are in accordance with the happens-before relation [29]. Hence, the (in)distinguishability of configurations does not necessarily match the (in)distinguishability of the corresponding event sequences.
Whereas our generic system model does not actually require processes to have a common notion of time, it does require that the weak clock functions \(\chi _p\) do not progress faster than global real time. We will accomplish this in our operational system model by defining some alternative notion of global time that is accessible to the processes. Doing this will also rule out the problem spotted above, i.e., ensure that runs (event sequences) uniquely determine executions (configuration sequences).
There are many conceivable ways for defining global time, including the following possibilities:
(i)
In the case of lock-step synchronous distributed systems, like dynamic networks under message adversaries [38, 47, 48], nothing needs to be done here since all processes inherently know global real time t.
(ii)
In the case of asynchronous systems with a majority of correct processes, the arguably most popular approach for message-passing systems (see e.g., [3, 26, 35]) is the simulation of asynchronous communication-closed rounds: Processes organize rounds \(r=1,2,\ldots\) by locally waiting until \(n-f\) messages sent in the current round r have been received. These \(n-f\) messages are then processed, which defines both the local state at the beginning of the next round \(r+1\) and the message sent to everybody in this next round. Late messages are discarded, and early messages are buffered locally (in the state of the environment) until the appropriate round is reached. The very same approach can also be used in shared-memory systems with immediate snapshots [1], where a process can safely wait until it sees \(n-f\) entries in a snapshot. Just using the round numbers as global time, i.e., choosing \(t=r\), is all that is needed in such a model; a minimal sketch of this round structure is given after this list.
(iii)
In models without communication-closed rounds [19, 43], a suitable notion of global time can be derived from other6 definitions of consistent cuts [32]. We will show how this can be done in our operational system model based on Mattern’s vector clocks. Our construction will exploit the fact that a local state transition of a process happens only when it takes a step in our model: In between the \(\ell\)th and \((\ell +1)\)th step of any fixed process p, which happen at times \((t_p(\ell)-1).5\) and \((t_p(\ell +1)-1).5\), respectively, only environment actions (external environment events, message deliveries, shared memory completions), if any, can occur, which change the state of the environment but not the local state of p.
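As referenced in item (ii) above, the following Python skeleton sketches the simulated asynchronous communication-closed rounds for a single process. The recv and send callables are abstract placeholders for an otherwise unspecified message-passing interface, and all names are ours; using the round number r as global time t is then exactly the choice described in item (ii).

```python
def simulate_rounds(my_id, my_input, n, f, recv, send, max_round):
    """recv() blocks and returns (sender, rnd, payload); send(rnd, payload)
    broadcasts to all processes. Round r completes once n - f round-r
    messages (including our own) have been collected."""
    early = {}                             # buffered messages of future rounds
    payload = my_input
    for r in range(1, max_round + 1):
        send(r, payload)
        collected = {my_id: payload}
        collected.update(early.pop(r, {}))
        while len(collected) < n - f:
            sender, rnd, msg = recv()
            if rnd == r:
                collected[sender] = msg
            elif rnd > r:                  # early message: buffer it
                early.setdefault(rnd, {})[sender] = msg
            # rnd < r: late message, simply discarded
        payload = collected                # defines the round-(r+1) state/message
    return payload
```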
We will start out from the sequence of arbitrary cuts [32] \(IC^0,IC^1,IC^2,\ldots\) (indexed by an integer index \(k\ge 0\)) occurring in a given run \(G^0,G^1,G^2,\ldots\) (which itself is indexed by the global real time t), where the frontier \(IF^k\) of \(IC^k\) is formed by the local states of the processes after they have taken their kth step, i.e., \(IF^0=IC^0=C^0\) and \(IF^k=(G_1^{t_1(k)},\ldots ,G_n^{t_n(k)})\) for \(k\ge 1\), with \((t_p(k)-1).5\) being the time when process p takes its kth step. Note that this kth step is applied to p’s state \(IF_p^{k-1}\) in the frontier \(IF^{k-1}\) of \(IC^{k-1}\) and processes all the external environment events and all the messages received/shared memory operations completed since then. Recall the convention that every environment action where process q fails is also considered as q taking a step.
Clearly, except in lock-step synchronous systems, \(t_p(k)\ne t_q(k)\) in general, so \(IC^0,IC^1,IC^2,\ldots\) can be viewed as the result of applying a trivial “synchronic layering” in terms of Moses and Rajsbaum [33]. Unfortunately, though, any \(IC^k\) may be an inconsistent cut, as messages sent by a fast process p in its \((k+1)\)th step may have been received by a slow process q by its kth step. \(IC^k\) would violate causality in this case, i.e., would not be left-closed w.r.t. Lamport’s happens-before relation [29].
Recall that we restricted our attention to consensus algorithms using full-information protocols, where every message sent contains the entire state transition history of the sender. As a consequence, we do not significantly lose applicability of our results by further restricting the protocol and the supported distributed computing models as follows:
(i)
In a single state transition of \(\mathcal {P}_p\), process p can
—
actually receive all messages delivered to it since its last step,
—
initiate the sending of at most one message to every process, resp.,
—
initiate at most one single-writer multiple-reader shared memory operation in the shared memory owned by some other process (but no restriction on operations in its own shared memory portion).
(ii)
In addition to (optional) external environment events, the environment protocol only provides
—
\({\mathsf {fail}}(q) \in {\mathsf {act}}_\epsilon\), which tells process q to fail,
—
\({\mathsf {delv}}(q,p,t_k)\in {\mathsf {act}}_\epsilon\), which identifies the message m to be delivered to process q (for reception in its next step) by the pair \((p,t_k)\), where p is the sending process and \(t_k.5\) is the time when the sending of m has been initiated, resp.,
—
\({\mathsf {done}}(q,p,t_\ell ,t_k) \in {\mathsf {act}}_\epsilon\), which identifies the completed shared memory operation (to be processed in its next step), in the shared memory owned by p, as the one initiated by process \(q\ne p\) in its step at time \(t_\ell .5\); in a read-type operation, it will return to q the shared memory content based on p’s state at time \(t_k\), with \(t_\ell \le t_k\).
In such a system, given any cut \(IC^k\), it is possible to determine the unique largest consistent cut \(CC^k \subseteq IC^k\) [32]. By construction, \(CC^0=IC^0\), and the frontier \(CF^k\) of \(CC^k\), \(k\ge 1\), consists of the local states of all processes \(q\in \Pi\) reached by having taken some \(\ell (q)\)th step, \(0\le \ell (q)\le k\), with at least one process p having taken its kth step, i.e., \(\ell (p)=k\) and thus \(CF^k_p=IF^k_p\), and \(CF^k_q = IF_q^{\ell (q)}\) with \(0\le \ell (q)\le k\) for all processes q. Note carefully that \(\ell (q)\lt k\) happens when, in \(IC^k\), process q receives some message/data initiated at some step \(\gt k\) at or before its own kth step but after its \(\ell (q)\)th step.
Whereas the environment protocol could of course determine all the consistent cuts \(CC^0,CC^1,CC^2,\ldots\) based on the corresponding sequence of global configurations, this is typically not the case for the processes (unless in the special case of a synchronous system). However, in distributed systems adhering to the above constraints, processes can obtain this knowledge (that is to say, their local share of a consistent cut) via vector clocks [32]. More specifically, it is possible to implement a vector clock \(k_p=(k_p^1,\ldots ,k_p^n)\) at process p, where \(k_p^p\) counts the number of steps taken by p itself so far, and \(k_p^q\), \(q\ne p\), gives the number of steps that p knows that q has taken so far. Vector clocks are maintained as follows: Initially, \(k_p=(0,\ldots ,0)\), and every message sent resp. every data value written to shared memory by p gets \(k_p\) piggybacked (after advancing \(k_p^p\)). At every local state transition in p’s protocol \(\mathcal {P}_p\), \(k_p^p\) is advanced by 1. Moreover, when a previously received message/previously read data value (containing the originating process q’s vector clock value \(\hat{k}_q\)) is to be processed in the step, \(k_p\) is adjusted to the component-wise maximum of its previous value and \(\hat{k}_q\), i.e., \(k_p^r=\max \lbrace k_p^r,\hat{k}_q^r\rbrace\) for every \(r\ne p\). Obviously, all this can be implemented transparently atop of any protocol \(\mathcal {P}\) running in the system.
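A minimal Python sketch of these update rules, assuming processes are numbered 0..n-1 and that the clock value is piggybacked on every message or written data value; all names are ours.

```python
class VectorClock:
    def __init__(self, n, my_id):
        self.n, self.my_id = n, my_id
        self.v = [0] * n

    def step(self):
        """Called at every local state transition (step) of the process."""
        self.v[self.my_id] += 1

    def merge(self, piggybacked):
        """Process a received message/read data value carrying a clock:
        take the component-wise maximum for all other processes."""
        for q in range(self.n):
            if q != self.my_id:
                self.v[q] = max(self.v[q], piggybacked[q])

    def snapshot(self):
        """Value to piggyback on an outgoing message or written data."""
        return list(self.v)

# Example: p1 sends to p0 after two steps; p0 then knows k_{p0} = [1, 2].
p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
p1.step(); p1.step()
msg_clock = p1.snapshot()
p0.step(); p0.merge(msg_clock)
print(p0.v)    # [1, 2]
```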
Now, given the sequence of global states \(AC^0,AC^1,AC^2,\ldots\) of the processes running the so-augmented protocol in some run \(G^0,G^1,G^2,\ldots\), there is a well-known algorithm for computing the maximal consistent cut \(ACC^k\) for the (possibly inconsistent) cut \(AIC^k\) formed by the frontier \(AIF^k\) of the local states of the processes after every process has taken its kth step: Starting from \(\ell :=k\) and decreasing \(\ell\), process p searches for the sought \(\ell (p)\) by checking the vector clock value \(k_p(\ell)\) of the state after its own \(\ell\)th step. It stops searching and sets \(\ell (p):=\ell\) iff \(k_p(\ell)\) is less than or equal to \((k,\ldots ,k)\) component-wise. The state \(AIF_p^{\ell (p)}\) is then process p’s contribution to the frontier \(ACF^k\) of the consistent cut \(ACC^k\). Clearly, from \(ACC^0,ACC^1,ACC^2,\ldots\), the sought sequence of consistent cuts \(CC^0,CC^1,CC^2,\ldots\) can be obtained trivially by discarding all vector clock information. Therefore, even the processes can compute their share, i.e., their local state, in \(CC^k\) for every k.
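This cut-extraction rule can be sketched as follows, assuming that, for every process, the vector clock values after its successive steps are available (e.g., recorded as in the previous sketch); the list-based representation and all names are ours.

```python
def consistent_cut_frontier(clocks_per_step, k):
    """clocks_per_step[p][l-1] is p's vector clock right after its l-th step.
    Returns the vector (l(0), ..., l(n-1)) of step indices whose states form
    the frontier of the maximal consistent cut for index k."""
    frontier = []
    for p, clocks in enumerate(clocks_per_step):
        l = min(k, len(clocks))
        while l > 0 and any(c > k for c in clocks[l - 1]):
            l -= 1                      # roll back past steps that already
        frontier.append(l)              # depend on some step with index > k
    return frontier

# Example (2 processes): p1's 2nd step already reflects p0's 3rd step, so the
# consistent cut for k = 2 rolls p1 back to its 1st step.
clocks = [
    [[1, 0], [2, 0], [3, 0]],   # process 0
    [[0, 1], [3, 2]],           # process 1 (received p0's clock [3,0] in step 2)
]
print(consistent_cut_frontier(clocks, 2))   # [2, 1]
```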
By construction, the sequence of consistent cuts \(CC^0, CC^1, CC^2,\ldots\), and hence the sequence of its frontiers \(CF^0,CF^1,CF^2,\ldots\), completely describes the evolution of the local states of the processes in a run \(G^0,G^1,G^2,\ldots\). In our operational model, we will hence just use the indices k of \(CC^k\) as global time for specifying executions: Starting from the initial state \(CC^0\), we consider \(CC^k\) as the result of applying round \(k\ge 1\) to \(CC^{k-1}\) (as we did in the case of lock-step rounds).
A.3 Defining Process-time Graphs
No matter how consistent cuts, i.e., global time, are implemented, from now on we just overload the notation used so far and denote by \(C^k\) the frontier \(CF^k\) of the consistent cut at global time k. So, given an infinite execution \(\alpha\), we again denote by \(\alpha ^t\) the tth configuration (= the consistent cut with index t) in \(\alpha\).
Clearly, by construction, every \(C^k\) is uniquely determined by \(C^0\) and all the events that cause the steps leading to \(C^k\). In particular, we can define a vector of events \(E^k\), where \(E_p^k\) is a set containing all the events that must be applied to \(C_p^{k-1}\) in order to arrive at \(C_p^k\). Note carefully that a process p that does not make a step, i.e., is not scheduled in \(E^k\) and thus has the same non-\(\bot _p\) state in \(C^{k-1}\) and \(C^k\), does not have any event \({\mathsf {delv}}(p,*,*)\in E_p^k\) (resp. \({\mathsf {done}}(p,*,*,*)\in E_p^k\)) by construction, i.e., \(E_p^k=\emptyset\). Otherwise, \(E_p^k\) contains a “make a step” event, all (optional) external environment events, and \({\mathsf {delv}}(p,*,*)\) for all messages that have been sent to p in steps within \(CC^{k-1}\) and are delivered to p after its previous step but before or at its kth step (resp. \({\mathsf {done}}(p,*,*,*)\) for all completed shared memory operations initiated by p in steps within \(CC^{k-1}\) and completed after p’s previous step but before or at its kth step). Note that \(E_p^1\) cannot contain any \({\mathsf {delv}}(p,*,*)\), as no messages have been sent before (resp. any \({\mathsf {done}}(p,*,*,*)\), as no shared memory operations have been initiated before).
As a consequence of our construction, the mismatch problem spotted at the beginning of Appendix A.2 no longer exists, and we can reason about executions and the corresponding event sequences alike.
Rather than considering \(C^0\) in conjunction with \(E^1,\ldots ,E^k\), however, we will consider the corresponding process-time graph k-prefix \(PTG^k\) [8], which we will now define. Since we are only interested in consensus algorithms here, we assume that every process has a dedicated initial state for every possible initial value v, taken from a finite input domain \(\mathcal {V}\). For every assignment of initial values \(x\in \mathcal {V}^n\) to the n processes in the initial configuration \(C^0\), we inductively construct the following sequence of process-time graph prefixes \(PTG^t\):
Figure 5 shows an example of a process-time graph prefix occurring in a run with lock-step synchronous or asynchronous rounds. The nodes are horizontally aligned according to global time, progressing along the vertical axis.
Fig. 5.
Figure 6 shows an example of a process-time graph prefix occurring in a run with processes that do not execute in lock-step rounds and may crash. Nodes are again horizontally aligned according to global time, progressing along the vertical axis. The frontier \(C^k\) of the kth consistent cut, reached at the end of round k, is made up of \(C_p^k=\lbrace (p,\ell _p(k),*) \in PTG^k \mid \mbox{$0\le \ell _p(k)\le k$ is maximal}\rbrace\). That is, starting from the (possibly inconsistent) cut made up of the nodes \((p,k,*)\) of all processes, one has to go down for process p until the first node is reached where no edge originating in a node with time \(\gt k\) has been received.
Fig. 6.
Let \(\mathcal {PT}^t\) be the set of all possible process-time graph t-prefixes, and \(\mathcal {PT}^\omega\) be the set of all possible infinite process-time graphs, for all possible runs of our system. Note carefully that \(\mathcal {PT}^t\), as well as every set \(\mathcal {P}^\ell\) of round-\(\ell\) process-time graphs for finite \(\ell\), is necessarily finite (provided the encoding (\(*\)) for external environment events has a finite domain, which we assume). Clearly, every element of \(\mathcal {PT}^t\) resp. \(\mathcal {PT}^\omega\) can be expressed as a finite resp. infinite sequence \((P^0,\ldots ,P^t) \in \mathcal {P}^0 \times \mathcal {P}^1 \times \cdots \times \mathcal {P}^t = \mathcal {PT}^t\) resp. \((P^0,P^1,\ldots) \in \mathcal {P}^0 \times \mathcal {P}^1 \times \dots = \mathcal {PT}^{\omega }\) of round-\(\ell\) process-time graphs.7
We will denote by \(PS\subseteq \mathcal {PT}^{\omega }\) the set of all admissible process-time graphs in the given model, and by \(\Sigma \subseteq \mathcal {C}^\omega\) the corresponding set of admissible executions. Note carefully that process-time graphs are independent of the (decision function of the) consensus algorithm, albeit they do depend on the initial values.
Due to the one-to-one correspondence of process-time graphs and executions established before, the topological machinery developed in Sections 4–5 for \(\Sigma \subseteq \mathcal {C}^{\omega }\) can also be applied to \(PS \subseteq \mathcal {PT}^{\omega }\). Since, in sharp contrast to the set of configurations \(\mathcal {C}\), the set of process-time graphs \(\mathcal {PT}^t\) is finite for any time t and hence compact in the discrete topology, Tychonoff’s theorem8 implies compactness of \(\mathcal {PT}^{\omega }\) in the p-view topology.
Whereas this is not necessarily the case for \(\mathcal {C}^{\omega }\), we can prove compactness of the image of \(\mathcal {PT}^{\omega }\) under an appropriately defined operational transition function: Given the original transition function \(\tau _\epsilon : \mathcal {G}\times {\mathsf {ACT}}_\epsilon \rightarrow \mathcal {L}_\epsilon\), it is possible to define a PTG transition function \(\hat{\tau }: \mathcal {PT}^{\omega }\rightarrow \mathcal {C}^{\omega }\) that just provides the (unique) execution for a given process-time graph. The following Lemma A.2 shows that \(\hat{\tau }\) is continuous in any of our topologies.
Since the image of a compact space under a continuous function is compact, it hence follows that the set \(\hat{\tau }[\mathcal {PT}^{\omega }] \subseteq \mathcal {C}^{\omega }\) of admissible executions is a compact subspace of \(\mathcal {C}^{\omega }\). The common structure of \(\mathcal {PT}^{\omega }\) and its image under the PTG transition function \(\hat{\tau }\), implied by the continuity of \(\hat{\tau }\), hence allows us to reason in either of these spaces. In particular, with Definition A.3, the analogs of Theorems 5.2 and 5.3 read as follows:
References
[1] Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt, and Nir Shavit. 1993. Atomic snapshots of shared memory. J. ACM 40, 4 (1993), 873–890.
Yehuda Afek and Eli Gafni. 2013. Asynchrony from synchrony. In Proceedings of the 14th International Conference on Distributed Computing and Networking (ICDCN’13), Davide Frey, Michel Raynal, Saswati Sarkar, Rundrapatna K. Shyamasundar, and Prasun Sinha (Eds.). Lecture Notes in Computer Science, Vol. 7730. Springer, 225–239.
Marcos K. Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. 2004. Communication-efficient leader election and consensus with limited link synchrony. In Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing (PODC’04), Shay Kutten (Ed.). ACM Press, New York, 328–337.
Hagit Attiya, Armando Castañeda, and Thomas Nowak. 2023. Topological characterization of task solvability in general models of computation. In Proceedings of the 37th International Symposium on Distributed Computing (DISC’23), Rotem Oshman (Ed.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, 24:1–24:23.
Hagit Attiya, Armando Castañeda, and Sergio Rajsbaum. 2020. Locally solvable tasks and the limitations of valency arguments. In Proceedings of the 24th International Conference on Principles of Distributed Systems (OPODIS’20), Quentin Bramas, Rotem Oshman, and Paolo Romano (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 18:1–18:16.
Ido Ben-Zvi and Yoram Moses. 2014. Beyond Lamport’s happened-before: On time bounds and the ordering of events in distributed systems. J. ACM 61, 2 (2014), 13:1–13:26.
Martin Biely and Peter Robinson. 2019. On the hardness of the strongly dependent decision problem. In Proceedings of the 20th International Conference on Distributed Computing and Networking (ICDCN’19). ACM Press, New York, 120–123.
Armando Castañeda, Pierre Fraigniaud, Ami Paz, Sergio Rajsbaum, Matthieu Roy, and Corentin Travers. 2019. Synchronous t-Resilient consensus in arbitrary graphs. In Proceedings of the 21st Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’19), Mohsen Ghaffari, Mikhail Nesterenko, Sébastien Tixeuil, Sara Tucci, and Yukiko Yamauchi (Eds.). Springer, 53–68.
Bernadette Charron-Bost and André Schiper. 2009. The Heard-Of model: Computing in distributed systems with benign faults. Distrib. Comput. 22, 1 (April2009), 49–71.
Étienne Coulouma, Emmanuel Godard, and Joseph G. Peters. 2015. A characterization of oblivious message adversaries for which Consensus is solvable. Theor. Comput. Sci. 584 (2015), 80–90.
Tristan Fevat and Emmanuel Godard. 2011. Minimal obstructions for the coordinated attack problem and beyond. In Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, (IPDPS’11). 1001–1011.
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (1985), 374–382.
Eli Gafni, Petr Kuznetsov, and Ciprian Manolescu. 2014. A generalized asynchronous computability theorem. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC’14), Shlomi Dolev (Ed.). ACM Press, New York, 222–231.
Hugo Rincon Galeana, Sergio Rajsbaum, and Ulrich Schmid. 2022. Continuous tasks and the asynchronous computability theorem. In Proceedings of the 13th Innovations in Theoretical Computer Science Conference (ITCS’22), Mark Braverman (Ed.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 73:1–73:27.
Hugo Rincon Galeana, Ulrich Schmid, Kyrill Winkler, Ami Paz, and Stefan Schmid. 2023. Topological Characterization of Consensus Solvability in Directed Dynamic Networks. http://arxiv.org/abs/2304.02316
Martin Hutle, Dahlia Malkhi, Ulrich Schmid, and Lidong Zhou. 2009. Chasing the weakest system model for implementing Omega and consensus. IEEE T. Depend. Secure 6, 4 (2009), 269–281.
Petr Kuznetsov, Thibault Rieutord, and Yuan He. 2018. An asynchronous computability theorem for fair adversaries. In Proceedings of the 37th ACM Symposium on Principles of Distributed Computing (PODC’18), Idit Keidar (Ed.). ACM Press, New York, 387–396.
Friedemann Mattern. 1989. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms, Michel Cosnard, Yves Rober, Patrice Quinton, and Michel Raynal (Eds.). North Holland, Amsterdam, 215–226.
Achour Mostefaoui, Sergio Rajsbaum, and Michel Raynal. 2003. Conditions on input vectors for consensus solvability in asynchronous distributed systems. J. ACM 50, 6 (2003), 922–954.
Thomas Nowak, Ulrich Schmid, and Kyrill Winkler. 2019. Topological characterization of consensus under general message adversaries. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC’19), Faith Ellen (Ed.). ACM Press, New York, 218–227.
P. R. Parvedy and M. Raynal. 2003. Uniform agreement despite process omission failures. In Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS’03), Jack Dongarra (Ed.). IEEE Press, New York, 22–26.
Kenneth J. Perry and Sam Toueg. 1986. Distributed agreement in the presence of processor and communication faults. IEEE T. Software Eng. SE-12, 3 (1986), 477–482.
Michel Raynal and Julien Stainer. 2013. Synchrony weakened by message adversaries vs asynchrony restricted by failure detectors. In Proceedings of the 32nd ACM Symposium on Principles of Distributed Computing (PODC’13), Gadi Taubenfeld (Ed.). ACM Press, New York, 166–175.
Nicola Santoro and Peter Widmayer. 1989. Time is not a healer. In Proceedings of the 6th Annual Symposium on Theoretical Aspects of Computer Science (STACS’89). Springer, 304–313.
Ulrich Schmid, Bettina Weiss, and Idit Keidar. 2009. Impossibility results and lower bounds for consensus under link failures. SIAM J. Comput. 38, 5 (2009), 1912–1951.
Kyrill Winkler, Ami Paz, Hugo Rincon Galeana, Stefan Schmid, and Ulrich Schmid. 2023. The time complexity of consensus under oblivious message adversaries. In Proceedings of the 14th Innovations in Theoretical Computer Science Conference (ITCS’23), Yael Tauman Kalai (Ed.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, 100:1–100:28.
Kyrill Winkler, Ulrich Schmid, and Yoram Moses. 2019. A characterization of consensus solvability for closed message adversaries. In Proceedings of the 23rd International Conference on Principles of Distributed Systems (OPODIS’19). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, 17:1–17:16.
Kyrill Winkler, Ulrich Schmid, and Thomas Nowak. 2021. Valency-based consensus under message adversaries without limit-closure. In Proceedings of the 23rd International Symposium on Fundamentals of Computation Theory (FCT’21), Evripidis Bampis and Aris Pagourtzis (Eds.). Springer, 457–474.