The Massively Parallel Computation (MPC) model is a popular abstraction for large-scale distributed computing. The Set Cover problem is a classical combinatorial optimization problem that has a wide variety of applications and generalizes many well-studied problems such as vertex cover, edge cover, and minimum dominating set. For a Set Cover instance with a ground set of n elements and a collection of m sets, we present two O(log n)-approximation algorithms in the MPC model. Our algorithms run in \(\tilde{O}(\log N)\) rounds in the linear-memory MPC model and in \(\tilde{O}(\log ^{1.5} N)\) rounds in the sublinear-memory MPC model, where N = n + m. These are the first O(log n)-approximation algorithms for Set Cover in this setting that run in \(o(\log ^2 N)\) rounds. Our results are obtained by repurposing the sparsified graph exponentiation technique that has been successful for the maximal independent set problem in the MPC model and applying it to a simple, distributed Set Cover algorithm by Grunau, Mitrović, Rubinfeld, and Vakilian (SODA 2020).
1 Introduction
We present fast algorithms for the SetCover problem in the Massively Parallel Computation (MPC) model [12, 16, 24, 31]. In the SetCover problem, we are given a ground set of n elements and a collection of m sets that jointly cover all elements. The goal is to find the smallest subcollection of sets that still covers all elements. SetCover is a classical combinatorial optimization problem that generalizes many well-studied problems like vertex cover, edge cover, and minimum dominating set. The MPC model is defined by a set of machines connected via an all-to-all communication network, with each machine having at most W words of local memory (one word is O(log N) bits). Communication and computation in this model are synchronous. At the beginning of the algorithm, the input is distributed across the machines. During a round, each machine receives up to W words from the other machines, performs local computation, and sends up to W words to the other machines to guide the computation in the next round. Note that in this model the bandwidth constraint is not associated with individual edges, but with each machine.
A natural, greedy sequential algorithm provides a \((\ln n + O(1))\)-approximation [5, 23, 26]. This approximation factor is essentially optimal since there can be no \(c \ln n\)-approximation to SetCover for any c < 1 unless P = NP [11]. The SetCover problem has also been extensively studied in parallel settings, where O(log n)-approximation algorithms are known [3, 30], and these algorithms have served as the basis for distributed SetCover approximation algorithms [4, 22]. For example, in the Congest model1 of distributed computation, the work of Censor-Hillel and Dory [4] implies an algorithm for the SetCover problem that guarantees an approximation ratio of O(log n) and takes \(O(\log ^2 N)\) rounds, where N = n + m. This Congest algorithm succeeds with high probability (whp), i.e., with probability at least \(1 - 1/N^c\) for a constant c > 1.
There is an active stream of research that aims to design MPC algorithms that are significantly faster than their Congest counterparts for various problems [7, 8, 14, 15, 25]. In particular, for the Maximal Independent Set (MIS) problem, the best-known round complexity (in terms of the number of vertices, N) in the Congest model is O(log N) [1, 27], whereas in the low-memory MPC model it is \(\tilde{O}(\sqrt {\log N})\)2 [15, 25] and in the linear-memory MPC model it is O(log log N) [14]. But, as far as we know, SetCover seems to have resisted such improvements thus far and the fastest known MPC algorithm simply simulates the fastest Congest algorithm [4] one round at a time, taking \(O(\log ^2 N)\) rounds. In this paper, motivated by the speedups obtained for MIS, we show that there are substantially faster MPC algorithms for SetCover. Specifically, we prove the following theorem.
Theorem (Informal Main Result) We are given a SetCover instance with a ground set of n elements and a collection of m sets. Let N = m + n. We can compute, whp, an O(log n)-approximate solution to SetCover in
• \(\tilde{O}(\log ^{1.5} N)\) rounds in the low-memory MPC model and
• \(\tilde{O}(\log N)\) rounds in the linear-memory MPC model.
1.1 SetCover in the MPC Model
The input to the SetCover problem is a ground set X, |X| = n, and a collection \(\mathcal {S} = \lbrace S_1, S_2, \ldots , S_m\rbrace\) of subsets \(S_i \subseteq X\) satisfying \(\cup _i S_i = X\). The output of the problem is required to be the smallest subcollection \(\mathcal {C} \subseteq \mathcal {S}\) such that \(\cup _{S \in \mathcal {C}} S = X\). For a SetCover instance, we use s to denote the size of the largest set and t to denote the largest number of sets an element appears in. A SetCover instance can be represented as a bipartite graph \(G = (X, \mathcal {S}, E)\) with node set \(X \cup \mathcal {S}\) of size N = m + n, and the edge set E consisting of edges {S, x}, where x ∈ X, \(S \in \mathcal {S}\), and x ∈ S.
In the MPC model, the local memory W of each machine and the number of machines \(|\mathcal {M}|\) used are assumed to be sublinear in the input size I. Specifically, \(W = I^{\alpha }\) and \(|\mathcal {M}| = I^{\beta }\), where 0 < α < 1 and 0 < β < 1 are constants. The number of machines \(|\mathcal {M}|\) is assumed to be large enough so that we have enough memory to store the entire input, that is, \(|\mathcal {M}| \cdot W \ge I\). When the input is an N-node graph, the difficulty of designing MPC algorithms seems to be tied to how the memory per machine W relates to N. For this reason, researchers have separately studied the following three memory-size regimes. Note that in all three regimes, it is still true that the local memory parameter W is sublinear in the total input size I.
Low-Memory MPC: \(W = \Theta (N^{\delta })\) local memory per machine for a constant 0 < δ < 1.
Linear-Memory MPC: W = Θ(N) local memory per machine.
Superlinear-Memory MPC: \(W = \Theta (N^{1 + \delta })\) local memory per machine for a constant 0 < δ < 1.
It is known that SetCover can be solved (approximately) in O(1) rounds in the superlinear-memory MPC model [20] (see details below in Section 3). Our paper focuses on the two more challenging memory regimes, low-memory MPC and linear-memory MPC.
In the MPC model, the standard assumption is that the input graph edges are distributed arbitrarily among the machines. But it is convenient to design algorithms assuming that the input is distributed in a node-centric manner3. That is, for each node v in the input graph, there is a machine Mv that hosts it, and Mv knows all the neighbors of v and the machines that host these neighbors. A machine can host multiple vertices as long as the local memory constraint is not violated.
An issue that immediately arises in the low-memory MPC model is that the degree of a node itself can be too large to fit in a single machine. We deal with this issue in a standard way [2, 15, 25]. We first replace the edges incident on a node v with \(\deg (v) \gt N^\delta\) by a virtual, \(N^{O(\delta)}\)-ary balanced tree of depth O(1/δ). The root of this tree is v and its leaves are the neighbors of v. Each intermediate node is a virtual node with degree \(N^{O(\delta)}\), and thus the neighborhood of every virtual node can fit in a machine. Communication between v and its neighbors occurs via this tree and takes O(1/δ) rounds, instead of just 1 round.
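For concreteness, here is a minimal sketch (in Python; the function name and data layout are illustrative, not the paper's notation) of this standard tree construction: group the neighbors into blocks of fan-out F ≈ \(N^\delta\), create a virtual parent per block, and repeat until at most F children remain.

```python
def build_virtual_tree(root, neighbors, F):
    """Replace the star around a high-degree node `root` by a balanced tree:
    blocks of F children get a virtual parent; repeating this gives depth
    O(log_F(deg)) = O(1/delta) when F = N^delta."""
    edges, level, counter = [], list(neighbors), 0
    while len(level) > F:
        next_level = []
        for j in range(0, len(level), F):
            vnode = ("virtual", root, counter)  # fresh virtual node id
            counter += 1
            for child in level[j:j + F]:
                edges.append((vnode, child))
            next_level.append(vnode)
        level = next_level
    edges.extend((root, child) for child in level)  # top level hangs off root
    return edges
```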
2 Technical Contributions
Our starting point is a simple SetCover algorithm by Grunau, Mitrović, Rubinfeld, and Vakilian [19]. For our purpose, it is best to view this as a distributed Congest model algorithm. In the pseudocode (Algorithm 1) below, s is the size of the largest set in \(\mathcal {S}\) and t is the maximum number of sets an element appears in. We use \(\deg _{i,k}(S)\) to refer to the number of uncovered elements in S just before iteration k in stage i. This algorithm produces an O(log s)-approximate solution in expectation to SetCover [19]. This algorithm can be implemented “as is” in the linear-memory MPC model in \(O(\log \mathbf {s} \log \mathbf {t})\) rounds and in the low-memory MPC model in \(O\left(\frac{1}{\delta } \log \mathbf {s} \log \mathbf {t}\right)\) rounds.
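For readers without the pseudocode at hand, the following is a minimal sequential sketch of the base algorithm, reconstructed from the description in this section (`sets` is a list of Python sets over `universe`); details of Algorithm 1 not spelled out in the text, such as loop bounds, are our assumptions.

```python
import math
import random

def base_set_cover(sets, universe):
    """Sketch of the base algorithm [19]: in stage i, a set is a candidate if it
    still covers >= s/2^i uncovered elements (Line 4), and in iteration k of the
    stage each candidate joins the cover with probability 2^k/t (Line 5)."""
    s = max(len(S) for S in sets)                              # largest set size
    t = max(sum(1 for S in sets if x in S) for x in universe)  # max element frequency
    covered, cover = set(), []
    in_cover = [False] * len(sets)
    for i in range(1, math.ceil(math.log2(s)) + 2):            # stages
        for k in range(1, math.ceil(math.log2(t)) + 2):        # iterations
            snapshot = set(covered)          # candidates join simultaneously,
            for idx, S in enumerate(sets):   # judged against the same snapshot
                if in_cover[idx]:
                    continue
                if len(S - snapshot) >= s / 2**i:                 # Line 4
                    if random.random() < min(1.0, 2**k / t):      # Line 5
                        in_cover[idx] = True
                        cover.append(idx)
                        covered |= S
    return cover

# In the final stage the threshold drops below 1 and in the final iteration the
# selection probability reaches 1, so every element ends up covered.
```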
While this algorithm is most closely related to the nearly-30-year-old parallel SetCover algorithm of Berger, Rompel, and Shor [3], it is also similar in spirit to other parallel and distributed SetCover algorithms [4, 22, 30]. All of these algorithms emulate the standard, sequential greedy SetCover algorithm, but relax its sequential nature by processing not just one “best” set, but multiple “good enough” sets in parallel. Specifically, all the sets S for which \(\deg _{i, k}(S) \ge \mathbf {s}/2^i\) (in Line 4 of Algorithm 1) are good enough and they are candidates for being added to the set cover. For example, in Stage 1, all sets with cardinality at least s/2 are considered good enough. However, adding all of these good enough sets to the set cover can lead to a very large solution and damage the approximation factor. So sets are added in log t iterations in a randomized fashion with geometrically increasing probability. For example, in iteration 1, sets are added with probability 2/t. Intuitively, this means that elements that belong to close to t sets are covered, but only by O(1) sets. This allows the algorithm to double the set selection probability to 4/t in the next iteration and so on. Adding candidate sets to the set cover in such a way that no element is covered by more than O(1) sets leads to an O(log s)-approximation. The log t iterations in each stage essentially perform this type of local symmetry breaking in parallel. Our work in this paper is motivated by the fact that for MIS, which is a classical local symmetry breaking problem, there are techniques that lead to fast MPC algorithms.
2.1 Sparsified Graph Exponentiation
A popular technique [13, 15, 21, 28] for speeding up the simulation of Congest algorithms in “all-to-all” communication models is graph exponentiation. The main idea of this technique is that if every node v knows the state of the k-neighborhood around v, then by exchanging this information with the nodes in its k-neighborhood, every node can learn the state of the 2k-neighborhood around it. If the information exchange in each iteration can be done in O(1) rounds, then nodes can learn the state of their ℓ-neighborhood in O(log ℓ) rounds. The main obstacle to obtaining this exponential speedup is that the k-neighborhoods around nodes may be so large that exchanging them takes too many rounds. However, many Congest algorithms are randomized and there is a natural sparsification that occurs, i.e., in each round only a randomly sampled subset of the nodes are active, and the rest are silent. This implies that the k-neighborhoods that are exchanged only need to involve sparse subgraphs induced by the sampled nodes. This sparsified graph exponentiation technique has been applied to the MIS problem in the low-memory MPC model [15]. Furthermore, a simulation theorem that allows us to apply sparsified graph exponentiation as a black box was presented in [25] and applied to the 2-ruling set problem and MIS. In this paper, we show that the sparsified graph exponentiation technique can be applied to Algorithm 1 to obtain fast simulations in both the low-memory and the linear-memory MPC models.
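A centralized sketch of one doubling step, under the assumption that `ball` maps each node v to B(v, r), its set of nodes within distance r (including v itself); in the MPC implementation each union below arrives as a message from the host machine of u:

```python
def double_balls(ball):
    """One graph-exponentiation step: if ball[v] = B(v, r) for every v, then
    the union of ball[u] over u in B(v, r) is exactly B(v, 2r)."""
    new_ball = {}
    for v, B in ball.items():
        merged = set(B)
        for u in B:
            merged |= ball[u]  # in MPC, machine M_u sends B(u, r) to M_v
        new_ball[v] = merged
    return new_ball
```

Applying this step O(log ℓ) times starting from ball[v] = B(v, 1) yields B(v, ℓ).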
2.2 Sparsified Graph Exponentiation Applied to SetCover
Our overall plan is to partition the log t iterations in a stage of Algorithm 1 into batches and simulate each batch fast, using sparsified graph exponentiation. Consider a batch with b iterations, k = B + 1, B + 2, …, B + b, in Stage i. At the start of this batch, it is possible to identify sets that will be “active” in this batch. Call a set S a candidate for this batch if \(\deg _{i, B+1}(S) \ge \mathbf {s}/2^i\). For each candidate set S, we toss b coins \(c_k(S)\) for k = B + 1, B + 2, …, B + b with \(\text{Prob}[c_k(S) = 1] = 2^k/\mathbf {t}\). Call a set S selected in this batch if it is a candidate for this batch and \(c_k(S) = 1\) for some B + 1 ≤ k ≤ B + b. It is clear from Lines 4-5 in Algorithm 1 that the only sets that can join the set cover in this batch are those that are selected in this batch.
Whether a selected set S is added to the set cover in this batch can be determined by examining the 2b-hop neighborhood of S in the subgraph induced by selected sets and all elements. We could get a fast simulation of Algorithm 1 if we could use the graph exponentiation technique to gather this subgraph at S, for every selected set S, in O(log b) rounds. Unfortunately, this graph is too large because while we have sparsified the sets (by keeping only selected sets), we have not sparsified the elements. To explain our idea of element sparsification, we first describe the role of elements in the sparsified graph. Consider a candidate set S for which \(c_k(S) = 1\) for some B + 1 ≤ k ≤ B + b, but \(c_{k^{\prime }}(S) = 0\) for all B + 1 ≤ k′ < k. In other words, k denotes the first iteration in this batch in which S is selected. Even though S is selected in iteration k, whether it should be added to the set cover depends on whether S is still a candidate just before iteration k, i.e., if \(\deg _{i, k}(S) \ge \mathbf {s}/2^i\). To check this, S needs to know the number of uncovered elements it contains (though not necessarily which elements are uncovered). In order to sparsify elements, we use the insight that it is enough for S to have a good enough estimate of the number of uncovered elements it contains. It turns out that it is enough to randomly sample a small number of elements so that there are roughly Θ(log N) sampled elements for each iteration in the neighborhood of each candidate set.
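To make the element sparsification concrete, here is a minimal sketch of the sampling step and the resulting degree estimator (the function names are ours; the sampling probability and rescaling rule follow Section 4 and Lemma 4.1):

```python
import math
import random

def sample_group(uncovered_elements, i, s, N, c=2):
    """Sample each not-yet-covered element with probability 2c*log(N)*2^i/s;
    one independent 'group' of samples is drawn per iteration of the batch."""
    p = min(1.0, 2 * c * math.log2(N) * 2**i / s)
    return {e for e in uncovered_elements if random.random() < p}

def estimated_degree(num_sampled_still_uncovered, num_sampled, deg_batch_start):
    """Estimate deg_{i,l}(S): rescale the exact start-of-batch uncovered degree
    of S by the fraction of its sampled neighbors that are still uncovered."""
    if num_sampled == 0:
        return 0
    return (num_sampled_still_uncovered / num_sampled) * deg_batch_start
```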
2.3 Main Results
Let N = m + n. For the low-memory MPC algorithm, each batch has a fixed length of \(\delta \sqrt {\log N}/4\) iterations and can be simulated using sparsified graph exponentiation in O(log log N) rounds whp. This leads to a low-memory MPC model algorithm that runs in \(\tilde{O}\left(\frac{1}{\delta ^2}\log \mathbf {s}\log \mathbf {t}/\sqrt {\log N}\right)\) rounds (see Theorem 4.6). In the linear-memory MPC algorithm, we simulate the first log t − O(log log N) iterations of each stage i by using the sparsification technique developed in the low-memory MPC algorithm. The interesting point here is that we do not need sparsified graph exponentiation because the active graph induced by the selected candidate sets and sampled elements in the first log t − O(log log N) iterations has size O(N) and can be sent to a single machine for local processing. Then the remaining O(log log N) iterations can be executed one by one. Our linear-memory MPC algorithm has an overall running time of O(log s · log log N) rounds whp in the linear-memory MPC model (see Theorem 5.2).
3 Related Work
Harvey, Liaw, and Liu [20] present what can be viewed as an O(1)-round superlinear-memory MPC algorithm giving a t-approximation for weighted SetCover. More precisely, their algorithm runs in \(O((c/\mu)^2)\) rounds, where the space per machine is assumed to be \(\Theta (\mathbf {t} \cdot n^{1 + \mu })\).
It is also worth mentioning that Grunau, Mitrović, Rubinfeld, and Vakilian [19] use their base algorithm to design algorithms in the Local Computation Algorithms (LCA) model with small query complexity. In the LCA model, one can perform queries on the input and learn about a small portion of the output. For example, for SetCover in the LCA model, we want to perform as few queries as possible to find out if a set S belongs to the approximate set cover solution. Some of the techniques used in the present paper (e.g., sparsified graph exponentiation) for SetCover in the MPC model may be implicit in [19], where they are used to reduce query complexity in the LCA model. However, none of the results in [19] are in the MPC model.
In a recent paper, Götte, Kolb, Scheideler, and Werthmann [17] present a SetCover approximation algorithm in the beeping model4 and also present a variant that has low message complexity in the Congest model. Even though this algorithm has many features specific to the beeping model, at its core it is similar to the base algorithm in [19]. It is important to note that even though our work is in the MPC model, [19] is in the LCA model, and [17] is in the beeping and Congest models, the common goal in all of these is to reduce the message complexity of the algorithms without worsening the quality of the approximation.
Note added to the camera-ready version: As the current paper was being reviewed, we were alerted to a paper by Dhulipala, Dinitz, Łącki, and Mitrović posted on arXiv [10] (and to appear in DISC 2024 [9]), one of whose results is very similar to our result. In fact, the “new, surprisingly simple” algorithm referred to in the abstract of this paper [10] seems quite similar to the “base algorithm” from [19] that we refer to. We note that [10] uses this algorithm as a starting point for deriving multiple algorithms (with different approximation factors), not just in the MPC model, but also in the PRAM model.
4 Low-Memory MPC Algorithm
Fix an arbitrary stage i. The log t iterations in the stage are partitioned into batches, each batch consisting of \(\delta \sqrt {\log N}/4\) iterations. To be precise, let \(P_j = (\delta \sqrt {\log N}/4)\cdot (j-1)\) denote the batch boundaries. Then batch j, for j = 1, 2, …, consists of iterations \(k = P_j + 1\) through \(k = P_{j+1}\). Recall that a set S is called a candidate for batch j if \(\deg _{i, P_j+1}(S) \ge \mathbf {s}/2^i\). Also recall that each candidate S tosses \(\delta \sqrt {\log N}/4\) coins \(c_k(S)\) for \(P_j + 1 \le k \le P_{j+1}\) with \(\text{Prob}[c_k(S) = 1] = 2^k/\mathbf {t}\) to mimic the random selection of candidates in Line 5 in Algorithm 1. A candidate set S is said to be selected if \(c_k(S) = 1\) for some \(P_j + 1 \le k \le P_{j+1}\). A candidate set S is said to be selected in iteration ℓ, \(P_j + 1 \le \ell \le P_{j+1}\), if \(c_\ell (S) = 1\) and \(c_k(S) = 0\) for all \(P_j + 1 \le k \lt \ell\). As mentioned in the previous section, we just need enough not-yet-covered elements to participate in the algorithm in batch j to obtain an up-to-date estimate of the number of not-yet-covered elements in each selected candidate set. For this purpose, we sample each element that is not covered by the start of batch j with probability \(2c\log N \cdot 2^i/\mathbf {s}\), independently for each iteration in batch j, for some constant c that will be fixed later. This is done so that we can get independent degree estimates for each iteration.
Let \(A = (\mathcal {S}_A, X_A, E_A)\) be the bipartite graph with node set \(\mathcal {S}_A \cup X_A\), where \(\mathcal {S}_A\) is the set of selected candidate sets in batch j, \(X_A\) is the set of sampled elements in batch j, and the edge set \(E_A\) consists of edges {S, e}, where \(S \in \mathcal {S}_A\), \(e \in X_A\), and e ∈ S. We call A the active graph for batch j because it includes all sets and elements that might be active in batch j. Let \(B_A(v, r)\) be the subgraph of A induced by nodes that are at most r hops from \(v \in \mathcal {S}_A \cup X_A\). Each node w in \(B_A(v, r)\) representing a set is labeled with the following information: (a) \(\deg _{i, P_j+1}(w)\) and (b) the coin toss sequence \((c_{P_j+1}(w), c_{P_j+2}(w), \ldots , c_{P_{j+1}}(w))\). The algorithm that simulates batch j in stage i is described in Algorithm 2.
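A centralized sketch of how the active graph could be assembled (the dictionary-based layout is ours; in the MPC algorithm the pieces of A are of course held distributedly by the host machines):

```python
def build_active_graph(sets, selected, sampled):
    """Bipartite active graph A = (S_A, X_A, E_A): selected candidate sets on
    one side, sampled elements on the other, edges given by containment."""
    S_A = set(selected)      # indices of selected candidate sets
    X_A = set(sampled)       # sampled elements
    E_A = [(idx, e) for idx in S_A for e in sets[idx] if e in X_A]
    return S_A, X_A, E_A
```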
As informally described in the introduction, for a set S that is selected in iteration ℓ, \(P_j + 1 \le \ell \le P_{j+1}\) (i.e., in batch j), we use a sample of elements to estimate \(\deg _{i,\ell }(S)\) and determine if \(\deg _{i, \ell }(S) \ge \mathbf {s}/2^i\). The following lemma shows that the estimator used in Line 9 of Algorithm 2 for \(\deg _{i, \ell }(S)\) is good enough. Specifically, set S may be incorrectly classified as a candidate when the number of not-yet-covered elements in S is very close, i.e., within a \(\left[\frac{1}{(1+\epsilon)}, \frac{1}{(1-\epsilon)}\right]\) factor of the Stage i threshold of \(\mathbf {s}/2^i\), for an arbitrarily small constant ϵ. However, this only worsens the approximation factor of the algorithm by a factor that depends on ϵ.
Lemma 4.1
Suppose that S is a candidate selected in iteration ℓ, \(P_j + 1 \le \ell \le P_{j+1}\). Let C(S) be the subset of S of sampled elements in batch j. Let \(C_u(S) \subseteq C(S)\) be the set of sampled neighbors of S that remain uncovered at the end of iteration ℓ − 1. Then, for any constant ϵ > 0, whp,
• if \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \ge \mathbf {s}/2^i\), then \(\deg _{i,\ell }(S) \ge \frac{\mathbf {s}}{2^i(1+\epsilon)}\), and
• if \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \lt \mathbf {s}/2^i\), then \(\deg _{i,\ell }(S) \lt \frac{\mathbf {s}}{2^i(1-\epsilon)}\).
Proof.
Let U denote the set of not-yet-covered elements in S just before the start of the batch. Thus, \(\deg _{i,P_j+1}(S) = |U|\). Since S is a candidate before the start of this batch, \(|U| \ge \mathbf {s}/2^i\) and therefore \(\mathbb {E}[|C(S)|] = (2c\log N \cdot 2^i/\mathbf {s}) \cdot |U| \ge 2c\log N\), and by Chernoff bounds
\(\text{Prob}\left[\,\big| |C(S)| - \mathbb {E}[|C(S)|] \big| \ge \frac{\epsilon }{4} \cdot \mathbb {E}[|C(S)|]\,\right] \le 1/N^3 \qquad (1)\)
for large enough constant c.
Furthermore, \(\mathbb {E}[|C_u(S)|] = (2c\log N \cdot 2^i/\mathbf {s}) \cdot \deg _{i,\ell }(S)\). We partition the rest of the proof into two cases depending on how large \(\deg _{i,\ell }(S)\) is relative to \(\mathbf {s}/2^{i+1}\).
Case (1): \(\deg _{i,\ell }(S) \ge \mathbf {s}/2^{i+1}\). In this case, \(\mathbb {E}[|C_u(S)|] \ge c \log N\) and by Chernoff bounds, for large enough constant c,
\(\text{Prob}\left[\,\big| |C_u(S)| - \mathbb {E}[|C_u(S)|] \big| \ge \frac{\epsilon }{4} \cdot \mathbb {E}[|C_u(S)|]\,\right] \le 1/N^3. \qquad (2)\)
Let \(\mathcal {E}_1\) denote the event \(\big| |C(S)| - \mathbb {E}[|C(S)|] \big| \lt \frac{\epsilon }{4} \cdot \mathbb {E}[|C(S)|]\) and similarly let \(\mathcal {E}_2\) denote the event \(\big| |C_u(S)| - \mathbb {E}[|C_u(S)|] \big| \lt \frac{\epsilon }{4} \cdot \mathbb {E}[|C_u(S)|]\). Then, by the bounds in (1) and (2), \(\text{Prob}[\mathcal {E}_1 \wedge \mathcal {E}_2] \ge 1 - 2/N^3 \ge 1 - 1/N^2\) for N ≥ 2. Therefore, with probability at least \(1 - 1/N^2\),
\((1-\epsilon) \cdot \deg _{i,\ell }(S) \le \frac{|C_u(S)|}{|C(S)|}\cdot \deg _{i, P_j+1}(S) \le (1+\epsilon) \cdot \deg _{i,\ell }(S).\)
Case (2): \(\deg _{i,\ell }(S) \lt \mathbf {s}/2^{i+1}\). In this case, \(\mathbb {E}[|C_u(S)|] \lt c \log N\). Therefore, by Chernoff bounds, \(\text{Prob}[|C_u(S)| \ge (3c/2)\log N] \le 1/N^3\). The bound in (1) also implies that \(\text{Prob}[|C(S)| \le ((3c/2)\log N \cdot 2^i/\mathbf {s}) \cdot |U|] \le 1/N^3\). Therefore, using reasoning similar to Case (1), with probability at least \(1 - 1/N^2\), both the events \(|C_u(S)| \lt (3c/2)\log N\) and \(|C(S)| \gt ((3c/2)\log N \cdot 2^i/\mathbf {s}) \cdot |U|\) occur. This implies that with probability at least \(1 - 1/N^2\),
\(\frac{|C_u(S)|}{|C(S)|}\cdot \deg _{i, P_j+1}(S) \lt \frac{(3c/2)\log N}{((3c/2)\log N \cdot 2^i/\mathbf {s}) \cdot |U|} \cdot |U| = \mathbf {s}/2^i.\)
To finish the proof we separately consider the two parts in the statement of the lemma.
• If \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \ge \mathbf {s}/2^i\), then with probability at least \(1 - 1/N^2\), Case (1) holds. This is because if Case (2) were to hold, then with probability at least \(1 - 1/N^2\), \((|C_u(S)|/|C(S)|) \cdot |U| \lt \mathbf {s}/2^i\). Thus, whp \(\deg _{i,\ell }(S) \ge \mathbf {s}/2^{i+1}\) and therefore \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \le (1+\epsilon) \cdot \deg _{i,\ell }(S)\). Hence, \(\deg _{i,\ell }(S) \ge \frac{\mathbf {s}}{2^i(1+\epsilon)}\).
• If \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \lt \mathbf {s}/2^i\), then either Case (1) or Case (2) above may hold. If Case (1) holds, then \(\deg _{i,\ell }(S) \ge \mathbf {s}/2^{i+1}\) and \((|C_u(S)|/|C(S)|)\cdot \deg _{i, P_j+1}(S) \ge (1-\epsilon) \cdot \deg _{i,\ell }(S)\), and therefore \(\deg _{i,\ell }(S) \lt \frac{\mathbf {s}}{2^i(1-\epsilon)}\). If Case (2) holds, then \(\deg _{i,\ell }(S) \lt \mathbf {s}/2^{i+1} \lt \frac{\mathbf {s}}{2^i(1-\epsilon)}\).
□
The lemma above shows that the proposed estimator for \(\deg _{i, \ell }(S)\) is good enough. To compute this estimator, a selected candidate set S needs to know the quantities |Cu(S)|, |C(S)|, and \(\deg _{i, P_j+1}(S)\). The latter two quantities are available at the start of the batch just by communicating with neighbors. The following lemma shows that Cu(S) can be computed via local computation on \(B_A(S, \delta \sqrt {\log N}/2)\).
Lemma 4.2
Suppose that a candidate set S is selected in iteration ℓ, \(P_j + 1 \le \ell \le P_{j+1}\). Further suppose that the machine hosting set S knows \(B_A(S, \delta \sqrt {\log N}/2)\). Then, the machine can determine whp, via local computation, the set \(C_u(S)\) of sampled neighbors of S that remain uncovered after ℓ − 1 iterations.
Proof.
Let \(M_S\) denote the machine hosting set S. We will show by induction that for any h, 1 ≤ h ≤ ℓ, machine \(M_S\) can perform local computation on \(B_A(S, \delta \sqrt {\log N}/2)\) to determine whp, for any selected candidate set Y in \(B_A(S, 2(\ell - h))\), the set \(C_u(Y)\) of sampled neighbors of Y that remain uncovered after k iterations, for any k ≤ h − 1. Plugging h = ℓ into this claim yields the lemma.
Base case: h = 1. Consider a selected candidate set \(Y \in B_A(S, 2(\ell - 1))\). Let C(Y) denote the set of sampled neighbors of Y. After h − 1 = 0 iterations, all nodes in C(Y) remain uncovered. Since all elements in C(Y) belong to \(B_A(S, 2(\ell -1)+1) \subseteq B_A(S, \delta \sqrt {\log N}/2)\), machine \(M_S\) can compute the set \(C_u(Y) = C(Y)\) from \(B_A(S, \delta \sqrt {\log N}/2)\) using the first group of sampled elements.
Inductive step: We assume the inductive claim is true after iteration h. In other words, for any selected candidate set \(Y \in B_A(S, 2(\ell - h))\) and k ≤ h − 1, machine \(M_S\) can compute the set \(C_u(Y)\) of sampled neighbors of Y that remain uncovered after k iterations via local computation on \(B_A(S, \delta \sqrt {\log N}/2)\) using the first h − 1 groups of sampled elements. Therefore, for each such Y and k, machine \(M_S\) knows if \((|C_u(Y)|/|C(Y)|)\cdot \deg _{i, k}(Y) \ge \mathbf {s}/2^i\) and if Y is selected in iteration k + 1. This implies that for each such Y and k, machine \(M_S\) knows if Y joins the set cover in iteration k + 1. Now consider any sampled element \(x \in B_A(S, 2(\ell - h) - 1)\) belonging to a group k′ ≤ h. All neighbors of x that are selected candidates are in \(B_A(S, 2(\ell - h))\). Therefore, for each k ≤ h − 1, machine \(M_S\) can figure out via local computation on \(B_A(S, \delta \sqrt {\log N}/2)\) if x is covered after k′ = k + 1 iterations have been completed. This implies that for any selected candidate set \(Y \in B_A(S, 2(\ell - h) - 2) = B_A(S, 2(\ell - (h + 1)))\) and any k′ ≤ h, machine \(M_S\) can compute \(C_u(Y)\), the set of sampled neighbors in group k′ that remain uncovered after k′ iterations. This establishes the inductive hypothesis after iteration h + 1. □
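The induction can be read as the following centralized sketch: given the gathered ball, replay the batch iteration by iteration using only the information stored in the ball. This is a simplification under assumed data structures (dictionaries keyed by set nodes); among other simplifications, it reuses one sample group for all iterations instead of one independent group per iteration, and it is not the paper's Algorithm 2 itself.

```python
def replay_batch(ball_sets, coins, deg_start, threshold):
    """ball_sets: set node -> its sampled neighbors inside the gathered ball;
    coins[S][step]: coin c_k(S) for the step-th iteration of the batch;
    deg_start[S]: deg_{i,P_j+1}(S); threshold: s/2^i.
    Returns the sets in the ball that join the cover during the batch."""
    covered = set()   # sampled elements covered so far within this batch
    joined = set()
    num_steps = len(next(iter(coins.values()))) if coins else 0
    for step in range(num_steps):
        newly = []
        for S, nbrs in ball_sets.items():
            if S in joined or not coins[S][step]:
                continue
            still_uncovered = sum(1 for e in nbrs if e not in covered)
            est = (still_uncovered / len(nbrs)) * deg_start[S] if nbrs else 0
            if est >= threshold:   # estimated candidacy test of Lemma 4.1
                newly.append(S)
        for S in newly:            # qualifying sets of an iteration join together
            joined.add(S)
            covered.update(ball_sets[S])
    return joined
```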
In the next lemma, we show an upper bound on the maximum degree of the active graph. Subsequently, we use this (in Lemma 4.4) to show that every \(\delta \sqrt {\log N}/2\)-neighborhood in the active graph has size \(O(N^{\delta /2})\).
Lemma 4.3
At the start of batch j, whp every not-yet-covered element x has at most \(\mathbf {t} \cdot \ln N/2^{P_j}\) candidate neighbors.
Proof.
Consider an element x ∈ X that is not yet covered at the start of batch j. Suppose x has more than \(\mathbf {t} \cdot \ln N/2^{P_j}\) candidate neighbors at the start of batch j. Each of these candidates is selected with probability \(2^{P_j+1}/\mathbf {t}\) in batch j − 1. Then the probability that none of these candidates is selected in batch j − 1 is at most
\(\left(1 - \frac{2^{P_j+1}}{\mathbf {t}}\right)^{\mathbf {t} \cdot \ln N/2^{P_j}} \le e^{-2\ln N} = \frac{1}{N^2}.\)
Thus with probability at least \(1 - 1/N^2\) such an element x is covered before the start of batch j. By the union bound, with probability at least 1 − 1/N, every element x ∈ X not yet covered at the start of batch j has at most \(\mathbf {t} \cdot \ln N/2^{P_j}\) candidate neighbors. □
Lemma 4.4
For any node x ∈ A, the number of edges in the ball \(B_A(x, \delta \sqrt {\log N}/2)\) is bounded by \(O(N^{\delta /2})\) whp.
Proof.
By Lemma 4.3, each not-yet-covered element has at most \(\mathbf {t} \ln N/2^{P_j}\) neighboring candidates with high probability. Furthermore, each candidate set is selected with probability at most \(2\cdot 2^{P_{j+1}}/\mathbf {t}\). Thus, in expectation, a not-yet-covered element has at most \(2\ln N\cdot 2^{\delta \sqrt {\log N}/4}\) selected candidate neighbors. By Chernoff bounds, a not-yet-covered node has more than \((6\log N)\cdot 2\ln N\cdot 2^{\delta \sqrt {\log N}/4}\) selected candidate neighbors with probability at most \(2^{-6\log N} = 1/N^6\). Thus, whp, every not-yet-covered element has at most \((6\log N)\cdot 2\ln N\cdot 2^{\delta \sqrt {\log N}/4} = O(\log ^2 N\cdot 2^{\delta \sqrt {\log N}/4})\) selected candidate neighbors.
On the other hand, every not-yet-covered element e is sampled with probability \(2c\log N \cdot 2^i/\mathbf {s}\). At the beginning of stage i, the number of uncovered elements of any set not currently in the set cover is at most \(\mathbf {s}/2^{i-1}\), since any set with more not-yet-covered elements would have been a candidate in the previous stage and was either added to the set cover or now has fewer than \(\mathbf {s}/2^{i-1}\) uncovered elements. Thus, the expected number of sampled not-yet-covered elements in any candidate set in this batch is at most 4clog N. By Chernoff bounds, the probability that a set not in the current set cover contains more than 24clog N sampled uncovered elements is less than \(2^{-24c\log N} = 1/N^{24c}\). Thus whp every set not in the current set cover has at most 24clog N sampled uncovered elements from one group. And since we independently sample a group of elements for each iteration in the batch, whp every set not in the current set cover has at most \(24c\log ^{1.5} N\) sampled uncovered elements overall.
Combining the previous two results, the degree of the active graph is bounded by \(\max \lbrace O(\log ^2 N\cdot 2^{\delta \sqrt {\log N}/4}), O(\log ^{1.5} N)\rbrace = O(\log ^2 N \cdot 2^{\delta \sqrt {\log N}/4})\) whp. Next, we can bound the number of edges in \(B_A(x, \delta \sqrt {\log N}/2)\) by \((C\log ^2 N \cdot 2^{\delta \sqrt {\log N}/4})^{\delta \sqrt {\log N}/2}\) for some constant C. This quantity is bounded above by \(O(N^{\delta /2})\). □
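To see the final bound, take logarithms; this short calculation is implicit in the proof above:
\[
\log _2\left(\left(C\log ^2 N \cdot 2^{\delta \sqrt {\log N}/4}\right)^{\delta \sqrt {\log N}/2}\right) = \frac{\delta \sqrt {\log N}}{2}\left(\log _2 C + 2\log _2\log N + \frac{\delta \sqrt {\log N}}{4}\right) = \frac{\delta ^2}{8}\log N + O\!\left(\delta \sqrt {\log N}\,\log \log N\right),
\]
so the ball size is \(N^{\delta ^2/8 + o(1)} = O(N^{\delta /2})\), since \(\delta ^2/8 \lt \delta /2\) for 0 < δ < 1.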
Lemma 4.5
Suppose that the total memory, in all machines, is \(\Omega (N^{1+\delta })\). Then, for every set \(S \in \mathcal {S}_A\), the subgraph \(B_A(S, \delta \sqrt {\log N}/2)\) can be gathered at S (in parallel) in O(log log N) rounds.
Proof.
Suppose the total memory is \(\Omega (N^{1+\delta })\). In that case, there are Ω(N) machines (since per-machine memory is \(O(N^{\delta })\)) and therefore we can assume without loss of generality that every machine hosts O(1) sets or elements. For a node v in the active graph A, let \(M_v\) denote the machine hosting v. Suppose that each machine \(M_v\) knows \(B_A(v, p)\) for some \(0 \le p \le \delta \sqrt {\log N}\). Each machine \(M_v\) then sends \(B_A(v, p)\) to machine \(M_u\) for every node u in \(B_A(v, p)\). After this communication is completed, each machine \(M_v\) can construct \(B_A(v, 2p)\) from the information it has received because \(B_A(v, 2p)\) is contained in the union of \(B_A(u, p)\) for all u in \(B_A(v, p)\).
We now argue that this communication can be performed in O(1) rounds. First, note that by Lemma 4.4, the size of \(B_A(v, p)\) is bounded above by \(O(N^{\delta /2})\). This also means that \(B_A(v, p)\) contains \(O(N^{\delta /2})\) nodes. Therefore, \(M_v\) needs to send a total of \(O(N^{\delta /2}) \times O(N^{\delta /2}) = O(N^{\delta })\) words. A symmetric argument shows an \(O(N^{\delta })\) bound on the number of words \(M_v\) receives. Since \(O(N^{\delta })\) words can be sent and received in each communication round by a machine, this communication can be completed in O(1) rounds. □
Theorem 4.6
There is a low-memory MPC algorithm that runs in \(\tilde{O}\left(\frac{1}{\delta ^2}\log \mathbf {s}\log \mathbf {t}/\sqrt {\log N}\right)\) rounds and produces an O(log s)-approximate set cover whp. This algorithm needs \(O(I + N^{1+\delta })\) total memory.
Proof.
The purpose of Line 1 in Algorithm 2 is for every element to know if it is covered by a set added to the set cover in the previous batch. This boolean information can be aggregated up the depth-O(1/δ) tree representing the neighborhood of each element. Hence, Line 1 can be implemented in O(1/δ) rounds. The purpose of Line 2 in Algorithm 2 is for every set S not in the set cover to know \(\deg _{i,P_j+1}(S)\), i.e., the number of elements in S not yet covered at the end of the previous batch. This count can be aggregated up the depth-O(1/δ) tree representing the neighborhood of each set. Hence, Line 2 can be implemented in O(1/δ) rounds. Lines 3-5 involve local computation only. Line 6 is the graph exponentiation step, which takes O(log log N) rounds in the low-memory MPC model according to Lemma 4.5. Lines 7-9 only require local computation. Hence, Algorithm 2, which simulates a batch of iterations, can be implemented in O(1/δ + log log N) rounds. Each batch consists of \(\delta \sqrt {\log N}/4\) iterations and thus there are \(4\log \mathbf {t}/(\delta \sqrt {\log N})\) batches in a stage. This implies a total of \(O\left(\frac{1}{\delta ^2} \log \mathbf {t} \log \log N/ \sqrt {\log N}\right)\) rounds per stage and thus a total of \(\tilde{O}\left(\frac{1}{\delta ^2}\log \mathbf {s}\log \mathbf {t}/\sqrt {\log N}\right)\) rounds.
The fact that Algorithm 1 is an O(log s)-approximation (in expectation) is shown in [18, 19]. Algorithm 2 simulates Algorithm 1 faithfully except that a set S may be incorrectly classified as a candidate when the number of not-yet-covered elements in S is very close to the Stage i threshold of \(\mathbf {s}/2^i\). Specifically, in Algorithm 1, a set S is a candidate in iteration ℓ in Stage i if \(\deg _{i, \ell }(S) \ge \mathbf {s}/2^i\). But, as per Lemma 4.1, S may be classified as a candidate even if \(\deg _{i, \ell }(S)\) is slightly smaller, i.e., in the range \(\left[\frac{\mathbf {s}}{2^i(1+\epsilon)}, \frac{\mathbf {s}}{2^i}\right)\). Similarly, S may be classified as a non-candidate even if \(\deg _{i, \ell }(S)\) is slightly larger, i.e., in the range \(\left[\frac{\mathbf {s}}{2^i}, \frac{\mathbf {s}}{2^i(1-\epsilon)} \right)\). This approximate candidate estimation only worsens the approximation factor by a factor that depends on ϵ, which we can take to be an arbitrarily small constant.
The O(log s)-approximation factor holds only in expectation. We can make the O(log s)-approximation hold whp (with a larger constant) using the standard technique of running O(log N) copies of the algorithm and returning the smallest solution. Standard Chernoff bounds show that the minimum of these solutions has an approximation ratio O(log s) whp. This increases our memory usage (both local and total) by a factor of log N, but this can be absorbed in the big-Oh notation by reducing δ to 0.9δ since \(\log N = O(N^{0.1\delta })\). This only increases the round complexity by a constant factor. □
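The boosting step is a one-liner; a sketch, assuming run_once() executes one independent copy of the algorithm and returns the chosen subcollection:

```python
def boost_whp(run_once, copies):
    """Run independent copies and keep the smallest cover: the minimum of
    O(log N) independent O(log s)-expected-approximate runs is
    O(log s)-approximate whp, by standard Chernoff bounds."""
    return min((run_once() for _ in range(copies)), key=len)
```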
4.1 Linear Total Memory
In this section, we describe how to modify Algorithm 2 to get linear total memory. This comes at the cost of a slightly higher round complexity compared to Theorem 4.6.
Theorem 4.7
There is a low-memory MPC algorithm that runs in \(\tilde{O}\left(\frac{1}{\delta ^2}\sqrt {\log \mathbf {s}}\log \mathbf {t}\right)\) rounds and produces an O(log s)-approximate set cover whp. This algorithm needs \(\tilde{O}(I)\) total memory.
Proof.
To use less total memory, we modify Algorithm 2 so that each batch in stage i executes \(R = \delta \sqrt {\log \mathbf {s}}/4\) iterations. Therefore, the size of the active graph in Lemma 4.4 is bounded above by \(O(\mathbf {s}^{\delta /2})\) whp. In stage i, only candidates with degree in the range \([\mathbf {s}/2^i, \mathbf {s}/2^{i-1})\) are active. Therefore, until stage (1 − δ/2)log s, we can charge the active graph collected at an active set S to unique input graph edges for each batch, since \(\deg (S) \ge \mathbf {s}^{\delta /2}\). These stages are simulated in \(O\left(\frac{1}{\delta ^2}\log \mathbf {t} \sqrt {\log \mathbf {s}}\right)\) rounds. Now we want to simulate the remaining (δ/2)log s stages. We shrink the batch size to R/2, and this allows us to simulate an additional (δ/4)log s stages while still charging the active graph (of size \(O(\mathbf {s}^{\delta /4})\)) to unique input edges for each batch. We keep halving the batch size at most O(log log s) times, simulating \((\delta /2^{t+1})\log \mathbf {s}\) stages with batch size \(R/2^t\), until we simulate all stages. This simulation runs in \(\tilde{O}\left(\frac{1}{\delta ^2} \sum _{t=1}^{\log \log \mathbf {s}} \delta 2^t\log \mathbf {s} \log \mathbf {t} / 2^{t+1} R\right) = \tilde{O}\left(\frac{1}{\delta ^2} \log \mathbf {t} \sqrt {\log \mathbf {s}} \right)\) rounds. Each input graph edge is charged at most O(log s log t) times, so the total memory used is \(\tilde{O}(I)\). □
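A sketch of the halving schedule used in this proof (a hypothetical helper; it just enumerates the (number of stages, batch length) pairs described above):

```python
import math

def batch_schedule(s, delta):
    """Theorem 4.7 schedule: full batches of length R = delta*sqrt(log s)/4
    for the first (1 - delta/2) log s stages, then (delta/2^(t+1)) log s
    stages at batch length R/2^t for t = 1, 2, ..., O(log log s)."""
    log_s = math.log2(s)
    R = delta * math.sqrt(log_s) / 4
    schedule = [((1 - delta / 2) * log_s, R)]
    t = 1
    while R / 2**t >= 1:   # stop once batches shrink to a single iteration
        schedule.append(((delta / 2**(t + 1)) * log_s, R / 2**t))
        t += 1
    return schedule
```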
5 Exponentially Faster Simulation in Linear-Memory MPC
In this section, we will show that in the linear-memory MPC model the simulation from the previous section can be substantially simplified and accelerated, leading to an algorithm that runs in O(log s · log log N) rounds whp. Our main insight is that the active graph induced by a large number of initial iterations in a stage has size O(N). This implies that these initial iterations can be simulated in constant rounds in the linear-memory MPC model.
In particular, for any stage i, consider iterations 1 to log t − 3log log N. Each candidate has a probability \(\sum _{k=1}^{\log \mathbf {t} - 3\log \log N} 2^k/\mathbf {t} \le 2 \cdot 2^{\log \mathbf {t} - 3\log \log N }/\mathbf {t} \le 2/(\log ^3 N)\) of being selected. In addition to selecting candidates, each element that is not covered by the start of batch j in stage i is sampled independently in log t − 3log log N groups, each with probability \(2c\log N \cdot 2^i/\mathbf {s}\), for some constant c. Let \(A = (\mathcal {S}_A, X_A, E_A)\) be the active graph for stage i and batch j, which is the bipartite graph with node set \(\mathcal {S}_A \cup X_A\), where \(\mathcal {S}_A\) is the set of selected candidate sets, \(X_A\) is the set of sampled elements, and the edge set \(E_A\) consists of edges {S, e}, where \(S \in \mathcal {S}_A\), \(e \in X_A\), and e ∈ S.
Lemma 5.1
The active graph A for iterations 1 to log t − 3log log N of stage i has O(N/log N) edges whp.
Proof.
Each candidate will select itself in the first log t − 3log log N iterations with probability at most \(2/\log ^3 N\). Therefore, the expected number of selected candidates is \(\mathbb {E}[|\mathcal {S}_A|] \le 2N/\log ^3 N\). Since the candidates select themselves independently, we can use a Chernoff bound to say that whp, \(|\mathcal {S}_A| \le 2c N/\log ^3 N\).
Now since we are in stage i, each set has degree at most \(\mathbf {s}/2^i\) and each element samples itself independently, in log t − 3log log N groups, each with probability \(2c\log N \cdot 2^i/\mathbf {s}\). Therefore, using a Chernoff bound, we can say that whp, every candidate has at most \(O(\log N\log \mathbf {t}) = O(\log ^2 N)\) sampled neighbors. Therefore, \(|E_A| \le |\mathcal {S}_A| \cdot O(\log ^2 N) \le O(N/\log N)\) whp, and the lemma holds. □
We are now ready to prove the main result of this section.
Theorem 5.2
There is a linear-memory MPC algorithm that runs in O(log s · log log N) rounds and produces an O(log s)-approximate set cover whp. This algorithm uses a total of \(\tilde{O}(I)\) memory.
Proof.
We simulate iterations 1 to log t − 3log log N of stage i by having each machine compute the local part of the active graph A and send it to a single leader machine. Then the leader machine can simulate the log t − 3log log N iterations locally using Algorithm 2 and send to each set and element its local outcome. This requires O(1) rounds.
The remaining 3log log N iterations can be simulated one by one, and then we move on to stage i + 1. Hence stage i can be simulated in O(log log N) rounds. Therefore, the overall round complexity is O(log s · log log N).
Similar to the proof of Theorem 4.6, the use of the estimators of Lemma 4.1 only worsens the expected O(log s) approximation factor of the base Algorithm 1 by a factor that depends on ϵ, which we can make an arbitrarily small constant.
The O(log s)-approximation factor we get only holds for the expected cost of the solution returned by the algorithm. We can make the O(log s)-approximation hold whp (with a larger constant) using the standard technique of running O(log N) copies of the algorithm and returning the smallest solution. For each copy, the active graph in stage i has size O(N/log N) whp, and we assign O(log N) different leader machines that gather each active graph, simulate all iterations, and send the outcome to each set and element. Therefore, at each stage, each machine sends and receives at most O((N/log N) · log N) = O(N) messages, and we have no overhead in round complexity.
We can also simulate the remaining O(log log N) iterations for each copy in parallel by noting that a faithful simulation of Algorithm 1 only requires each set added to the cover to send one bit to each element it covers, and each uncovered element to send one bit to its neighboring sets so that they can update the \(\deg _{i,k}(\cdot)\) value for the next iteration k. Since each set and element sends only one bit, we can fit the messages of the O(log N) copies in O(1) words of local memory by having an implicit ordering over all the copies. Therefore, the local memory usage for each machine is still O(N) words.
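A sketch of this bit packing (the helper names are ours): with a fixed ordering of the copies, copy c's one-bit message occupies bit position c of a single O(log N)-bit word.

```python
def pack_bits(bits):
    """bits[c] is this node's one-bit message in copy c; all copies share one word."""
    word = 0
    for c, b in enumerate(bits):
        word |= (b & 1) << c
    return word

def unpack_bit(word, c):
    """Recover copy c's bit from the packed word."""
    return (word >> c) & 1
```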
Once each set knows which of the O(log N) set covers it belongs to, we can aggregate this information into one machine that can compute all O(log N) costs. Each set sends a bit string of size O(log N) to this machine denoting all the copies in which it belongs to the set cover solution. Therefore we have O(log N) solutions generated independently, each with expected approximation ratio O(log s). Standard Chernoff bounds show that the minimum of these solutions has an approximation ratio O(log s) whp.
Note that for the entire simulation we need only O(log N) additional machines and since I = Ω(N), the additional O(Nlog N) total memory is absorbed into the \(\tilde{O}(\cdot)\) notation. □
6 Conclusion and Future Work
This paper presents the first \(\tilde{O}(\log N)\)-round and \(\tilde{O}(\log ^{1.5} N)\)-round algorithms in the linear-memory and low-memory MPC models respectively for the classical SetCover problem. These algorithms are obtained by using the technique of sparsified graph exponentiation, which was previously used to obtain faster MPC algorithms for MIS [15] and 2-ruling sets [25].
Several natural questions arise from this work.
First, is it possible to design o(log N)-round MPC algorithms for SetCover, at least in the linear-memory MPC model? Second, some impressive progress has been made on derandomizing MPC algorithms for MIS [6] and coloring [7]; can these techniques be extended to yield fast, deterministic algorithms for SetCover? Third, in combinatorial optimization, researchers have considered many generalizations of SetCover such as partial SetCover, capacitated SetCover, multi-SetCover, etc. Designing fast MPC algorithms for these generalizations would be a natural follow-up to this work.
Footnotes
1. The Congest model [29] is a classical distributed model where the input is a graph G = (V, E), which also serves as the communication network. Nodes in the graph are processors with unique IDs from a space whose size is polynomial in |V|. In each round, each node can send an O(log |V|)-bit message to each of its neighbors.
2. The \(\tilde{O}(\cdot)\) notation hides \(\text{poly}(\log \log N)\) factors.
3. Given an arbitrary distribution of edges across machines, we can convert it to a node-centric distribution as follows: (1) for each edge {u, v}, create two ordered tuples (u, v) and (v, u), and (2) sort the list of tuples by node ID and machine ID so that the earliest tuples in the ordering appear on the machines with the smallest IDs. The result is that the edges incident on each node are now on the same machine or on two machines with adjacent IDs. It is easy to go from this intermediate distribution to a node-centric distribution by having machines with adjacent IDs coordinate with each other. Since we can do sorting in MPC in constant rounds [16], a node-centric distribution can be assumed without loss of generality.
4. In the beeping model, nodes communicate with beeps. In the specific version of the beeping model in [17], in each round a node can either listen or beep. If, in a round, a node v is listening and some subset of its neighbors beep, then v hears a beep.
References
Noga Alon, László Babai, and Alon Itai. 1986. A Fast and Simple Randomized Parallel Algorithm for the Maximal Independent Set Problem. J. Algorithms 7, 4 (dec 1986), 567–583.
Soheil Behnezhad, Sebastian Brandt, Mahsa Derakhshan, Manuela Fischer, MohammadTaghi Hajiaghayi, Richard M. Karp, and Jara Uitto. 2019. Massively Parallel Computation of Matching and MIS in Sparse Graphs. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (Toronto ON, Canada) (PODC ’19). Association for Computing Machinery, New York, NY, USA, 481–490.
Bonnie Berger, John Rompel, and Peter W. Shor. 1994. Efficient NC Algorithms for Set Cover with Applications to Learning and Geometry. In Proceedings of the 30th IEEE Symposium on Foundations of Computer Science (Research Triangle Park, North Carolina, USA). Academic Press, Inc., USA, 454–477.
Artur Czumaj, Peter Davies, and Merav Parter. 2020. Graph Sparsification for Derandomizing Massively Parallel Computation with Low Space. Association for Computing Machinery, New York, NY, USA, 175–185.
Artur Czumaj, Peter Davies, and Merav Parter. 2021. Simple, Deterministic, Constant-Round Coloring in Congested Clique and MPC. SIAM J. Comput. 50, 5 (2021), 1603–1626.
Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. 2018. Round compression for parallel matching algorithms. Proceedings of the Annual ACM Symposium on Theory of Computing 14, 1 (2018), 471–484.
Laxman Dhulipala, Michael Dinitz, Jakub Łącki, and Slobodan Mitrović. 2024. Parallel Set Cover and Hypergraph Matching via Uniform Random Sampling. In 38th International Symposium on Distributed Computing (DISC 2024)(Leibniz International Proceedings in Informatics (LIPIcs), Vol. 319), Dan Alistarh (Ed.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 19:1–19:23.
Irit Dinur and David Steurer. 2014. Analytical Approach to Parallel Repetition. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (New York, New York) (STOC ’14). Association for Computing Machinery, New York, NY, USA, 624–633.
Mohsen Ghaffari. 2017. Distributed MIS via All-to-All Communication. In Proceedings of the ACM Symposium on Principles of Distributed Computing (Washington, DC, USA) (PODC ’17). Association for Computing Machinery, New York, NY, USA, 141–149.
Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrović, and Ronitt Rubinfeld. 2018. Improved massively parallel computation algorithms for mis, matching, and vertex cover. In the Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC). Association for Computing Machinery, New York, NY, USA, 129–138.
Mohsen Ghaffari and Jara Uitto. 2019. Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, Timothy M. Chan (Ed.). SIAM, Philadelphia, PA 19104, 1636–1653.
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. 2011. Sorting, Searching, and Simulation in the Mapreduce Framework. In Proceedings of the 22nd International Conference on Algorithms and Computation (Yokohama, Japan) (ISAAC’11). Springer-Verlag, Berlin, Heidelberg, 374–383.
Thorsten Götte, Christina Kolb, Christian Scheideler, and Julian Werthmann. 2021. Beep-And-Sleep: Message and Energy Efficient Set Cover. In Algorithms for Sensor Systems: 17th International Symposium on Algorithms and Experiments for Wireless Sensor Networks, ALGOSENSORS 2021, Lisbon, Portugal, September 9–10, 2021, Proceedings (Lisbon, Portugal). Springer-Verlag, Berlin, Heidelberg, 94–110.
Christoph Grunau, Slobodan Mitrovic, Ronitt Rubinfeld, and Ali Vakilian. 2019. Improved Local Computation Algorithm for Set Cover via Sparsification. CoRR abs/1910.14154 (2019). arXiv:1910.14154 https://arxiv.org/abs/1910.14154
Christoph Grunau, Slobodan Mitrović, Ronitt Rubinfeld, and Ali Vakilian. 2020. Improved Local Computation Algorithm for Set Cover via Sparsification. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, 3600 University City Science Center Philadelphia, PA, United States, 2993–3011.
Nicholas J. A. Harvey, Christopher Liaw, and Paul Liu. 2018. Greedy and Local Ratio Algorithms in the MapReduce Model. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, SPAA 2018, Vienna, Austria, July 16-18, 2018, Christian Scheideler and Jeremy T. Fineman (Eds.). ACM, New York, NY, USA, 43–52.
James W. Hegeman, Sriram V. Pemmaraju, and Vivek Sardeshmukh. 2014. Near-Constant-Time Distributed Algorithms on a Congested Clique. In Distributed Computing - 28th International Symposium, DISC 2014, Austin, TX, USA, October 12-15, 2014. Proceedings(Lecture Notes in Computer Science, Vol. 8784), Fabian Kuhn (Ed.). Springer, Berlin, Germany, 514–530.
Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for mapreduce. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. SIAM, Society for Industrial and Applied Mathematics, USA, 938–948.
Kishore Kothapalli, Shreyas Pai, and Sriram V. Pemmaraju. 2020. Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2020, December 14-18, 2020, BITS Pilani, K K Birla Goa Campus, Goa, India (Virtual Conference)(LIPIcs, Vol. 182), Nitin Saxena and Sunil Simon (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Oktavie-Allee, 66687 Wadern, Germany, 28:1–28:18.
Merav Parter and Eylon Yogev. 2018. Congested Clique Algorithms for Graph Spanners. In 32nd International Symposium on Distributed Computing, DISC 2018, New Orleans, LA, USA, October 15-19, 2018(LIPIcs, Vol. 121), Ulrich Schmid and Josef Widder (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, USA, 40:1–40:18.
David Peleg. 2000. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, 3600 University City Science Center Philadelphia, PA, United States.
Sridhar Rajagopalan and Vijay V. Vazirani. 1999. Primal-Dual RNC Approximation Algorithms for Set Cover and Covering Integer Programs. SIAM J. Comput. 28, 2 (feb 1999), 525–540.
Grigory Yaroslavtsev and Adithya Vadapalli. 2018. Massively Parallel Algorithms and Hardness for Single-Linkage Clustering under ℓp Distances. In Proceedings of the 35th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, US, 5600–5609. https://proceedings.mlr.press/v80/yaroslavtsev18a.html