
1 Introduction

In Secure Multiparty Computation (SMC), p parties compute \((y_1,\ldots ,y_p)=f(x_1,\ldots ,x_p)\), with the party \(P_i\) providing the input \(x_i\) and learning no more than the output \(y_i\). For any functionality f, there exists an SMC protocol for it [19, 36]. While the general construction is inefficient in practice, several SMC frameworks have appeared [3, 6, 12, 21, 29], and certain classes of algorithms can be executed with reasonable efficiency on top of them. In particular, these algorithms should have control flow and data access patterns that depend only or mostly on public data.

Private information retrieval (PIR) and oblivious RAM (ORAM) are among the techniques for hiding the patterns of data access. They both posit a client-server setting, where the client queries the elements in the server’s memory without the server learning which elements are accessed. For an n-element vector, the asymptotic complexity of both PIR (which allows only reading) and ORAM techniques (which also allow writing) is \(\tilde{O}(\log ^2 n)\). To adapt these techniques to the SMC setting, at least the client’s computations have to be performed through SMC protocols. This brings further complications.

We provide an alternative mechanism for reading an element of a vector according to a private index (private writing is not considered in this paper). We give a private lookup protocol whose operations can be partitioned into the offline part — those that can be done without knowing the actual inputs — and the online part — those that require the inputs. In the case of private lookup, it makes sense to consider even three phases — offline, vector-only (where the actual vector, either public or private, is available) and online (where the private index is also available). In our protocols, the online phase requires only a constant number of costly SMC operations, while the bulk of the work is done in the offline phase and, depending on the protocol and the secrecy of the vector elements, in the vector-only phase. In cases where the main cost of SMC operations is communication between parties, the offline (and vector-only) computations could be performed using dedicated high-bandwidth, high-latency channels.

Our private lookup protocols are universally composable; they may be freely used as components in protocols for more complex privacy-preserving applications. In this paper we demonstrate their use in two applications that need oblivious read access to data, but where the pattern of write accesses is public. We have implemented SMC protocols for executing deterministic finite automata (DFA), and for finding the single-source shortest distances (SSSD) in sparse graphs. The latter protocol is based on the well-known Bellman-Ford algorithm. In both protocols, the processed objects (automaton, input string, the graph) are private, except for their sizes.

Our protocols inherit the security guarantees of the underlying SMC implementation. If the SMC implementation provides security against passive resp. also active adversaries, then so do our protocols. If the security provided by the SMC implementation is information-theoretical resp. only computational, then this also applies to our protocols. Perhaps surprisingly, the protocols in this paper are the first information-theoretically secure protocols for DFA execution, if an information-theoretically secure protocol set for SMC is used. All previous protocols have used cryptographic constructions (encryption) that rely on computational hardness assumptions for security.

Structure of the Paper. We review work related to privacy-preserving data access, and to our example applications, in Sect. 2. In Sect. 3 we describe the framework in which our protocols are defined and their security and performance properties stated. Sect. 4 presents the private lookup protocol and discusses its security and performance. In Sects. 5 and 6 we describe our implementations of privacy-preserving DFA execution and SSSD, and discuss their performance. Finally, we draw conclusions in Sect. 7.

2 Related Work

Secure multiparty computation (SMC) protocol sets can be based on a variety of different techniques, including garbled circuits [36], secret sharing [4, 17, 32] or homomorphic encryption [9]. A highly suitable abstraction of SMC is the universally composable Arithmetic Black Box (ABB) [14], the use of which allows very simple security proofs for higher-level SMC applications. Using the ABB to derive efficient privacy-preserving implementations for various computational tasks is an ongoing field of research [1, 8, 11, 27], to which this paper also belongs.

Conceptually, our protocols are most similar to private information retrieval (PIR) [24], for which there exist protocols with \(O(\log ^2n)\) communication and \(O(n/\log n)\) public-key operations [26]. We know of no attempts to implement these techniques on top of SMC, though.

Oblivious RAM (ORAM) [20] is a more versatile technique with similar communication complexity [33] (but higher round complexity and client’s memory requirements). The integration of ORAM with SMC has been studied [13, 18, 22, 28]. In general, such systems require at least \(O(\log ^3n)\) overhead for oblivious data access. We also note that the “trivial” way of expanding the private index into its characteristic vector and computing its scalar product with the array brings O(n) overhead, but may be more efficient in practice due to smaller constants hidden in the O-notation [22, 25].

In this paper, we present protocols for DFA execution, and for SSSD in sparse graphs. In [15, 30], garbled circuits have been adapted for DFA execution, where one party knows the DFA and the other one the input string. This approach, which is not universally composable, works well if the automaton and alphabet are small (but the input string may be long). In [2, 16, 34, 35], DFA execution protocols based on homomorphic encryption are given, some of them resembling PIR protocols. Privacy-preserving graph algorithms have been studied in [5] in a non-composable manner. Composable SSSD protocols for dense graphs have been studied in [1]. Recently, ORAM-with-SMC techniques have been used to implement Dijkstra’s algorithm for sparse graphs [22].

3 Preliminaries

Universal composability (UC) [7] is a framework for stating security properties of systems. It considers an ideal functionality \(\mathcal {F}\) and its implementation \(\pi \) with identical interfaces to the intended users. The latter is at least as secure as the former, if for any attacker \(\mathcal {A}\) there exists an attacker \(\mathcal {A}_S\), such that \(\pi \Vert \mathcal {A}\) and \(\mathcal {F}\Vert \mathcal {A}_S\) are indistinguishable to any potential user of \(\pi \)/\(\mathcal {F}\). The value of the framework lies in the composability theorem: if \(\pi \) is at least as secure as \(\mathcal {F}\), then \(\xi ^\pi \) is at least as secure as \(\xi ^\mathcal {F}\) for any system \(\xi \) that uses \(\pi \)/\(\mathcal {F}\). We say that such \(\xi \) is implemented in the \(\mathcal {F}\)-hybrid model. When arguing about the security of such \(\xi \), we may assume that it uses the ideal functionality \(\mathcal {F}\) as a subroutine. All derived conclusions will also be valid for \(\xi ^\pi \).

The arithmetic black box is an ideal functionality \(\mathcal {F}_\mathsf {ABB}\). It allows its users (a fixed number p of parties) to securely store and retrieve values, and to perform computations with them. When a party sends the command \(\mathsf {store}(v)\) to \(\mathcal {F}_\mathsf {ABB}\), where v is some value, the functionality assigns a new handle h (sequentially taken integers) to it by storing the pair (h, v) and sending h to all parties. If a sufficient number (depending on implementation details) of parties send the command \(\mathsf {retrieve}(h)\) to \(\mathcal {F}_\mathsf {ABB}\), it looks up (h, v) among the stored pairs and responds with v to all parties. When a sufficient number of parties send the command \(\mathsf {compute}( op ;h_1,\ldots ,h_k; params )\) to \(\mathcal {F}_\mathsf {ABB}\), it looks up the values \(v_1,\ldots ,v_k\) corresponding to the handles \(h_1,\ldots ,h_k\), performs the operation \( op \) (parametrized with \( params \)) on them, stores the result v together with a new handle h, and sends h to all parties. In this way, the parties can perform computations without revealing anything about the intermediate values or results, unless a sufficiently large coalition wants a value to be revealed.
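To fix notation, the following is a minimal plaintext mock-up of the ABB interface described above; it performs no secret sharing or distribution, and the class and operation names (MockABB, "add", "mul", "mul_public") are ours, not those of any existing framework.

```python
# A plaintext mock-up of the arithmetic black box interface.
# It models handles and the store/retrieve/compute commands, but performs
# no secret sharing: it only illustrates how higher-level protocols talk to the ABB.

class MockABB:
    def __init__(self, modulus):
        self.modulus = modulus      # the ring/field the ABB computes in
        self.values = {}            # handle -> value (hidden from the parties)
        self.next_handle = 0

    def store(self, v):
        """store(v): assign a fresh handle to v and return it."""
        h = self.next_handle
        self.next_handle += 1
        self.values[h] = v % self.modulus
        return h

    def retrieve(self, h):
        """retrieve(h): reveal the value behind handle h."""
        return self.values[h]

    def compute(self, op, *handles, params=None):
        """compute(op; h1,...,hk; params): apply op to stored values, store the result."""
        args = [self.values[h] for h in handles]
        if op == "add":
            res = sum(args)
        elif op == "mul":
            res = args[0] * args[1]
        elif op == "mul_public":        # multiplication with a public constant
            res = args[0] * params
        else:
            raise ValueError("unsupported operation")
        return self.store(res)

# Example: compute (3 + 4) * 5 without "revealing" intermediate values.
abb = MockABB(2**32 - 5)
h1, h2 = abb.store(3), abb.store(4)
h3 = abb.compute("add", h1, h2)
h4 = abb.compute("mul_public", h3, params=5)
assert abb.retrieve(h4) == 35
```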

The existing implementations of the ABB are protocol sets \(\pi _\mathsf {ABB}\) based on either secret sharing [3, 6, 12] or threshold homomorphic encryption [14, 21]. Depending on the implementation, the ABB offers protection against an honest-but-curious or a malicious party, or a number of parties (up to a certain limit). E.g., the ABB implementation of Sharemind [3] uses three parties, providing protection against one honest-but-curious party.

In this paper, the protocols are given and their security (and correctness) argued in the \(\mathcal {F}_\mathsf {ABB}\)-hybrid model. The arguments remain valid if \(\mathcal {F}_\mathsf {ABB}\) is replaced with a secure implementation \(\pi _\mathsf {ABB}\).

Typically, the ABB performs computations with values v from some ring \(\mathbb {R}\). The set of operations definitely includes addition/subtraction, multiplication of a stored value with a public value (this operation motivates the \( params \) in the \(\mathsf {compute}\)-command), and multiplication. Even though all algorithms can be expressed using just these operations, most ABB implementations provide more operations (as primitive protocols) for greater efficiency of the implementations of algorithms on top of the ABB. In all ABB implementations, addition and multiplication with a public value incur negligible costs; hence they are not counted when analyzing the complexity of protocols using the ABB. Other operations may require a variable amount of communication (in one or several rounds) between parties, and/or expensive computation. The ABB can execute several operations in parallel; the round complexity of a protocol is the number of communication rounds all operations of the protocol require, when parallelized as much as possible.

It is common to use \([\![ v ]\!]\) to denote the value v stored in the ABB. The notation \([\![ v_1 ]\!]\; op \;[\![ v_2 ]\!]\) denotes the computation of \(v_1\; op \;v_2\) by the ABB (translated to a protocol in the implementation \(\pi _\mathsf {ABB}\)).

In the next section, we give a protocol for private lookup. Formally, we present a secure implementation of the functionality \(\mathcal {F}_\mathsf {ABB+LU}\) that accepts the same commands as \(\mathcal {F}_\mathsf {ABB}\), answering them in the same manner. Additionally, it accepts the command \(\mathsf {lookup}(h_1,\ldots ,h_n,h_\mathsf {idx})\). When a sufficient number of parties have sent such a command to \(\mathcal {F}_\mathsf {ABB+LU}\), it looks up the value \(v_\mathsf {idx}\) corresponding to \(h_\mathsf {idx}\) and the value \(v'\) corresponding to \(h_{v_\mathsf {idx}}\). It stores \(v'\) together with a new handle \(h'\) and sends \(h'\) to all parties.

The implementation \(\pi _\mathsf {ABB+LU}\) is given in the \(\mathcal {F}_\mathsf {ABB}\)-hybrid model. It simply invokes \(\mathcal {F}_\mathsf {ABB}\) for all \(\mathcal {F}_\mathsf {ABB}\) commands. The implementation of the \(\mathsf {lookup}\)-command is given below.

4 Protocol for Private Lookup

Our protocol, depicted in Algorithm 1, takes the handles to elements \(v_{i_1},\ldots ,v_{i_m}\) (with arbitrary non-zero, mutually different indices) and the handle to the index j stored inside the ABB, and returns a handle to the element \(v_j\). It represents the elements as a polynomial V over a suitable field \(\mathbb {F}\), satisfying \(V(i_j)=v_{i_j}\) for all \(j\in \{1,\ldots ,m\}\) (with \(i_1,\ldots ,i_m\) also belonging to \(\mathbb {F}\)). The lookup then amounts to the evaluation of the polynomial at a point. Similar ideas have appeared in [35] (for DFAs). We combine these ideas with a method [27] for moving most of the computation for the polynomial evaluation offline. Both the idea and the method have been slightly improved and expanded in this paper.

Let our ABB work with the values from the field \(\mathbb {F}\), where \(|\mathbb {F}|\ge m+1\). There exist protocols for generating a uniformly random element of \(\mathbb {F}\) inside the ABB (denote: \([\![ r ]\!]\mathop {\leftarrow }\limits ^{\$}\mathbb {F}\)), and for generating a uniformly random non-zero element of \(\mathbb {F}\) together with its inverse (denote: \(([\![ r ]\!],[\![ r^{-1} ]\!])\mathop {\leftarrow }\limits ^{\$}\mathbb {F}^*\)). These protocols require a small constant number of multiplications on average for any ABB [11].

There exist Lagrange interpolation coefficients \(\lambda ^\mathbf {I}_{j,k}\) depending only on the set \(\mathbf {I}=\{i_1,\ldots ,i_m\}\), such that \(V(x)=\sum _{j=0}^{m-1}c_jx^j\), where \(c_j=\sum _{k=1}^m\lambda ^\mathbf {I}_{j,k}v_{i_k}\). These coefficients are public and computed in the offline phase of Algorithm 1.
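As an illustration of this step, the following sketch computes the public matrix \(\lambda ^\mathbf {I}_{j,k}\) over a prime field and uses it to obtain the coefficients \(c_j\); the prime, the function names and the example points are our own choices, not taken from the paper's implementation.

```python
# Computing the public Lagrange coefficients lam[j][k] with
# c_j = sum_k lam[j][k] * v_{i_k}, over a prime field GF(p).
# A plaintext sketch; in the protocol only this public matrix is computed offline,
# while the linear combination with the secret v_{i_k} is a free ABB operation.

p = 2**32 - 5   # the prime used elsewhere in the paper

def poly_mul(a, b):
    """Multiply two polynomials (coefficient lists, low degree first) mod p."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res

def lagrange_coefficients(indices):
    """Return the m x m matrix lam with lam[j][k] as above."""
    m = len(indices)
    lam = [[0] * m for _ in range(m)]
    for k, ik in enumerate(indices):
        # Basis polynomial L_k(x) = prod_{l != k} (x - i_l) / (i_k - i_l)
        basis = [1]
        denom = 1
        for l, il in enumerate(indices):
            if l != k:
                basis = poly_mul(basis, [(-il) % p, 1])
                denom = denom * (ik - il) % p
        inv_denom = pow(denom, p - 2, p)      # field inverse
        for j in range(m):
            lam[j][k] = basis[j] * inv_denom % p
    return lam

# Example: V interpolates (1,10), (2,20), (3,17); check V(2) = 20.
idx, vals = [1, 2, 3], [10, 20, 17]
lam = lagrange_coefficients(idx)
c = [sum(lam[j][k] * vals[k] for k in range(3)) % p for j in range(3)]
assert sum(c[j] * pow(2, j, p) for j in range(3)) % p == 20
```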

[Algorithm 1: the private lookup protocol]

Correctness. The definition of \(c_k\) gives \(\sum _{k=0}^{m-1}c_kl^k=v_l\) for all \(l\in \{i_1,\ldots ,i_m\}\). Recalling from Algorithm 1 that \(y_k=c_kr^k\) is computed in the vector-only phase and \(z=jr^{-1}\) is the value revealed in the online phase, we can verify that \(w=\sum _{k=0}^{m-1}y_kz^k=\sum _{k=0}^{m-1}c_kr^kj^kr^{-k}=v_j\).

Security and Privacy. To discuss the security properties of a protocol in the \(\mathcal {F}_\mathsf {ABB}\)-hybrid model, we only have to consider which extra information the adversary may be able to obtain from the \(\mathsf {retrieve}\)-commands, and how it can affect the run of the protocol through the values it \(\mathsf {store}\)-s (the latter is significant only if the adversary is active). There are no \(\mathsf {store}\)-commands in Algorithm 1. The results of the \(\mathsf {retrieve}\)-commands are uniformly randomly distributed elements of \(\mathbb {F}^*\), independent of everything else the adversary sees. These can be simulated without any access to \(v_{i_1},\ldots ,v_{i_m}\) and j. Hence Algorithm 1 is secure and private against the same kinds of adversaries that the used implementation \(\pi _\mathsf {ABB}\) of \(\mathcal {F}_\mathsf {ABB}\) can tolerate.

Complexity. In the offline phase, we perform \(m-2\) multiplications. We also generate one random invertible element together with its inverse; this generation costs the same as a couple of multiplications [11]. The round complexity of this computation, as presented in Algorithm 1, is also O(m), which would be bad for online computations. For offline computations, the acceptability of such round complexity mainly depends on the latency of the used communication channels. The offline phase could be performed in O(1) rounds [11] at the cost of increasing the number of multiplications a couple of times. In the vector-only phase, the computation of the values \([\![ c_k ]\!]\) is free, while the computation of the values \([\![ y_k ]\!]\) requires \(m-1\) multiplications (the computation of \([\![ y_0 ]\!]\) is free). All these multiplications can be performed in parallel. If the vector v were public, then the computation of the \([\![ y_k ]\!]\) would be free, too. The only costly operations in the online phase are a single multiplication and a single \(\mathsf {retrieve}\)-operation; these have similar complexities in existing ABB implementations.
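To make the phase structure concrete, here is a plaintext walk-through of the three phases, reconstructed from the correctness and complexity discussion above (it reuses lagrange_coefficients and the prime p from the previous sketch); all values would in reality be stored inside the ABB, and the variable names follow the correctness argument.

```python
# Plaintext walk-through of the three phases of the private lookup,
# following the correctness argument: y_k = c_k * r^k, z = j * r^{-1},
# and the result w = sum_k y_k * z^k = V(j) = v_j.
# Reuses p and lagrange_coefficients from the sketch above.
import random

def private_lookup_phases(indices, values, j):
    m = len(indices)
    # --- offline phase: random r, its inverse, and the powers r^0..r^{m-1} ---
    r = random.randrange(1, p)
    r_inv = pow(r, p - 2, p)
    r_pow = [pow(r, k, p) for k in range(m)]     # the ABB needs m-2 multiplications here
    lam = lagrange_coefficients(indices)         # public, no ABB operations
    # --- vector-only phase: c_k (free linear combinations) and y_k = c_k * r^k ---
    c = [sum(lam[k][l] * values[l] for l in range(m)) % p for k in range(m)]
    y = [c[k] * r_pow[k] % p for k in range(m)]  # m-1 multiplications
    # --- online phase: one multiplication ([j]*[r^{-1}]) and one retrieve (reveal z) ---
    z = j * r_inv % p                            # z is safe to publish: r is uniform in F*
    return sum(y[k] * pow(z, k, p) for k in range(m)) % p

assert private_lookup_phases([1, 2, 3], [10, 20, 17], 3) == 17
```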

4.1 Speeding Up the Offline Phase

The preceding complexity analysis is valid for any implementation of the ABB. Some implementations contain additional efficient operations that speed up certain phases of Algorithm 1. If we use the additive secret sharing based implementation, as used in Sharemind [4], and a binary field \(\mathbb {F}\), then we can bring down the complexity of the offline phase to \(O(\sqrt{m})\) as shown in the following.

The Sharemind ABB is realized by three parties, offering protection against passive attacks by one of the parties. The ABB stores elements of some ring \(\mathbb {R}\); a value \(v\in \mathbb {R}\) is represented by \(\pi _\mathsf {ABB}\) as \([\![ v ]\!]=([\![ v ]\!]_1,[\![ v ]\!]_2,[\![ v ]\!]_3)\in \mathbb {R}^3\) satisfying \([\![ v ]\!]_1+[\![ v ]\!]_2+[\![ v ]\!]_3=v\), where the share \([\![ v ]\!]_i\) is kept by the i-th party \(P_i\). Messages depending on these shares are sent among the parties, hence it is important to rerandomize \([\![ v ]\!]\) before each use. The resharing protocol [4, Algorithm 1] (repeated here as Algorithm 2; all indices of the parties are modulo 3) is used for this rerandomization. We note that in this algorithm, the generation and distribution of random elements can take place offline. Even better, only random seeds need to be distributed ahead of the computation, with new elements of \(\mathbb {R}\) generated from them as needed. Hence we consider the resharing protocol to involve only local operations and to have cost 0 in our complexity analysis.

[Algorithm 2: the resharing protocol for additively shared values]
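A plaintext sketch of this style of resharing is given below; the communication of the random values is not modelled, and the function and variable names are ours rather than Sharemind's.

```python
# Sketch of resharing an additively shared value among three parties.
# Each adjacent pair (P_i, P_{i+1}) holds a common random value r_i, which can be
# derived from a pre-distributed seed; P_i then replaces its share by
# share_i + r_i - r_{i-1}. The sum of shares is unchanged, but the new sharing
# is independent of the old one.
import random

M = 2**32   # example ring Z_{2^32}

def reshare(shares):
    """shares: list of the three additive shares of some value v."""
    # r[i] is known to parties P_i and P_{i+1}; here we just draw it directly.
    r = [random.randrange(M) for _ in range(3)]
    return [(shares[i] + r[i] - r[(i - 1) % 3]) % M for i in range(3)]

v = 123456789
shares = [random.randrange(M) for _ in range(2)]
shares.append((v - sum(shares)) % M)
assert sum(reshare(shares)) % M == v       # value preserved, randomness refreshed
```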

If the ring \(\mathbb {R}\) is a binary field \(\mathbb {F}\), then the additive sharing is actually bit-wise secret sharing: \([\![ v ]\!]=[\![ v ]\!]_1\oplus [\![ v ]\!]_2\oplus [\![ v ]\!]_3\), where \(\oplus \) denotes bit-wise exclusive or. For such sharings, the usual arithmetic operations with shared values in \(\mathbb {Z}_{2^n}\) are more costly, compared to additive sharings over \(\mathbb {Z}_{2^n}\), but equality checks and comparisons are cheaper [4]. As most operations with array indices are expected to be comparisons, bit-wise secret sharing may be a good choice for them.

[Algorithm 3: Sharemind’s multiplication protocol]

Sharemind’s multiplication protocol [4, Algorithm 2] (repeated as Algorithm 3) is based on the equality \(([\![ u ]\!]_1+[\![ u ]\!]_2+[\![ u ]\!]_3)([\![ v ]\!]_1+[\![ v ]\!]_2+[\![ v ]\!]_3)=\sum _{i,j=1}^3 [\![ u ]\!]_i[\![ v ]\!]_j\). After the party \(P_i\) has sent \([\![ u ]\!]_i\) and \([\![ v ]\!]_i\) to party \(P_{i+1}\) (here and subsequently, all party indices are modulo 3), each of these nine components of the sum can be computed by one of the parties. The multiplication protocol is secure against one honest-but-curious party [4, Theorem 2]. Indeed, as the sending of \([\![ u ]\!]_i\) from \(P_i\) to \(P_{i+1}\) takes place after resharing \([\![ u ]\!]\), the value \([\![ u ]\!]_i\) is a uniformly random number independent of all other values \(P_{i+1}\) sees. Hence the simulator for \(P_{i+1}\)’s view could itself generate this value. The same consideration also underlies the security proof of the specialized offline phase protocol given in Algorithm 4, the properties of which we discuss below.
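Before turning to Algorithm 4, the following plaintext sketch illustrates the multiplication pattern just described: after resharing (reusing reshare from the sketch above), each party obtains its neighbour's shares and computes three of the nine cross terms locally. The code is a mock-up of ours, not Sharemind's implementation.

```python
# Sketch of three-party multiplication over additive shares: after resharing,
# P_i forwards its shares of u and v to P_{i+1}; then P_i computes
# u_i*v_i + u_i*v_{i-1} + u_{i-1}*v_i, and the three local results again form
# an additive sharing of u*v (all nine cross terms are covered exactly once).
import random

M = 2**32

def share(v):
    s = [random.randrange(M) for _ in range(2)]
    return s + [(v - sum(s)) % M]

def multiply(u_sh, v_sh):
    u_sh, v_sh = reshare(u_sh), reshare(v_sh)    # reshare before sending (see above)
    w = []
    for i in range(3):
        j = (i - 1) % 3                          # shares received from P_{i-1}
        w.append((u_sh[i] * v_sh[i] + u_sh[i] * v_sh[j] + u_sh[j] * v_sh[i]) % M)
    return w

u, v = 31337, 271828
assert sum(multiply(share(u), share(v))) % M == (u * v) % M
```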

[Algorithm 4: the optimized offline phase for additive sharing over a binary field]

Privacy. We have to show that the view of a single party \(P_i\) can be simulated without access to the shares held by other parties. Party \(P_i\) receives messages only in lines 4 and 9 of Algorithm 4. In both cases, it receives a share of a freshly reshared value. Hence this message can be simulated by a uniformly random number, as discussed above.

Complexity. We consider local computation and resharings to be free, hence we have to count the number of messages sent by the parties. It is easy to see that a party sends at most \(\sqrt{m+1}\) elements of the field \(\mathbb {F}\) in lines 4 and 9 of Algorithm 4. This is also the round complexity of Algorithm 4. But by tracking the data dependencies in the first loop, we see that its iterations no. \(2^{k-1},\ldots ,2^k-1\) could be done in parallel for each \(k\in \{1,\ldots ,q-1\}\). Hence Algorithm 4 could be executed in \(O(\log m)\) rounds.

Correctness. We can use Algorithm 4 only if \(\mathbb {F}\) is a binary field. In this case squaring a shared value \([\![ u ]\!]\) is a local operation: \(([\![ u ]\!]_1+[\![ u ]\!]_2+[\![ u ]\!]_3)^2=[\![ u ]\!]_1^2+[\![ u ]\!]_2^2+[\![ u ]\!]_3^2\) and the computation of \([\![ u^2 ]\!]_i=[\![ u ]\!]_i^2\) only requires the knowledge of \([\![ u ]\!]_i\). Regarding Algorithm 4, note that its first loop satisfies the invariant that in the beginning of each iteration, each party \(P_i\) knows the values \([\![ v^0 ]\!]_i,\ldots ,[\![ v^{2j-1} ]\!]_i\) and also \([\![ v^0 ]\!]_{i-1},\ldots ,[\![ v^{2j-1} ]\!]_{i-1}\). With these values, it can compute \([\![ v^{2j} ]\!]_i\) and \([\![ v^{2j+1} ]\!]_i\) (effectively, we are computing \(v^{2j}=(v^j)^2\) and \(v^{2j+1}=v^j\cdot v^{j+1}\)). Party \(P_i\) can also compute \([\![ v^{2j} ]\!]_{i-1}\). It receives \([\![ v^{2j+1} ]\!]_{i-1}\) from \(P_{i-1}\). In the second loop, we compute \(v^j=(v^r)^{2^q}\cdot v^s\) for \(j=2^q\cdot r+s\). Again, \([\![ (v^r)^{2^q} ]\!]_i\) is locally computed from \([\![ v^r ]\!]_i\) by squaring it q times.
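The local-squaring property is easy to check on a small binary field; the sketch below uses GF(2^8) with the reduction polynomial \(x^8+x^4+x^3+x+1\) purely to keep the example small (the paper's implementation uses \(GF(2^{32})\)), and the helper gf_mul is ours.

```python
# In a binary field, squaring is linear over XOR, so each party can square its
# own share locally: (s1 ^ s2 ^ s3)^2 = s1^2 ^ s2^2 ^ s3^2.
# Illustrated on GF(2^8) with reduction polynomial x^8+x^4+x^3+x+1 (0x11b).
import random

def gf_mul(a, b, poly=0x11b, bits=8):
    """Carry-less multiplication with reduction, i.e. multiplication in GF(2^bits)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> bits:
            a ^= poly
    return res

v = 0xC3
shares = [random.randrange(256), random.randrange(256)]
shares.append(v ^ shares[0] ^ shares[1])                   # XOR-sharing of v
local_squares = [gf_mul(s, s) for s in shares]             # each party squares locally
assert gf_mul(v, v) == local_squares[0] ^ local_squares[1] ^ local_squares[2]
```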

4.2 Speeding Up the Vector-Only Phase

A different kind of optimization is available if the ABB implementation is based on Shamir’s secret sharing [32], using Gennaro et al.’s multiplication protocol [17] (examples are VIFF [12] and SEPIA [6]). Such ABB implementations (for p parties), secure against t parties, can be given if \(2t+1\le p\) (for passive security) or \(3t+1\le p\) (for active security). In such an implementation, a value \(v\in \mathbb {F}\) for a field \(\mathbb {F}\) of size at least \(p+1\) is stored in the ABB as \([\![ v ]\!]=([\![ v ]\!]_1,\ldots ,[\![ v ]\!]_p)\), such that there exists a polynomial f over \(\mathbb {F}\) with degree at most t, satisfying \(f(0)=v\) and \(f(c_i)=[\![ v ]\!]_i\) for all \(i\in \{1,\ldots ,p\}\), where \(\mathbf {C}=\{c_1,\ldots ,c_p\}\) is a set of mutually different, public, fixed, nonzero elements of \(\mathbb {F}\). The share \([\![ v ]\!]_i\) is kept by the i-th party \(P_i\).

Our optimization relies on the fact that a scalar product \([\![ \sum _{i=1}^k u_iv_i ]\!]\) can be computed from the values \([\![ u_1 ]\!],\ldots ,[\![ u_k ]\!]\) and \([\![ v_1 ]\!],\ldots ,[\![ v_k ]\!]\) stored inside the ABB at the same cost as a single multiplication of stored values. For reference, Algorithm 5 presents the scalar product protocol in the SSS-based ABB providing passive security (thus \(2t+1\le p\)) [17]. The multiplication protocol can be obtained from it simply by letting the length of the vectors be 1. In this protocol, the values \(\lambda ^\mathbf {C}_i\) are the Lagrange interpolation coefficients satisfying \(f(0)=\sum _{i=1}^p\lambda ^\mathbf {C}_if(c_i)\) for any polynomial f over \(\mathbb {F}\) of degree at most 2t. The protocols providing active security are much more complex [10], but similarly have equal costs for multiplication and scalar product.
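The following plaintext sketch captures the structure of this scalar product protocol under passive security: each party locally sums the products of its own shares (a degree-2t sharing of the result), deals a fresh degree-t sharing of that value, and everybody combines the received shares with the public coefficients \(\lambda ^\mathbf {C}_i\). The parameters (three parties, \(t=1\), the prime and the evaluation points) and all helper names are our own choices.

```python
# Sketch of the scalar product over Shamir-shared vectors (passive security, 2t+1 <= p).
# Each party locally computes d_i = sum_k u_{k,i} * v_{k,i}, which lies on a degree-2t
# polynomial through (0, sum_k u_k*v_k); it then deals a fresh degree-t sharing of d_i,
# and everybody combines the received shares with the public Lagrange weights.
import random

P = 2**31 - 1                   # example prime field
p_parties, t = 3, 1             # three parties, passive security against one
C = [1, 2, 3]                   # public evaluation points c_1..c_p

def shamir_share(v):
    coeffs = [v] + [random.randrange(P) for _ in range(t)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P for x in C]

def lagrange_at_zero(points):
    """Weights w_i with f(0) = sum_i w_i * f(points[i]) for deg(f) < len(points)."""
    w = []
    for i, xi in enumerate(points):
        num, den = 1, 1
        for j, xj in enumerate(points):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        w.append(num * pow(den, P - 2, P) % P)
    return w

def scalar_product(u_shares, v_shares):
    """u_shares[k][i]: party i's share of u_k; returns the parties' shares of <u,v>."""
    lam = lagrange_at_zero(C)                      # valid for degree up to 2t = p-1
    d = [sum(u_shares[k][i] * v_shares[k][i] for k in range(len(u_shares))) % P
         for i in range(p_parties)]                # local products and summation
    resh = [shamir_share(d_i) for d_i in d]        # fresh degree-t sharing of each d_i
    return [sum(lam[i] * resh[i][j] for i in range(p_parties)) % P
            for j in range(p_parties)]             # each party combines what it received

u, v = [5, 7, 11], [2, 3, 4]
res_sh = scalar_product([shamir_share(x) for x in u], [shamir_share(x) for x in v])
assert sum(w * s for w, s in zip(lagrange_at_zero(C), res_sh)) % P == \
       sum(a * b for a, b in zip(u, v)) % P
```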

[Algorithm 5: the scalar product protocol for the Shamir’s secret sharing based ABB (passive security)]
[Algorithm 6: the private lookup protocol with the optimized vector-only phase]
Table 1. Communication costs (in elements of \(\mathbb {F}\)) of different private lookup protocols

Our optimization consists of a reordering of the operations of the vector-only and online phases of the private lookup protocol, as depicted in Algorithm 6. We see that compared to Algorithm 1, we have moved the entire computation of the products \(z^j[\![ c_j ]\!][\![ r^j ]\!]\) to the online phase, thereby reducing the vector-only phase to the computation of certain linear combinations. The online phase becomes more complex, but only by a single scalar product, which costs the same as a single multiplication. The correctness and privacy arguments for Algorithm 6 are the same as for Algorithm 1.
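In the plaintext mock-up started after Algorithm 1, this reordering looks as follows (reusing the helpers defined there): the vector-only phase keeps only the free linear combinations, and the online phase evaluates the sum as a single scalar product.

```python
# Plaintext mock-up of the reordered phases (cf. the sketch after Algorithm 1):
# the vector-only phase now computes only the free linear combinations c_j, and
# the online phase evaluates sum_j (z^j * c_j) * r^j as a single scalar product.
# Reuses p, random and lagrange_coefficients from the sketches in Sect. 4.
def private_lookup_reordered(indices, values, j):
    m = len(indices)
    r = random.randrange(1, p)
    r_inv = pow(r, p - 2, p)
    r_pow = [pow(r, k, p) for k in range(m)]                     # offline
    lam = lagrange_coefficients(indices)                         # offline, public
    c = [sum(lam[k][l] * values[l] for l in range(m)) % p
         for k in range(m)]                                      # vector-only, free
    z = j * r_inv % p                                            # online: one mult + reveal
    return sum(pow(z, k, p) * c[k] % p * r_pow[k]                # online: one scalar product
               for k in range(m)) % p

assert private_lookup_reordered([1, 2, 3], [10, 20, 17], 2) == 20
```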

Unfortunately, we cannot use the optimizations of both Sects. 4.1 and 4.2 at the same time, as the cost of converting from one representation to the other would cancel any efficiency gains. If we have three parties and seek passive security against one of them, then our choices are given in Table 1. Recall that multiplication in both representations, as well as retrieval in the additive representation, requires the communication of 6 field elements in total. Retrieval in the Shamir’s secret sharing based representation requires 3 field elements to be sent.

5 Protocol for DFA Execution

A DFA is a tuple \(A=(Q,\Sigma ,\delta ,q_0,F)\), where Q is a set of states, \(\Sigma \) is the alphabet (a set of characters), \(q_0\in Q\) is the initial state, \(F\subseteq Q\) is the set of final states and \(\delta :Q\times \Sigma \rightarrow Q\) is the transition function. To execute a string \(w=w_1\cdots w_\ell \in \Sigma ^*\) on A means to find the states \(q_1,\ldots ,q_\ell \), such that \(q_i=\delta (q_{i-1},w_i)\) for all \(i\in \{1,\ldots ,\ell \}\) and check whether \(q_\ell \in F\).

In our implementation, the size of the problem — the numbers \(|Q|=m\), \(|\Sigma |=n\) and \(|w|=\ell \) — is public, but \(\delta \) and F are private. We need private lookup to implement \(\delta \). It is represented as a table of \(|Q|\cdot |\Sigma |\) private values. We compute the private index from \([\![ q_{i-1} ]\!]\) and \([\![ w_i ]\!]\) and use it to find \([\![ q_i ]\!]\) from this table, seen as a vector of length \(N=mn\). We have implemented DFA execution using both additive sharing and Shamir’s sharing, in the fields GF(p) for \(p=2^{32}-5\) and \(GF(2^{32})\). We have measured the performance of Algorithm 1 in all cases, as well as the optimizations of Algorithms 4 and 6 in appropriate cases. Our tests were performed on three computing nodes, each of which was deployed on a separate machine. The computers in the cluster were connected by an Ethernet local area network with a link speed of 1 Gbps. Each computer in the cluster had 48 GB of RAM and a 12-core 3 GHz CPU with Hyper-Threading.
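The flattening of \(\delta \) and the index computation can be illustrated in plaintext as follows; in the protocol the index \(q\cdot n+c\) is a free linear combination of \([\![ q_{i-1} ]\!]\) and \([\![ w_i ]\!]\), and each table access is one private lookup. The example automaton and the function name are ours.

```python
# Plaintext model of the DFA execution loop: delta is flattened into a vector of
# length N = m*n, and each step performs one lookup at index q*n + c.
# In the protocol, both the table entries and the index are ABB-stored values.
def run_dfa(delta_table, n_alphabet, finals, q0, word):
    q = q0
    for c in word:
        q = delta_table[q * n_alphabet + c]      # one private lookup per character
    return q in finals

# Example: automaton over {0,1} accepting strings with an even number of 1s.
m_states, n_alphabet = 2, 2
delta_table = [0, 1,     # from state 0 on symbols 0, 1
               1, 0]     # from state 1 on symbols 0, 1
assert run_dfa(delta_table, n_alphabet, finals={0}, q0=0, word=[1, 0, 1, 1]) is False
```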

Our implementation is sub-optimal in its round complexity (which is \(O(\ell )\)), as it faithfully implements the definition of DFA execution. Hence the running time of the online phase is currently dominated by the latency of the network. On the other hand, this also implies that many instances of DFA execution run in parallel would have almost the same runtime (for the online phase) as a single instance. It is well known that DFA execution could be implemented in a parallel fashion, using \(O(\log \ell )\) time (or SMC rounds). This, however, increases the total work performed by the algorithm by a factor of O(m).

We see that for the vector-only phase of the private lookup, we need the description of \(\delta \), but not yet the string w. This corresponds very well to certain envisioned cloud services, in particular to privacy-preserving spam filtering, where the spamminess is detected with regular expressions.

Table 2 presents the actual running times of our DFA execution implementation. All running times are in milliseconds, given with 2–3 significant digits. We have measured the running time for different automaton sizes m and alphabet sizes n, with \(N=mn\) ranging between 6 and 30000. The length of the input string was always 2000 — the work performed by the algorithm, as well as its timing behavior, is perfectly linear in this length.

Table 2. DFA execution benchmarks (times in milliseconds, \(\ell =2000\))

We see that the running time of the online phase is indeed only slightly dependent on N: it is \(\approx 286+0.042N\,\upmu \)s per character of the input string when using the field GF(p) (for \(GF(2^{32})\), it is around \(290+0.105N\,\upmu \)s). This slight dependence on N is caused by the local computations, the amount of which depends on N. Its effect would be even lower if the network latency were higher. The other phases depend much more on N: the offline phase requires \(0.4N\,\upmu \)s and the vector-only phase \(0.23N+4.7\cdot 10^{-6}N^2\,\upmu \)s per character for GF(p).

Among related work, running times of algorithm implementations have been presented in [15, 30]. Our implementation is significantly more efficient than that of [15], which reports a running time of 8 s for processing a string of length \(\ell =10\) on an automaton with \(mn=40000\). Compare this number with our reported running time of \(\approx 3\) s for the online phase, or even with \(\approx 40\) s for all three phases, of processing a 2000-character string on an automaton with \(mn=30000\).

Our running times do not seem that impressive when compared to [30], where, with a more optimized implementation inspired by garbled circuits, running times as low as 12 s are reported for \(n=2\) and \(\{m,\ell \}=\{20,150000\}\). But even then, if we consider \(mn\ell \) to be a valid measure of the size of the problem, our implementation is a couple of times faster (and the online phase requires only 10 % of that time). Also, they are solving a narrower problem, with one of the parties knowing the automaton and the other knowing the input string, while our protocols are universally composable. On the other hand, their implementation is secure against malicious adversaries, while we have tested our protocols only on ABB implementations secure against passive attacks.

6 Protocol for SSSD

Let \(G=(V,E)\) be a directed graph with \(s,t:E\rightarrow V\) giving the source and target, and \(w:E\rightarrow \mathbb {N}\) giving the length of each edge. Let \(v_0\in V\). The Bellman-Ford (BF) algorithm for SSSD starts by defining \(d_0[v]=0\) if \(v=v_0\), and \(d_0[v]=\infty \) for \(v\in V\backslash \{v_0\}\). It then computes \(d_{i+1}[v]=\min (d_i[v], \min _{e\in t^{-1}(v)} d_i[s(e)]+w(e))\) for all \(v\in V\) and \(i\in \{0,\ldots ,|V|-2\}\). The vector \(d_{|V|-1}\) is the result of the algorithm.

We have implemented the BF algorithm on top of the Sharemind platform, hiding the structure of the graph, as well as the lengths of edges. In our implementation, the numbers \(n=|V|\) and \(m=|E|\) are public, and so are the in-degrees of vertices (obviously, these could be hidden by using suitable paddings). In effect, the mapping t in the definition of the graph is public, while the mappings s and w are private. During the execution, we use private lookup to find \(d_i[s(e)]\). As the vectors \(d_i\) have to be computed one after another, but the elements of the same vector can be computed in parallel, our implementation has O(n) rounds in the online phase.
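In plaintext, the structure of our BF iteration is as follows: the in-edge lists (derived from the public mapping t) drive the public control flow, while the accesses \(d_i[s(e)]\) are the private lookups. The concrete edge representation and names below are our own simplification.

```python
# Plaintext skeleton of the privacy-preserving Bellman-Ford iteration.
# Edge targets (and hence the in-edge lists) are public; sources s(e), weights w(e)
# and the distance vector d are private, and reading d[source[e]] is the private lookup.
INF = 2**30     # stands in for "infinity"; in the protocol a large field element

def bellman_ford(n_vertices, in_edges, source, weight, v0):
    """in_edges[v]: public list of edge ids e with t(e) = v."""
    d = [0 if v == v0 else INF for v in range(n_vertices)]
    for _ in range(n_vertices - 1):
        # all entries of the next vector can be computed in parallel
        d = [min([d[v]] + [d[source[e]] + weight[e]      # d[source[e]]: private lookup
                           for e in in_edges[v]])
             for v in range(n_vertices)]
    return d

# Example: directed triangle 0 -> 1 -> 2 -> 0 with weights 1, 2, 3.
source, target, weight = [0, 1, 2], [1, 2, 0], [1, 2, 3]
in_edges = [[e for e in range(3) if target[e] == v] for v in range(3)]
assert bellman_ford(3, in_edges, source, weight, v0=0) == [0, 1, 3]
```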

As the vector \(d_i\) is not yet available at the start of computation, we use the optimized vector-only phase to avoid an O(n) factor during the execution of the BF algorithm. Hence we use Shamir’s secret sharing based ABB implementation. We have to perform arithmetic and comparisons with secret values, hence we must use a prime field as the field \(\mathbb {F}\) (we use GF(p) with \(p=2^{32}-5\)).

Table 3 presents the actual running times of our implementation of the Bellman-Ford algorithm on sparse graphs. All running times are in seconds, given with 2–3 significant digits. We have measured the running time for different graphs with n vertices (where \(100\le n\le 2000\)) and m edges, also matching the problem sizes in related work. Hence we have included cycle graphs (where \(m=n\)), as well as the complete directed graph on 100 vertices (with \(m=9900\)). We also believe that in applications, the used graphs are often planar. Thus we have selected the parameters of certain graphs to match planar graphs, where most faces are triangles and edges are bidirectional (\(m\approx 6n\)).

Table 3. SSSD execution benchmarks (times in seconds)

We see that in our tests, the offline phase requires around \(4.74n^2+1.28mn+0.181mn^2\) \(\upmu \)s, and the online phase around \(14.7n^2+67.3mn+0.0257mn^2\) \(\upmu \)s. The asymptotic running time of the BF algorithm is O(mn). Hence we see that the online phase depends much less on the \(mn^2\) term than the offline phase.

The running times reported in related work are much higher. The protocols of [1] are implemented on VIFF [12], running with three parties on a single machine, and requiring 5622 s for SSSD in a 128-vertex complete graph. Compare this with our running time of 87 s (offline + online) for the 100-vertex complete graph. We require (550 + 1540) s (online + offline) for a graph with \(m=n=2000\). In [22], the same graph requires around 10000 s with an implementation based on SPDZ [23] (hence their time probably does not include SPDZ precomputations).

7 Conclusions

In this paper, we have shown that arithmetic black boxes support fast lookups from private tables according to a private index. We have used this operation to obtain very efficient algorithms for certain tasks. Our results show that for private lookups in an ABB, complex techniques based on Oblivious RAM [13] are not necessary. Besides DFA execution and the Bellman-Ford algorithm, we expect our techniques to have wide applicability in making algorithms with sensitive data reading patterns privacy-preserving.