Route Planning Systems (RPS) are a core component of autonomous personal transport systems essential for safe and efficient navigation of dynamic urban environments with the support of edge-based smart city infrastructure, but they also raise concerns about user route privacy in the context of both privately owned and commercial vehicles. Numerous high-profile data breaches in recent years have fortunately motivated research on privacy-preserving RPS, but most of them are rendered impractical by greatly increased communication and processing overhead. We address this by proposing an approach called Hierarchical Privacy-Preserving Route Planning (HPRoP), which divides and distributes the route-planning task across multiple levels and protects locations along the entire route. This is done by combining Inertial Flow partitioning, Private Information Retrieval (PIR), and Edge Computing techniques with our novel route-planning heuristic algorithm. Normalized metrics were also formulated to quantify the privacy of the source/destination points (endpoint location privacy) and the route itself (route privacy). Evaluation on a simulated road network showed that HPRoP reliably produces routes differing only by ≤ 20% in length from optimal shortest paths, with completion times within ∼ 25 seconds, which is reasonable for a PIR-based approach. On top of this, more than half of the produced routes achieved near-optimal endpoint location privacy (∼ 1.0) and good route privacy (≥ 0.8).

1 Introduction

Route Planning Services (RPS) are web-based applications that can calculate routes between two chosen points in a road network and are indispensable navigational aids for commuters, vehicle operators, autonomous vehicles, and so on. However, these same routes can reveal points of interest [15] that may contain users’ places of work and residence, in addition to highly sensitive information about their political, sexual, or religious tendencies [30]. Furthermore, this exposes users to risks of targeted criminal acts, mass surveillance, discrimination, and so on. The rise in high-profile data breaches over the past decade has highlighted the importance of protecting user location (and route) data and has spurred many recent works focusing on adding privacy-preserving mechanisms to RPS.

In the RPS context, privacy primarily entails keeping users’ origin, destination, and route information from being acquired by untrusted entities. It exists side-by-side with other important Quality of Service (QoS) metrics such as Utility (e.g., the accuracy of the routes) and Performance (e.g., the response time of the RPS), which affect whether or not an RPS would gain widespread adoption. Finding a good balance between these metrics is crucial for any privacy-preserving RPS. For instance, privacy-preserving protocols for querying road traffic data have been developed for Vehicle Ad-hoc Networks (VANETs), but these either offload the computation cost of route planning onto the vehicle itself [24, 25, 39] or an external trusted entity [3, 4]. The latter approaches are already done by modern RPS, while the former are not web-based RPS at all. Structured encryption schemes [17, 26, 37] provide privacy-protection for both user queries and road network data while also being relatively lightweight and efficient but at the cost of inherently leaking some query-related information that are detrimental to the privacy of user routes.

Aside from these, there are also works that utilize Private Information Retrieval (PIR) [7], since this protocol has strong privacy guarantees. In an RPS, their efficiency depends mostly on the routing algorithm but are often considered too computationally heavy to be used on very large databases—such as the road network graph of a large city. As such, only a few examples of PIR-based RPS have been developed over the past decade. One approach [35] compresses road network graphs in a novel PIR-queryable manner but results in longer pre-processing and query response times. The other approach [28] partitions the road network into disjoint subgraphs and these are individually retrieved via PIR to inform a local routing algorithm on the user’s device. This results in longer route completion times, since many PIR queries are needed to complete the route. Both approaches clearly entail a significant degradation of QoS, which dissuades most mainstream RPS from experimenting with and adopting them.

Our approach aims to create a RPS that addresses the aforementioned issues by fulfilling the following three objectives: (1) produce close-to-optimal users’ routes, (2) provide strong privacy guarantees for users’ route data, and (3) maintain an adequate level of performance by minimizing processing and communication overhead as much as possible. The approach involves two phases. In the pre-processing phase, a graph partitioning technique is used to hierarchically divide the road network into balanced partitions. Routes within each partition are then pre-computed and stored in separate databases, and the same is done for an additional set of routes between neighboring partitions. In the routing phase, a novel hierarchical heuristic algorithm on the user’s device privately obtains partial routes from the different partition databases using PIR and iteratively combines them to create a progressively finer route. We named the proposed approach, Hierarchical Privacy-Preserving Route Planning(HPRoP), which makes the following key contributions:

–

An RPS that uses a hierarchical partitioning of the road network with a novel hierarchical route-planning algorithm to produce routes with good optimal route approximation while maintaining strong privacy guarantees and low route completion times,

–

A pair of privacy metrics—endpoint location privacy and route privacy—that aim to be general enough to be applicable to other privacy-preserving RPS while also accounting for specific characteristics of PIR-based RPS (such as providing strong privacy guarantees between routes in the same database), and

–

A comprehensive evaluation of the Utility, Privacy, and Performance of the proposed RPS on a simulated road network based in Osaka City against two PIR-based baseline approaches.

Note that the baseline approaches mentioned above were also developed solely for this article to address the lack of recent PIR-based route-planning approaches that HPRoP can be compared against. However, we do not count them as separate contributions, since they are primarily used as an evaluation tool. The rest of this article is organized as follows: Section 2 presents a summary of prior work related to privacy-preserving route planning. Section 3 discusses the mathematical models, assumptions, and other preliminaries. Section 4 presents the concept and intuition behind route planning under the constraints of PIR. Section 5 defines and discusses the privacy metrics used to evaluate the RPS. Section 6 presents the key ideas behind the approach itself and the details of the heuristic algorithm. Section 7 discusses the evaluation framework and the results. Section 8 concludes with a brief summary of the article and some notes on potential future work.

2 Related Work

Most algorithms for calculating exact shortest paths require knowledge of the exact source and destination locations. Classical algorithms such as Dijkstra’s and Bellman-Ford [10] calculate routes by repeatedly scanning connected vertices from some source point and assigning them weights until a path to the destination point is found. Modern algorithms improve upon this by leveraging unique properties of road networks. ALT [18] pre-computes distances to fixed landmarks and uses them as lower bounds to inform a bidirectional A* search [21]. Contraction Hierarchies [16] pre-processes the network graph to establish “shortcut edges,” facilitating faster route calculation between distant points. Customizable Route Planning [11] uses the network graph’s topology in their metric-independent hierarchical routing method. Regardless, deploying these on the server-side inevitably means that the user’s origin and destination must be divulged so the final route can be computed. Meanwhile, deploying these on the client side means downloading large amounts of road network data and computing routes on more resource-constrained machines. In other words, the first case compromises privacy while the second degrades functionality.

Alternatively, algorithms for finding approximate shortest paths also exist, focusing on quickly obtaining short routes rather than finding the exact shortest ones. Point-to-point variants of these [8, 22, 23] were developed at a time when mobile computational power was very limited and have been outclassed by modern exact shortest path algorithms. Yet, these remain useful in All-Pairs Shortest Path (APSP) distance oracles [1, 34] for speeding up goal-directed route calculations.

Recent research on privacy-preserving route-planning techniques generally fall under three categories: (1) Structured Encryption-based, (2) PIR-based, and (3) Other encryption-based schemes. In addition, a number of schemes for privately querying road traffic information with applications to vehicle navigation systems also exist, but, since these either perform route planning on the vehicle itself [24, 25, 39] or an external trusted entity [3, 4], these works have been excluded here.

Structured Encryption [6] refers to techniques that allow data structures to be encrypted such that these can be queried later in a privacy-preserving manner. They are typically more efficient in terms of computation time and communication overhead at the cost of leaking a small amount of information about the data and the queries. The approaches in References [26, 37] use structured encryption to find the shortest distance between vertex-pairs in encrypted graphs. These, however, are only able to find shortest distance values instead of the actual shortest paths and are vulnerable to collusion between the storage and computing servers—which are essentially the same entity in the RPS context. In contrast, Reference [17] is able to retrieve the actual shortest paths on a single-server setup, which effectively eliminates the aforementioned issues. However, pre-computing the encrypted database for large sparse graphs (i.e., with \(|V| \ge 10,\!000\) and \(|E| \ge 30,\!000\)) took upwards of 16.5 hours and produced very large files (around 4.4 GB), rendering it impractical for dynamic scenarios such as real-time route planning. Additionally, while all three have heavily constrained leakage profiles, some of the information they inherently leak may be detrimental to route privacy. For instance, the number of potential query elements (i.e., the database size) can be used to determine the specific subgraph of the road network graph being used. Query repetitions and total queries for route completion can be jointly analyzed across multiple sessions to deduce the actual route. Different approaches may also leak other information in addition to the ones mentioned here.

PIR [7] is a technique that allows remote databases to be queried in such a way that the retrieved element would not be revealed to the service provider or any third-party entity. PIR-based schemes, therefore, have slightly stronger privacy guarantees in that repeated queries and underlying database sizes are not leaked but have the disadvantage of much higher communication overhead. While most modern implementations [2, 19, 27, 29] have become very communication-efficient (i.e., up to \(O(\sqrt {N})\)), they remain impractical for accessing a database of APSP in a large city. This is because PIR schemes need to go through each individual element to avoid leaking information about the element being retrieved [5]. The approach in Reference [35] describes a method for compressing road network graphs via sign-decomposition combined with Yao’s garbled circuits [36] and PIR to protect both user queries and said graph, giving it strong end-to-end privacy guarantees. However, it also has relatively long pre-processing and query response times, since the protocol must operate on compressed and encrypted data at all times. The approach in Reference [28] partitions the network graph into disjoint sections (i.e., each consisting of a separate subgraph) that can then be retrieved via PIR during local computation of a shortest path during the routing phase. While this requires minimal pre-processing time, the total route completion time remains rather long, since the locally run routing algorithm would need multiple PIR queries to complete a single route.

Other encryption-based schemes with much stronger privacy guarantees also exist, but they are also much less efficient than the previous two. For instance, Reference [38] allows users to request routes between arbitrary source and destination partitions, as well as within said partitions in a privacy-preserving manner using 1-of-n Oblivious Transfer [31]. However, the scheme needs to compute All-Pairs of Shortest Paths (APSP) for the aforementioned partitions during the routing phase, drastically slowing down query response times. Similarly, the work in Reference [14] uses Paillier’s Encryption to privately query outgoing edge weights from vertices in the road network graph that, in turn, is used to inform a route-planning algorithm running locally on the user’s own device. The scheme, unfortunately, has a very high communication overhead, since it has to make a separate query for every vertex that needs to be “scanned” by the routing algorithm.

As this work aims to achieve strong privacy guarantees for users’ routes while also meeting utility and performance targets, PIR was chosen as the core privacy-preservation mechanism for the proposed approach. Unlike structured encryption, it does not leak information about query repetitions, which can potentially be analyzed by an adversary to distinguish between different route requests by the same user.

3 Models and Assumptions

This section presents the assumptions and mathematical models related to the road network graph, its corresponding partition graph, and the “approximate” routes that can be derived from the latter. Table 1 provides a summary of the symbols and notation used in this section.

Table 1.

Symbol	Description
\(G = (V,E)\)	Road network graph with road segments E and intersections V
\(l_{G}(u,v)\)	Length of a road segment having endpoints \((u,v)\)
\(s, d\)	Source (s) and Destination (d) vertices
\(r_{G}(s,d)\)	Path/route between s and d
\(L_{G}(r_{G}(s,d))\)	Total length of path/route \(r_{G}(s,d)\)
\(R_{G}(s,d)\)	All possible paths between s and d
\(\rho _{G}(s,d)\)	Shortest path between s and d
\(G_{P} = (P, C)\)	Partition graph derived from G with the set of all partitions P and the connections between them C
\(p = (V_{p}, E_{p})\)	A partition (i.e., a subgraph of G) consisting of road segments \(E_{p}\) and endpoints \(V_{p}\) within it
\(NB_{p_{u}}\)	Set of all neighboring partitions for partition \(p_{u}\)
\(\ell\)	Partition level
\(\mathcal {L}\)	Maximum partition level (\(0 \le \ell \le \mathcal {L}\))
\(P^{\ell }\)	Subset of partitions in P at level \(\ell\)
\(p^{\ell }_{i}\)	A partition belonging under \(P^{\ell }\) with a given index i
\(x_{u}\)	Representative vertex for the partition \(p_{u}\)
\(c_{p_{u}p_{v}}\)	A connection between partitions \(p_{u}\) and \(p_{v}\) through shortest path \(\rho _{G}(x_{u},x_{v})\)
\(dist_{G}(u,v)\)	Shortest path distance between u and v in G
\(\mathcal {D}(p)\)	Database of shortest paths for partition p
\(\mathcal {C}(\rho _{0}, \ldots ,\rho _{n})\)	Arbitrary route combination heuristic
\(r^{*}_{G}(u,v)\)	Approximate shortest path between u and v
\(\alpha (r^{*}_{G}(u,v))\)	Optimal route approximation metric

Table 1. Summary of Symbols Used in Section 3

3.1 Road Network Partitioning Model

A road network can be modelled as a directed graph \(G = (V,E),\) where the edges E represent road segments, and the vertices V represent either road intersections or terminals. Each road segment \(e \in E\) is a directed edge between two vertices \(u,v \in V\) such that \(e = (u,v)\), with parallel opposing lanes being represented by two directed edges going in opposite directions. The traversal cost for e is given by its length, \(l_{G}(u,v)\). A route or path between two vertices \(s,d \in V\) is defined as a sequence of vertices \(r_{G}(s,d) = (v^{sd}_{1}, v^{sd}_{2}, \dots , v^{sd}_{n-1}, v^{sd}_{n})\) where the first vertex \(v^{sd}_{1}=s\) and the last vertex \(v^{sd}_{n}=d\). The total length of \(r_{G}(s,d)\) is given by \(L_{G}(r_{G}(s,d)) = \sum ^{|r_{G}(s,d)|-1}_{i=1} l_{G}(v^{sd}_{i}, v^{sd}_{i+1})\). Denoting all possible paths (between s and d) as \(R_{G}(s,d)\), the Shortest Path is then:

\begin{equation} \rho _{G}(s,d) = arg~min_{r \in R_{G}(s,d)}~ L_{G}(r). \end{equation}

(1)

Most modern route-planning algorithms can already deal with very large road networks, but typically do not incorporate route privacy protection mechanisms out of the box, as they tend to significantly increase processing and communication overhead, as mentioned in Section 2. However, if the route-planning task can be divided, then the additional processing cost incurred by the privacy mechanism can be distributed across multiple devices instead and ultimately improve RPS performance. This is done by first dividing the road network into different areas called partitions.

An arbitrary partitioning of the road network graph G is represented by a separate partition graph \(G_{P} = (P,C),\) where the vertices P represent partitions and the edges C represent connections between them. Each partition \(p \in P\) is a subgraph \(p = (V_{p}, E_{p})\) such that \(V_{p} \subseteq V\) and \(E_{p} \subseteq E\). Each connection \(c_{p_{u}p_{v}} \in C\) is a shortest path \(\rho _{G}(x_{u},x_{v})\) between the representative vertices \(x_{u},x_{v} \in V\) of any two neighboring partitions \(p_{u},p_{v} \in P\) where \(x_{u} \in V_{p_{u}} \wedge x_{v} \in V_{p_{v}}\). Two partitions are considered “neighbors” if \(\exists ~e = (u,v)~\text{s.t.}~e \in E \wedge u \in V_{p_{u}} \wedge v \in V_{p_{v}}\). Finally, the set of all neighboring partitions for \(p_{u}\) is defined as \(NB_{p_{u}} = \lbrace p_{v} | p_{v} \in P \wedge (c_{p_{u}p_{v}} \in C \vee c_{p_{v}p_{u}} \in C)\rbrace .\)

Representative vertices are ideally chosen to minimize the shortest path distance to all other vertices in the partition. Vertices with high graph centrality are good candidates, but our preliminary experiments show that it is viable to simply choose the vertex closest to the geospatial average of the coordinates of each partition’s vertices with no impact on routing performance. Thus, representative vertices were chosen via this method. It then follows that \(l_{G_{P}}(p_{u},p_{v}) = L_{G}(\rho _{G}(x_{u},x_{v}))\). For simplicity, the shortest path distance between any \(u,v \in G\) is represented by \(dist_{G}(u,v)\), while \(dist_{G_{P}}(p_{u}, p_{v})\) refers to the shortest path distance between their containing partitions, \(p_{u}, p_{v} \in P\).

Partitions are defined in a hierarchical manner with the maximum partition level being \(\mathcal {L}\) and the subset of partitions for each level \(\ell \lt \mathcal {L}\) denoted as \(P^{\ell } \subseteq P\) such that \(P^{\ell } \subset P^{\ell -1}\) for \(\ell \gt 0\). A partition may also be defined as \(p^{\ell }_{i}\), where \(\ell\) is the partition’s level and i is the partition’s index at that level given \(P^{\ell } = \lbrace p^{\ell }_{0}, p^{\ell }_{1}, \dots , p^{\ell }_{n}\rbrace\). The base level partition set \(P^{0}\) contains only one partition/subgraph \(p^{0}_{0} = G\) at \(\ell = 0\). Conversely, \(p^{0}_{0}\) might be composed of several smaller partitions \(p^{1}_{0}\), \(p^{1}_{1}\), \(p^{1}_{2}\), and \(p^{1}_{3}\) at \(\ell = 1\) such that all of them are subgraphs of \(p^{0}_{0}\) and so on. For clarity, partition levels will henceforth be referred to by their position (i.e., higher or lower) instead of their subgraph’s size.

Each partition \(p^{\ell }\) has three sets of shortest path data, as depicted in Figure 1: (1) the shortest paths within the partition \(p^{\ell }\) (black edges), (2) the shortest paths to its neighboring partitions, \(NB_{p^{\ell }}\) (blue edges), and (3) the shortest paths between the containing partition, \(p^{\ell -1}\), to its own neighbors, \(NB_{p^{\ell -1}}\) (green edges). This database of shortest paths is represented by \(\mathcal {D}(p^{\ell })\).

Fig. 1.

3.2 Approximate Shortest Path Model

Our approach relaxes the shortest path problem by accepting approximate shortest paths between two areas (in this case, partitions) containing s and d in place of the exact shortest path between the two points. This, in turn, reduces the number of queries required to obtain a route (hence, faster route completion times) at the cost of potentially having slightly longer paths. These are formally defined here as follows: Given an arbitrary \(s,d \in V\), the exact shortest path \(\rho _{G}(s,d)\) rarely coincides with the shortest path \(\rho _{G_{P}}(x_{s},x_{d}),\) where \(x_{s}\) and \(x_{d}\) are the representative vertices in the same partitions as s and d, respectively. This is because only routes between representative vertices of partitions in P can be produced from \(G_{P}\), and it is highly likely that \(s \ne x_{s}\) or \(d \ne x_{d}\). However, it is still possible to “complete” the route by adding in the missing start and end sections. Letting \(\mathcal {C}(\rho _{1}, \ldots , \rho _{n})\) be a route combination heuristic, the simplest “completed” route would be:

\begin{equation} r^{*}_{G}(s,d) = \mathcal {C}(\rho _{G}(s,x_{s}), \rho _{G_{P}}(x_{s},x_{d}), \rho _{G}(x_{d},d)). \end{equation}

(2)

This can be generalized further by replacing \(\rho _{G_{P}}(x_{s},x_{d})\) with \(\mathcal {C}(\rho _{G_{P}}(x_{1}, x_{2}), \dots , \rho _{G_{P}}(x_{n-1}, x_{n}))\):

\begin{equation} r^{*}_{G}(s,d) = \mathcal {C}(\rho _{G}(s,x_{s}), \rho _{G_{P}}(x_{1}, x_{2}), \dots , \rho _{G_{P}}(x_{n-1}, x_{n}), \rho _{G}(x_{d},d)), \end{equation}

(3)

where \(x_{1}=x_{s}\) and \(x_{n}=x_{d}\). In this work, these kinds of combined paths are designated as approximate shortest paths, and their quality is measured based on how well they approximate their counterpart exact shortest paths. This metric is defined as the optimal route approximation:

\begin{equation} \alpha (r^{*}_{G}(s,d)) = \frac{l_{G}(r^{*}_{G}(s,d))}{dist^{*}_{G}(s,d)}, \end{equation}

(4)

where \(l_{G}(r^{*}_{G}(s,d))\) is the length of the approximate shortest path, and \(dist_{G}(s,d)\) is the length of the exact shortest path in G.

3.3 Assumptions

Having presented the mathematical models relevant to route planning, we now state the core assumptions in this work as follows: Foremost is that the RPS operates over a particular service region and is primarily used by the general public for their day-to-day activities. The service region in this case is assumed to be a geographical area of arbitrary shape and size that has fixed bounds. This area is assumed to have comprehensive road network data available such that an RPS can be used to calculate routes within it. This road network is assumed to be represented as a graph that can be divided multiple times to produce subgraphs representing partitions as per Section 3.1. Each highest-level partition is assumed to be handled by a distinct physical or virtual device for the purpose of the RPS. Additionally, it is assumed that each partition can calculate, build, and maintain its own database of shortest paths \(\mathcal {D}(p)\) independent of its other tasks. This per-partition database is then assumed to be queryable by users in a privacy-preserving manner through PIR.

We additionally assume that all entities other than the user are potential threats—henceforth, simply called “adversaries”—interested in gaining access to the user’s route information. Note that no distinction is made between the service providers themselves and malicious third-parties. The kinds of information that can be leaked include the user’s: (1) exact origin, (2) exact destination, and (3) the calculated “route” between them.

4 Route Planning with PIR

PIR can be used by RPS to provide strong privacy guarantees by protecting the database representation of the road network graph used to calculate routes. In the simplest case, consider a database containing APSP for the whole road network graph, indexed by pairs of origin and destination vertices, \((s,d)\). PIR can then be used to retrieve any route between any two locations on the road network using a single query (i.e., \(O(1)\)) with very high privacy. This is the naive approach, which is not feasible in practice for two reasons: (1) it requires a prohibitively large amount of storage space, and (2) pre-computing APSP information for large road network graphs takes a very long time. As the following sections will discuss space and time complexity in great detail, a summary of the symbols and notation for this section is provided in Table 2.

Table 2.

Symbol	Description
\(R_{G}^{*}\)	Set of shortest paths between all possible pairs of vertices in V
\(L_{G}^{max}\)	Length of the longest path in the set, \(R_{G}^{*}\)
\(\mathcal {C}_{pir}\)	Constant representing the impact of database sizes on record retrieval times for PIR
\(\mathcal {N}_{p}\)	Average number of vertices in a partition (across all \(p \in P\))
\(V^{adj}_{p}\)	Set of vertices in partition p that are adjacent to other partitions
\(\mathcal {N}^{adj}_{p}\)	Average number of vertices adjacent to other partitions (i.e., \(V^{adj}_{p}\)) (across all \(p \in P\))
\(\mathcal {N}_{c}\)	Average number of external connections (across all \(p \in P\))

Table 2. Summary of Symbols Used in Section 4

Assuming a graph with \(10,\!000\) vertices, a longest path length of 20 vertices, and a 1-byte representation for vertex data, the total PIR database size is already around 2 GB. This is even larger for city-sized road networks such as Osaka City’s, which has \(\sim 99,\!000\) vertices (\(\sim 200.8\) GB keeping all other parameters same). In short, letting \(R^{*}_{G} = \lbrace r^{*}_{G}(u,v) | u,v \in V \rbrace\) and \(L^{max}_{G} = L_{G}({\arg\max}_{r \in R^{*}_{G}} |r|\)), the space complexity for such an approach is \(O(|V|^{2} \cdot L^{max}_{G})\). This issue is made even worse by PIR, since individual record retrieval times become slower with larger database sizes. This has been verified through preliminary experiments where data retrieval times were observed to scale linearly with database sizes by a constant factor, \(\mathcal {C}_{pir}\), such that accessing a single route would have a time complexity of \(O(\mathcal {C}_{pir} \cdot |V|^{2}) \approx O(|V|^{2})\) instead of the expected \(O(1)\).

Since time complexity is heavily dependent on space-complexity for PIR-based approaches, existing works [28, 35] have focused on tackling the space complexity problem, since pre-processing is assumed to be done offline only once. This is not the case for real-world road networks, however, where traffic conditions can change very quickly. One way to solve this is by distributing route data among several edge servers such that each one only manages vertices for a distinct partition. This reduces the time complexity to \(O(|V| \cdot \mathcal {N}_{p}),\) where \(\mathcal {N}_{p} = 1 /|P| \cdot \sum _{p \in P} |V_{p}|\) is the average vertex count per partition, while the per-partition space complexity becomes \(O(\mathcal {N}_{p} \cdot |V| \cdot L^{max}_{G})\). These databases still need to be kept updated, but it is much easier to do so in a distributed manner. However, dividing the data comes at the cost of weaker privacy, since the set of route data stored on the accessed edge server is assumed known to the adversary. This is analyzed further in Section 5.

4.1 Exact Partial Region Dijkstra’s Algorithm (EPR-D)

The space complexity problem can be mitigated further by storing only edge weights between adjacent vertex pairs, as this is the minimum information needed by Dijkstra’s algorithm. This is also known as the adjacency matrix representation, which readily maps into a database that can then be used with any PIR scheme. We designated this PIR-adapted approach as Exact Partial Region Dijkstra’s Algorithm (EPR-D), since it simply partitions and distributes graph data across several edge servers and produces exact shortest paths. A notable disadvantage is its use of separate PIR queries to retrieve information for each vertex, since the original algorithm tends to scan a lot of vertices, which can make route completion times very long. It is also possible to reidentify routes based on the sequence of edge servers queried by the user. This is examined further in Section 5. This is reflected in its average time complexity of \(O(\mathcal {N}_{p} \cdot [\mathcal {N}_{p} + \mathcal {N}^{adj}_{p}] \cdot [|V| + |E| log |V|]),\) where \(\mathcal {N}^{adj}_{p} = 1 /|P| \cdot \sum _{p \in P} |V^{adj}_{p}|\) is the average number of vertices adjacent to other partitions, in turn, represented by \(V_{p}^{adj} = \lbrace v | v \notin V_{p} \wedge [ (u,v) \in E \wedge u \in V_{p} ] \rbrace\). Meanwhile, its average per-partition space complexity is \(O(\mathcal {N}_{p} \cdot [\mathcal {N}_{p} + \mathcal {N}^{adj}_{p}])\). Table 3 summarizes the complexity of EPR-D.

Table 3.

	Space Complexity	Time Complexity
Naive	\(O(\|V\|^{2} \cdot L^{max}_{G})\)	\(O(\|V\|^{2})\)
EPR-D	\(O(\mathcal {N}_{p} \cdot [\mathcal {N}_{p} + \mathcal {N}^{adj}_{p}])\)	\(O(\mathcal {N}_{p} \cdot [\mathcal {N}_{p} + \mathcal {N}^{adj}_{p}] \cdot [\|V\| + \|E\| log \|V\|])\)
APR-D	\(O([\mathcal {N}^{2}_{p} + \mathcal {N}_{c}] \cdot R^{*,max}_{G})\)	\(O([\mathcal {N}^{2}_{p} + \mathcal {N}_{c}] \cdot [ (\|P\| + \|C\| log \|P\|) + 2) ])\)

Table 3. Summary of Time and Space Complexity for Section 4

4.2 Approximate Partial Region Dijkstra’s Algorithm (APR-D)

A possible solution to the time complexity issue is by allowing the RPS to produce approximate shortest paths instead of exact shortest paths. This can be done as follows: (1) calculate an approximate route between source and destination partitions \(p_{s}\) and \({p_{d}}\) using Dijkstra’s algorithm over partition graph \(G_{P}\), (2) retrieve subpaths to both s and d from within their respective partitions via database lookup, and (3) merge all obtained paths to form the final route. We designated this approach as Approximate Partial Region Dijkstra’s Algorithm (APR-D), since it produces approximate routes instead of exact ones. While it is significantly faster than EPR-D, it requires more space (since full routes are stored). It also has weaker privacy guarantees, since routes between distant areas tend to follow the same intermediate route, as will be expanded upon in Section 5. Since the database has to store complete routes, the space complexity is larger than EPR-D’s being at \(O([\mathcal {N}^{2}_{p} + \mathcal {N}_{c}] \cdot R^{*,max}_{G}),\) where \(\mathcal {N}_{c} = 1 /|P| \cdot \sum _{p \in P} |C_{p}|\) is the average number of external connections from each partition. The different stages have different time complexities, with the first stage having \(O([\mathcal {N}^{2}_{p} + \mathcal {N}_{c}] \cdot [|P| + |C| log |P|])\) and the second stage having \(O(2 \cdot \mathcal {N}^{2}_{p} + \mathcal {N}_{c})\). The combined time complexity is then \(O([\mathcal {N}^{2}_{p} + \mathcal {N}_{c}] \cdot [ (|P| + |C| log |P|) + 2) ])\), which is significantly less than that of EPR-D, since \(|P| \lt \lt |V|\). The complexity of APR-D is also summarized in Table 3.

5 Proposed Privacy Metrics

To evaluate the effectiveness of a privacy-preserving PIR-based RPS, objectively quantifiable privacy metrics must first be established in the RPS context. Thus, the two models presented in this section jointly characterize the privacy of the different kinds of information that can be leaked by an RPS, as described in Section 3.3. Table 4 provides a summary of the symbols and notation used in this section.

Table 4.

Symbol	Description
\(\Omega (s,d)\)	Endpoint location privacy metric
\(\mathcal {R}_{G}(u,v)\)	Arbitrary routing mechanism operating on some graph G
\(Q^{s,d}\)	Query sequence used to obtain a route from s to d
\(Q^{*}\)	An arbitrary query sequence for no specific route
\(\Phi (Q^{*})\)	Route privacy metric for some candidate query sequence \(Q^{*}\)
\(P_{s,d}\)	Partition sequence derived from some query sequence \(Q^{s,d}\)
\(k(Q_{s,d}, Q^{*})\)	Indicator function for checking if \(Q{*}\) can replace \(Q_{s,d}\) and vice versa
\(V_{X}\)	A subset of V containing only the representative vertices

Table 4. Summary of Symbols Used in Section 5

5.1 Endpoint Location Privacy Model

As mentioned in Section 3.1, the partition database is used to store the shortest paths for each partition and is queried privately using PIR in our approach. The queried partitions are assumed to be knowable by adversaries, but strong privacy is still guaranteed for exact locations within each partition. Let \(\mathcal {R}_{G_{P}}(s,d)\) be a routing mechanism that returns the approximate shortest path \(\rho _{G_{P}}(s,d)\), and suppose that the origin location s is replaced with a nearby location \(s^{\prime }\). If \(s^{\prime }\) is still in the same partition as s (i.e., \(s^{\prime } \in V_{p_{s}}\)), then \(\mathcal {R}_{G_{P}}(s^{\prime },d)=\mathcal {R}_{G_{P}}(s,d)\). The same applies replacing d with any \(d^{\prime } \in V_{p_{d}}\). An adversary knowing only \(\mathcal {R}_{G_{P}}(s,d)\) would be unable to distinguish \(s,d\) from all other possible \(s^{\prime },d^{\prime }\) as long as \(s^{\prime } \in V_{p_{s}}\) and \(d^{\prime } \in V_{p_{d}}\). Thus, location privacy is guaranteed for s and d within \(p_{s}\) and \(p_{d}\), respectively.

The privacy for any \(s,d\) pair is then proportional to the number of possible \(s^{\prime },d^{\prime }\) pairs that can be drawn between \(V_{p_{s}}\) and \(V_{p_{d}}\). This is designated as the endpoint location privacy metric:

\begin{equation} \Omega (s,d) = 1 - \frac{1}{|V_{p_{s}}|\cdot |V_{p_{d}}|}, \end{equation}

(5)

where \(V_{p_{s}}\) and \(V_{p_{d}}\) give the sets of all vertices in \(p_{s}\) and \(p_{d}\), respectively. Note that while having more vertices per partition (i.e., larger \(|V_{p_{s}}|\) and \(|V_{p_{d}}|\)) is advantageous for endpoint location privacy, this also means higher computation overhead for PIR. Extensive preliminary experiments with the PIR scheme used by our approach showed that the retrieval times scaled almost linearly with the database size at a rate of roughly \(\frac{1}{45,000} \approx 2.22 \times 10^{-5}\) seconds per record. That is, a database with \(|V_{p}|^{2} \approx 10,\!000\) routes was found to have an average retrieval time of \(\sim 0.25\) second, while another with \(|V_{p}|^{2} \approx 100,\!000\) routes took \(\sim 2\) seconds.

5.2 Route Privacy Model

Endpoint location privacy assumes that a route’s origin and destination partition are already known and thus quantifies only the privacy of the exact origin and destination points. This section focuses on the privacy of the routes themselves. As with endpoint location privacy, it is assumed that the queried partitions are knowable by adversaries. It is also assumed that they have in-depth knowledge about the algorithms used by the RPS.

Let \(Q_{s,d} = \lbrace q_{0}, \dots , q_{n} \rbrace\) be the query sequence that a user must perform to obtain a route from s to d. This sequence can be transformed into a partition sequence \(P_{s,d} = \lbrace p_{0}, \dots , p_{n} \rbrace\) (where \(p_{0} = p_{s}\) and \(p_{n} = p_{d}\)) using a function \(f_{qp}:Q \xrightarrow {} P\) that maps every element of Q to its handling partition in P. If the partitioning is hierarchical, then only the highest-level partitions are considered, since they already contain the shortest path data of lower-level partitions, as stated in Section 3.1.

Note that partition sequences are not simply partitions along the final route. For instance, in the case of Dijkstra’s algorithm, they can be thought of as the entire sequence of “scanned” vertices, as depicted in Figure 2. It is therefore possible for several routes to share the same partition sequence (though unlikely in the case of Dijkstra’s). This is modeled as an indicator function that identifies whether or not a candidate \(Q^{*}\) can replace \(Q_{s,d}\) for calculating \(r_{G_{P}}(s,d)\) and vice versa:

\begin{equation} k(Q_{s,d},Q^{*}) = {\left\lbrace \begin{array}{ll}1 & \text{if}~f_{qp}(Q_{s,d}) = f_{qp}(Q^{*})\\ 0 & \text{otherwise.}\\ \end{array}\right.} \end{equation}

(6)

Fig. 2.

With this, the lower bound for the total number of distinct routes that share \(Q^{*}\) is:

\begin{equation} \sum _{s^{\prime },d^{\prime } \in V_{X}} k(Q_{s^{\prime },d^{\prime }}, Q^{*}), \end{equation}

(7)

where \(V_{X} \subseteq V\) contains only the representative vertices (e.g., \(x_{u}, x_{v}\)) associated with the partition graph \(G_{P}\), as described in Section 3.1. This ensures that routes where \(p_{s^{\prime }} \ne p_{s}\) and \(p_{d^{\prime }} \ne p_{d}\) are counted with equal importance as the route where \(p_{s^{\prime }} = p_{s} \wedge p_{d^{\prime }} = p_{d}\)—hence, the focus on distinct routes. Finally, the route privacy can then be quantified for any \(Q^{*}\) as follows:

\begin{equation} \Phi (Q^{*}) = 1 - \frac{1}{ \sum \nolimits _{s^{\prime },d^{\prime } \in V_{X}} k(Q_{s^{\prime },d^{\prime }}, Q^{*})}. \end{equation}

(8)

6 Hierarchical Privacy-Preserving Route Planning

Our proposed approach, HPRoP, is built up from several key design choices to meet particular requirements. That is, HPRoP should be able to:

–

REQ1: Compose feasible approximate shortest paths by carefully choosing its component routes from the appropriate partitions,

–

REQ2: Privately retrieve component route information from said partitions with minimal processing overhead,

–

REQ3: Produce an approximate shortest path with good optimal route approximation values (i.e., \(\alpha (r^{*}_{G}(s,d)) \rightarrow 1.0\)),

–

REQ4: Ensure a good level of privacy protection for the user’s exact origin and destination points and intermediate route,

–

REQ5: Reflect dynamic and up-to-date road conditions, and

–

REQ6: Scale reasonably well with changes in client demand and computational resource availability over time and per area.

All these are brought together by a novel hierarchical route-planning heuristic presented in the latter half of this section, along with other improvements to privacy and routing.

6.1 Private Information Retrieval (PIR)

HPRoP uses PIR as its core route privacy-preservation mechanism. The choice of implementation was the SealPIR [2] library configured to use Brakerski/Fan-Vercauteren (BFV) Homomorphic Encryption (HE) [13] with a database upper bound of \(N = 216\), a plaintext modulus of \(log(t) = 12\), and a dimensionality factor, \(d = 2\). This implementation was chosen specifically for its significantly reduced processing and communication overhead compared to other HE-based ones, making it ideal for an RPS. This along with the hierarchical route-planning heuristic drastically reduces the number of queries and, in effect, the route completion time.

6.2 Inertial Flow Partitioning

Optimal route approximation and endpoint location privacy are highly dependent on how the service region is partitioned. Straightforward methods such as grid partitioning are simple but often result in disjoint partition subgraphs in the presence of natural barriers such as rivers and so on. To mitigate this, one way would be to ensure that each partition subgraph is a strongly connected component, ensuring high internal connectivity and reachability. Examples of road network graph aware partitioning methods include PUNCH [12], Buffoon [32], and Inertial Flow [33].

Inertial Flow was chosen for HPRoP, as it is a relatively simple algorithm based on maximum flow that results in balanced partitions and also preserves internal connectivity. A vertex threshold of around 300 nodes per partition was chosen instead of an area-based threshold to guarantee an Endpoint Location Privacy of \(\Omega (s,d) \approx 99.998\%\) (i.e., \(\lt 0.002\%\) probability). Moreover, based on preliminary experiments with SealPIR, a partition database with \(300^2 \approx 90,000\) routes is expected to have retrieval times between \(1.5-2.5\) seconds (1.75 seconds, on average), which is viable when combined with HPRoP’s reduced query counts. A queue is initialized by adding the entire road network graph to it. A graph from this queue is then used as input to the Inertial Flow algorithm to produce two balanced partition subgraphs. The simple iterative technique in Reference [1] was used alongside Inertial Flow to find and apply optimal cuts during this step. If any of the subgraphs do not yet satisfy vertex threshold, they are simply added back to the queue. This entire procedure is then repeated until the queue is empty. Afterwards, every two consecutive cuts was then retroactively denoted as a separate partition level, as shown in Figure 3, and the partitions under each are then tagged accordingly. Due to the vertex threshold, the highest-level partition may vary greatly from area to area. Some routes therefore require more queries to complete over other routes, which increases profiling risk. A partition level threshold \(\ell _{threshold} = 3\) was therefore imposed such that higher level ones were reassigned to \(\ell = \ell _{threshold}\).

Fig. 3.

6.3 Distributed Architecture

HPRoP leverages the hierarchically partitioned road network by delegating each partition to a different entity. In the cloud-based scenario, these entities would be server instances; while, in the edge-based scenario, these would be edge-servers throughout the smart city. The edge-based architecture presents several advantages. First, it allows PIR queries to be directed only to partitions that have the information necessary to answer them, effectively distributing the computational load of using PIR. Second, it allows the system to better scale based on the number of users, availability of computing resources, and so on, which can vary greatly at different times across different parts of the city, as shown in Figure 4. Finally, it also makes the pre-computation of shortest paths to neighboring partitions more efficient, since it can be done independently by every partition after an initial exchange of road condition information with said neighbors. This is useful for reflecting dynamic road conditions bound to some local area.

Fig. 4.

6.4 Heuristic Algorithm

HPRoP uses a simple heuristic algorithm to arrive at an approximate shortest path as follows:

(1)

Initialization: Find an initial basis route between the lowest-level partitions containing source and destination, then move up one level.

(2)

Subroute Connection: Connect the source and destination partitions at the current level to the basis route using subroutes.

(3)

Basis Route Merging: Merge the subroutes into the basis route.

(4)

Repeat steps (2) to (3) until the highest partition level is reached

Algorithm 1 runs exclusively on the client-side, sending PIR queries to edge servers handling specific partitions. Route information is obtained solely through these PIR queries, and, thus, no information is leaked by the queries themselves. However, the number of queries, their timestamps, and the partitions they were sent to are still assumed to be known to the adversary.

The algorithm starts with an Initialization step (lines 2–4 in Algorithm 1), which finds an approximate shortest path between the lowest-level partitions containing s and d separately as shown in Figure 5. This level is denoted as the base level \(l_{o} = {\arg\min}_{l}~(p_{s}^{l} \ne p_{d}^{l})\). For simplicity, this is just denoted as \(FindBaseLevel(s,d)\) in Algorithm 1. The client then sends a PIR query to partition \(p_{s}^{l_{o}}\), retrieving a route to partition \(p_{d}^{l_{o}}\). This corresponds to \(RetrievePath(p_{u}, p_{v}, \mathfrak {D})\), which retrieves the shortest path between two partitions taking into account some direction flag \(\mathfrak {D}\). This flag simply indicates whether the path is being calculated from the source or the destination, which will be relevant later. The retrieved route is then denoted as the initial basis route \(r_{*}\) shown in Figure 6. At the end of this step, the current level variable \(l_{c}\) is also initialized (line 4).

Fig. 5.

Fig. 6.

The main loop starts from the Subroute Connection steps (lines 6–9 in Algorithm 1). The algorithm for Subroute Connection itself is described in Algorithm 2. A subroute \(r_{x}^{l_{c}}\) is defined as a path that connects a basis partition \(p_{x}^{l_{c}}\) to the current basis route \(r_{*}\). The basis partition given by \(p_{x}^{l_{c}}\) is always the source or destination partition at level \(l_{c}\) containing some vertex x and is used to obtain the source or destination sub-partitions (depending on the direction \(\mathfrak {D}\)) at line 3 of Algorithm 2. For instance, the source subroute is obtained by finding a sequence of shortest paths from \(p_{s}^{l_{c}}\) that connects to the basis route, as shown in Figure 7. This step also uses several important functions, such as: (1) \(FindRoutePartitions(r_{*},l)\), which gives the sequence of partitions at level l along \(r_{*}\), and (2) \(DoesNotIntersect(r_{x}^{l_{c}}, r_{*})\), which is “True” if \(r_{x}^{l_{c}}\) and \(r_{*}\) have no common vertices. Starting from \(p_{x}^{l_{c}}\), it builds \(r_{x}^{l_{c}}\) by retrieving a path to nearby connected partitions (line 9) and then connecting them to the subroute (lines 10–13). Since \(r_{x}^{l_{c}}\) might not yet intersect \(r_{*}\) during the initial iteration, this is repeated with an updated reference partition (lines 14–15) until an intersection is found. Computing the destination subroute follows the same steps but first has to reverse the aforementioned sequence (lines 5–7) so the algorithm can begin from the last route partition. This step is shown in Figure 8.

Fig. 7.

Fig. 8.

The Basis Route Merging step (line 13 in Algorithm 1) is performed once the source and destination subroutes are found. The basic idea is to find the “best” point at which the subroutes intersect with the basis route and join them there. For the source subroute, the “best” point is as far as possible from the start of the basis route; while for the destination subroute, this is as far as possible from the end of the basis route. This is illustrated in Figure 9. This merged route is then used as the new basis route. The current level \(l_{c}\) is then updated (line 15), and the loop is restarted. The loop is terminated once \(l_{c}\) reaches \(l_{threshold}\).

Fig. 9.

The time complexity of this algorithm depends on the required number of PIR database lookups. Finding the initial basis route and calculating the final routes at \(p_{s}\) and \(p_{d}\) always require a single lookup each. Meanwhile, the number of lookups at each level depends on the maximum length of the shortest path in \(G_{P}\) at that level, which is given by \(max_{p^{\ell }_{u},p^{\ell }_{v} \in P^{\ell }} |r^{*}_{G^{\ell }_{P}}(p^{\ell }_{u}, p^{\ell }_{v})|\), where \(G^{\ell }_{P} \subset G_{P}\) containing only that level’s partitions \(P^{\ell }\) and connections \(C^{\ell }\). The average time complexity is then:

\begin{equation} O\left(\left[\mathcal {N}^{2}_{p} + \sum \limits _{\ell \in L} \mathcal {N}^{\ell }_{c}\right] \cdot \left[ 3 + \sum \limits _{\ell \in L} 2 \cdot \max \limits _{p^{\ell }_{u},p^{\ell }_{v} \in P^{\ell }} |r^{*}_{G^{\ell }_{P}}(p^{\ell }_{u}, p^{\ell }_{v})| \right] \right), \end{equation}

(9)

where \(\mathcal {N}^{\ell }_{c}\) is the average number of connections from each partition at level \(\ell\). Since HPRoP precomputes and stores shortest path data for multiple levels per partition, it is necessary to account for \(\mathcal {N}^{\ell }_{c}\) in HPRoP’s space complexity:

\begin{equation} O\left(\left[\mathcal {N}^{2}_{p} + \sum _{\ell \in L} \mathcal {N}^{\ell }_{c}\right] \cdot R^{*,max}_{G} \right). \end{equation}

(10)

6.4.1 Route Privacy Mechanism.

Route privacy, as defined in Section 5.2, quantifies the privacy based on how many other possible routes have a query sequence matching that of a given route, where more matches mean better privacy. That is, an adequate route privacy mechanism should: (1) maximize the matching of query sequences between all possible routes and (2) minimize the information gain from the order of queries in the sequence itself. HPRoP already achieves the latter via hierarchical execution, which can somewhat obfuscate the actual query sequence but does not necessarily strengthen the former. To address this, the algorithm is extended to pad the query sequence with dummy queries. The basic idea is to query multiple other partitions instead of just \(p_{s}\) and \(p_{d}\) to ensure that the actual origin and destination partitions are “hidden” among them. In theory, querying more partitions would mean better route privacy at the cost of longer route completion times. Achieving full route privacy, however, would require querying all level partitions at every iteration of the algorithm at least once, which would require prohibitively long route completion times. For example, a service region with a total of 463 partitions would require roughly 810 seconds (13.5 minutes), on average, to complete a single route. However, simply querying a small random subset of the aforementioned partitions will not be enough to ensure a certain level of route privacy, since an adversary can simply use the hierarchy of partitions to check for inconsistencies in the set of queried partitions and easily identify the dummy ones. Instead, we chose to limit HPRoP to querying all other partitions under the same parent as the highest-level partitions containing s and d. This selection method is straightforward and ensures that none of the queried partitions can easily be identified as dummy partitions. Additionally, this ensures that route privacy will be around \(\Phi (Q^{*}) \approx 1 - 1/jk\), where j and k are the total number of partitions under the same parent partitions as \(p_{s}\) and \(p_{d}\), respectively.

This is implemented through the Subroute End Connection steps (lines 10–12 in Algorithm 1), while the procedure itself is presented in Algorithm 3. Instead of stopping when the subroute and basis route first intersect, this algorithm continues until all other partitions sharing the same parent \(p_{x}^{l_{c}-1}\) as the basis partition have been queried. This is done by repeatedly drawing a partition \(p_{sub}\) from a queue of these same-parent sub-partitions \(SP^{l_{c}}\) (line 10). If \(p_{sub}\) is the same as the current partition \(p_{c}\), then a part of the subroute is retrieved as normal (lines 11–18). If it is a route partition, then it is instead pushed back to the subpartition queue (lines 19–22). If both prior conditions are not satisfied, then a dummy query is simply sent to \(p_{sub}\) (line 24). This ensures that important queries are mixed in with dummy queries, making it more difficult to determine which ones are relevant.

6.5 Shortcut Connections

A simple strategy for improving algorithm performance is through pre-computing and storing shortest path data to partitions beyond just the adjacent ones. This reduces the number of queries to complete a route while also improving optimal route approximation for paths to further away partitions. These paths to non-adjacent partitions were therefore denoted as Shortcut Connections, represented as additional edges in the partition graph.

These shortcut connections, however, also increase the pre-computation time for each partition relative to how many of them need to be made. In addition, the extent of the road network graph needed for pre-computation also increases based on the distance to the partitions being connected. It is therefore more useful to limit the number of partitions to connect to and how far those partitions can be. HPRoP considers two methods for determining shortcut connections: (1) the Same Parent Shortcuts method and (2) the N-hop Neighbor Shortcuts method. Same Parent Shortcuts simply connects each partition to all other partitions under the same parent partition, as shown in Figure 10 (Middle). N-hop Neighbor Shortcuts pre-computes shortcuts to other N-hop away partitions, as shown in Figure 10 (Right). Both would theoretically improve the optimal route approximation when calculating subroutes between non-adjacent partitions and greatly reduce the possibility of broken routes. HPRoP currently has no mechanisms to handle these other than deferring to the user’s device to perform local route calculation to bridge the final gap. The usefulness of these methods is shown through the results in Section 7.3.1.

Fig. 10.

7 Evaluation

In this section, the details of the evaluation framework for HPRoP are first presented prior to showing the actual evaluation results and their subsequent analysis.

7.1 Environment

HPRoP was implemented in Python on a Jupyter notebook for ease of testing and visualization, with the notebook itself encapsulated in a Docker container for portability. The execution environment was a dedicated Linux server running Ubuntu 20.04.1 SMP equipped with a AMD Ryzen Threadripper 3970X 32-Core processor and 256 GB RAM in total.

The service region was a rectangular geographical area of roughly \(546~km^{2}\) (i.e., 26 km in width, 21 km in height) encompassing the entire road network of Osaka City, Japan, and a portion of the immediately outlying areas. Its road network graph consists of \(|V| = 99,\!734\) vertices and \(|E| = 269,\!614\) edges. The region is hierarchically partitioned using the Inertial Flow algorithm, as shown in Figure 3 based on the parameters in Section 6.2.

7.2 Methodology

Evaluation was done through several metrics under the following categories: (1) Utility, (2) Privacy, and (3) Performance. The Utility category pertains to the usefulness of the service, with the Optimal Route Approximation metric falling under this category. Since the base algorithm in Section 6.4 cannot guarantee complete routes, Route Errors are also included here as a metric. Route errors are then defined as the occurrence count of broken routes during testing. In turn, a broken route is defined as a route where a subroute connection cannot be established to the highest-level partition containing either s or d, and thereby results in a route that cannot be completed by HPRoP’s algorithm. The Privacy category is composed of the Endpoint Location Privacy and Route Privacy metrics, described in Section 5. The Performance category pertains to how well the service can deal with higher client demand, dynamic road conditions, and so on, without service quality degradation. The Memory Usage, Route Completion Time, and Pre-processing Time metrics fall under this category.

Evaluation was done by comparing HPRoP to the two baseline PIR-based approaches—EPR-D and APR-D—previously presented in Section 4. For the Utility category, Privacy category, and Route Completion Time metrics, all three approaches were evaluated by calculating routes for randomly generated \(s,d\) pairs until 4,000 successful routes have been completed. This termination threshold was chosen to be sufficiently high enough to capture any route errors that might occur for HPRoP. Note that both APR-D and EPR-D are guaranteed to produce complete routes, so this metric is no longer evaluated for them. Memory Usage was evaluated by calculating the projected size of the per-partition databases for the highest-level partitions. Finally, Pre-processing Time was evaluated by measuring the time to build each partition’s database.

7.3 Results

7.3.1 Effect of Shortcut Connections.

The base performance of HPRoP under different shortcut connection methods in Section 6.5 is first characterized with the goal of finding the method that maximizes optimal route approximation while minimizing both route completion time and route errors. Figure 11 shows the resulting distribution of optimal route approximation values for 4,000 successful test routes under different methods. Surprisingly, Same Parent Shortcuts have an almost negligible effect on optimal route approximation, with N-hop Neighbor Shortcuts being a more effective way to increase the said metric. However, Same Parent Shortcuts were highly effective in preventing the occurrence of broken routes, reducing the error count to 21, as shown in Figure 12. Further investigation of these remaining errors showed that they were caused by choosing the same vertex as s and d, which results in failure, as no routing can be done by the algorithm. In short, for all 4,000 test routes, Same Parent Shortcuts seem to eliminate all occurrences of true broken routes—i.e., where the highest-level partitions containing s or d could not be reached. This also validates the hypothesis that broken routes are caused by a lack of reachability at the final partition level that can contain more than 4 child partitions due to the deliberate choice to set \(\ell _{threshold} = 3\) mentioned in Section 6.2.

Fig. 11.

Fig. 12.

Meanwhile, using 1-hop Neighbor Shortcuts drastically improves the results of the 75th percentile from \(\alpha (r_{*}) \approx 1.54\) to \(\alpha (r_{*}) \approx 1.28\), but anything beyond 2-hop Neighbor Shortcuts is seen to have gradually diminishing returns. Finally, the overhead caused by the additional shortcut connections is evaluated based on the pre-processing time metric in Section 7.2. Figure 13 shows that both methods increase the per-partition pre-processing time to about \(\sim 1.5\) seconds once 1-Hop Neighbor Shortcuts are introduced but stabilizes around this value even as the number of hops is further increased. In contrast, the effect of Same Parent Shortcuts on pre-processing time is minimal, amounting to an increase of \(\sim 0.1\) second, on average. Thus, both methods are equally viable in terms of this metric.

Fig. 13.

Table 5 summarizes the results discussed so far. Based on these, it was decided to use 2-hop Neighbor Shortcuts with Same Parent Shortcuts as the representative configuration for HPRoP, as it offers good optimal route approximation and short pre-computation times with minimal errors.

Table 5.

	Normal				With Same Parent Shortcuts
	0-hop	1-hop	2-hop	3-hop	0-hop	1-hop	2-hop	3-hop
Optimal Route Approximation
Min	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
25%	1.160	1.062	1.018	1.003	1.133	1.060	1.017	1.003
50%	1.314	1.141	1.080	1.052	1.270	1.138	1.079	1.052
75%	1.590	1.287	1.202	1.156	1.510	1.288	1.202	1.156
Max	16.256	5.210	5.371	5.371	15.424	5.210	5.371	5.371
Per-partition Routes Calculation Time (in seconds)
Min	0.04	0.11	0.19	0.19	0.08	0.11	0.19	0.19
25%	0.16	0.58	1.88	3.33	0.24	0.75	1.92	3.48
50%	0.20	1.43	2.49	4.41	0.47	1.51	2.90	4.75
75%	0.28	1.60	3.36	5.70	1.30	1.74	3.47	6.03
Max	1.27	3.17	5.78	10.55	3.15	3.37	6.75	11.44
Route Errors (per 4,000 successful routes)
Total	231.0	52.0	30.0	24.0	21.0	21.0	21.0	21.0

Table 5. Summary of Results for Different Shortcut Connection Configurations

7.3.2 Optimal Route Approximation.

The performance of HPRoP, APR-D, and EPR-D in terms of optimal route approximation is shown in Figure 14. As expected, EPR-D achieves a constant optimal route approximation value of \(\alpha (r_{*}) = 1.0\), since it always produces exact shortest paths. APR-D produced routes with \(\alpha (r_{*})\approx 1.44\) for the 75th percentile despite operating only over the partition graph. HPRoP produced even better routes with \(\alpha (r_{*}) \approx 1.20\) for the 75th percentile, whereas APR-D was only able to achieve this for the 25th percentile of all results. Additionally, HPRoP achieves much better worst-case routes than APR-D.

Fig. 14.

7.3.3 Endpoint Location Privacy.

Endpoint Location Privacy \(\Omega (s,d)\) describes how well the exact \(s,d\) is kept private, as described in Section 5.1. Since Inertial Flow partitioning was used, evaluation results showed that all three approaches were able to achieve an average endpoint location privacy of \(\Omega (s,d) = 0.999982\). Thus, each route is indistinguishable from approximately \(99\%\) of all other routes between the same partitions, making all three approaches equally viable.

7.3.4 Route Privacy.

Route Privacy \(\Phi (Q^{*})\) describes how well a route is kept private based on how many other routes share a query sequence similar to its own, as described in Section 5.2. To evaluate this, a lookup table for the routes between each \(s,d \in V_{x}\) such that \(s \ne d\) was done separately for EPR-D, APR-D, and HPRoP to account for their unique querying behaviors, as shown in Figure 16.

Fig. 15.

Fig. 16.

Figure 15 shows a comparison of the route privacy distributions across all three approaches. EPR-D showed the worst route privacy, achieving \(\Phi (Q^{*}) \gt 0\) only for the 25th percentile of all results. APR-D performed only slightly better with \(\Phi (Q^{*}) \gt 0\) for about \(50\%\) of all results, while achieving \(\Phi (Q^{*}) \ge 0.5\) for only \(25\%\) of them. This is expected, since both have no inherent privacy mechanisms, yet this does illustrate that APR-D is still better than EPR-D in terms of route privacy. HPRoP, in contrast, has route privacy of \(\Phi (Q^{*}) \ge 0.80\) for the 50th percentile of all results, while also achieving \(\Phi (Q^{*}) \ge 0.50\) for the 75th percentile, surpassing both EPR-D and APR-D. However, further analysis of the routes showed that the worst-case route privacy (i.e., \(\Phi (Q^{*}) = 0.0\)) still happens for \(\sim 12.2\%\) of all test routes. This suggests that HPRoP does not fully guarantee route privacy for all cases, although this can be increased further by adding more dummy queries.

Additionally, the relationship between optimal route approximation \(\alpha (r_{*})\) and route privacy metric \(\Phi (Q^{*})\) was analyzed for HPRoP. This was done to determine whether some tradeoff exists between the two metrics. The results in Figure 17 show that the majority of the routes have \(\alpha (r_{s,d}) \lt 1.2\) across widely varying levels of route privacy. This suggests that the two metrics have little effect on one another, and performing a simple Pearson correlation confirms that there is only a weak positive correlation (\(\rho \approx 0.104\)) between the two.

Fig. 17.

7.3.5 Route Completion Time.

Obtaining a route using any of the three algorithms (APR-D, EPR-D, and HPRoP) requires the client to make multiple PIR queries to the RPS. Thus, route completion time is highly dependent on how many such queries need to be made as shown in Figure 18, and is equivalent to the total processing overhead for an RPS. In this work, it is calculated based on projections derived from the preliminary experiments on SealPIR database retrieval times mentioned at the end of Section 5.1. The empirically derived value of roughly \(\frac{1}{45,000} \approx 2.22 \times 10^{-5}\) seconds per record is then used to calculate the route completion times, which are shown in Figure 19. Note that we opted to use projections here instead of simulating the actual results, as the latter would take a prohibitively long time to conclude in the case of EPR-D (and, to a lesser extent, APR-D). For instance, APR-D requires \(\sim 214\) queries, on average, which is expected to take 5.55 minutes (333.26 seconds) per route. EPR-D is even worse, requiring \(\sim 4,\!958\) queries, on average, which would take at least 2.15 hours (7,731.47 seconds) just to complete a single route. Both are clearly impractical from the perspective of any modern RPS. This is in contrast to HPRoP, which requires significantly less queries, on average (at \(\sim 25\) queries), and takes only 23.55 seconds per route. This is because HPRoP’s algorithm bypasses the need to explore large sections of the partition and road network graph, as it already starts with a very coarse route between the origin and destination partitions at the lowest level to guide its route search. APR-D and EPR-D, in contrast, explore outwards from the origin, checking every unexplored partition or vertex on the way, as per Dijkstra’s algorithm.

Fig. 18.

Fig. 19.

7.3.6 Memory Usage.

Memory usage depends on the size of the route databases maintained by each partition, which is very important in a distributed environment with resource-constrained devices. Figure 20 shows the distribution of per-partition memory usage across the three approaches. APR-D and HPRoP calculate and store a similar number of routes, thus requiring an allocation of around 20–160 MB per partition. EPR-D has a significantly smaller memory footprint at around 0.5–4 MB per partition, because it essentially stores a next-hop matrix instead. In practice, however, the memory usage of all three approaches is well within the capabilities of standard edge servers.

Fig. 20.

7.3.7 Pre-processing Time.

The pre-processing time metric is total time needed to build each partition’s database during the pre-processing phase, which determines how often it can be updated during operation. As shown in Figure 21, EPR-D requires less than \(\le 0.5\) second per partition, on average, since it only needs to calculate the routes inside each partition, while APR-D requires \(\le 1.5\) seconds, since it also needs to calculate routes to immediately neighboring partitions in addition. Meanwhile, HPRoP needs to calculate routes inside each partition, routes to neighboring partitions, and routes to other partitions under the same parent. This results in slightly longer pre-processing times at 1–5 seconds per partition. Regardless, the pre-processing time for all three approaches is clearly fast enough to accommodate frequent updates even under dynamic road conditions.

Fig. 21.

8 Conclusion

In this work, the Hierarchical Privacy-Preserving Route Planning (HPRoP) approach was proposed, which combines Inertial Flow partitioning, Private Information Retrieval (PIR), and Edge Computing techniques along with a novel hierarchical route-planning heuristic algorithm to produce routes that can adequately approximate the actual shortest paths while also providing endpoint location privacy and route privacy. HPRoP reliably produced routes with an optimal route approximation of \(\alpha (r_{*})\le 1.2\), while also achieving near-optimal endpoint location privacy at \(\Omega (s,d) \approx 1.0\) and good route privacy at \(\Phi (Q^{*}) \ge 0.5\). In terms of performance, HPRoP has a route completion time of around 23.55 seconds, on average, which is reasonable for a privacy-preserving RPS. Its viability for deployment in a distributed/edge-based smart city context were also shown through its relatively small memory footprint (20–160 MB for each partition’s database) and short pre-processing times (2–5 seconds per partition), which are well within the capabilities of conventional edge servers.

In addition, although most modern route-planning algorithms can also use PIR, we did not use them as the basis of our approach due to several reasons. For instance, bounded-hop techniques such as two-hop labelling [9] are the fastest known class of routing algorithms but they require computing and storing prohibitively large indices for city-sized road networks, which becomes even worse in combination with PIR. Separator-based techniques such as Customizable Route Planning (CRP) [11] and goal-directed techniques such as ALT (based on A*) [18] and Arc Flags [20] have very long precomputation times, making them infeasible to use with dynamic road networks having edge weights that need to be updated frequently. The hierarchical technique called Contraction Hierarchies (CH) [16] has none of the aforementioned problems but still presents a potential route privacy risk, since it must explicitly query the partitions along the actual shortest path multiple times to obtain the final path. Nevertheless, it is a good routing algorithm to consider for future PIR-based RPS work.

Aside from the experiments presented in Section 7, we also conducted tests on how well HPRoP generalizes to other road networks, but the results are omitted for conciseness. In particular, it was tested on three other large cities—New York, Shanghai, and Tokyo—by obtaining 4,000 test routes and examining the optimal route approximation distribution. For \(75\%\) of all routes, the optimal route approximation was at \(\alpha (r_{*})\le 1.20\) for New York, \(\alpha (r_{*})\le 1.23\) for Shanghai, and \(\alpha (r_{*})\le 1.24\) for Tokyo. This indicates that HPRoP generalizes somewhat, but further research is still required.

In the future, we also plan to refine HPRoP’s route-planning algorithm to further improve optimal route approximation and reduce route completion times by exploring more efficient data structures for route information and improve route privacy by considering out-of-order query execution.

References

[1]

Gaurav Aggarwal, Sreenivas Gollapudi, and Ali Kemal Sinop. 2021. Sketch-based algorithms for approximate shortest paths in road networks. In Proceedings of the Web Conference. 3918–3929.

Abstract

1 Introduction

2 Related Work

3 Models and Assumptions

3.1 Road Network Partitioning Model

3.2 Approximate Shortest Path Model

3.3 Assumptions

4 Route Planning with PIR

4.1 Exact Partial Region Dijkstra’s Algorithm (EPR-D)

4.2 Approximate Partial Region Dijkstra’s Algorithm (APR-D)

5 Proposed Privacy Metrics

5.1 Endpoint Location Privacy Model

5.2 Route Privacy Model

6 Hierarchical Privacy-Preserving Route Planning

6.1 Private Information Retrieval (PIR)

6.2 Inertial Flow Partitioning

6.3 Distributed Architecture

6.4 Heuristic Algorithm

6.4.1 Route Privacy Mechanism.

6.5 Shortcut Connections

7 Evaluation

7.1 Environment

7.2 Methodology

7.3 Results

7.3.1 Effect of Shortcut Connections.

7.3.2 Optimal Route Approximation.

7.3.3 Endpoint Location Privacy.

7.3.4 Route Privacy.

7.3.5 Route Completion Time.

7.3.6 Memory Usage.

7.3.7 Pre-processing Time.

8 Conclusion

References

Cited By

Index Terms

Recommendations

Preserving location privacy without exact locations in mobile services

Collaborative trajectory privacy preserving scheme in location-based services

K-Anonymous Cloaking Algorithm Based on Weighted Adjacency Graph for Preserving Location Privacy

Comments

Information

Published In

Publisher

Journal Family

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Share

Share this Publication link

Share on social media

Affiliations