research-article

Open access

Mobility Prediction via Rule-enhanced Knowledge Graph

Authors:

Qiaohong Yu,

Huandong Wang,

Yu Liu,

Depeng Jin,

Yong Li,

Lin Zhu,

Junlan FengAuthors Info & Claims

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 9

Article No.: 215, Pages 1 - 21

https://doi.org/10.1145/3677019

Published: 09 October 2024 Publication History

PDF eReader

Abstract

With the rapid development of location acquisition technologies, massive mobile trajectories have been collected and made available to us, which support a fantastic way of understanding and modeling individuals’ mobility. However, existing data-driven methods either fail to capture the long-range dependency or suffer from a high computational cost. To overcome these issues, we propose a knowledge-driven framework for mobility prediction, which leverages knowledge graphs (KG) to formulate the mobility prediction task into the KG completion problem through integrating the structured “knowledge” from the mobility data. However, most related mobility prediction works only focus on the structured information encoded in existing triples, which ignores the rich semantic information of relation paths composed of multiple relation triples. In this article, we apply a dedicated module to extract the supplementary semantic structure of paths in KG, which contributes to the interpretability and accuracy of our model. Specifically, the extracted rules are applied to capture the dependencies between relational facts. Moreover, by incorporating user information in the entity-relation space with the corresponding hyperplane, our method could capture diverse user mobility patterns and model the personal characteristics of users to improve the accuracy of mobility prediction. Extensive evaluations illustrate that our proposed model beats state-of-the-art mobility prediction algorithms, which verifies the superiority of utilizing logical rules and user hyperplanes. Our implementation code is available at https://github.com/tsinghua-fib-lab/RulekG-MobiPre.git

1 Introduction

Rapidly evolving mobile localization technology has enabled us to access enormous spatio-temporal user mobility data, which helps us better analyze and model people’s mobility [30, 36, 43]. Currently, most existing approaches adopt data-driven methods to predict users’ future movement. As shown in Figure 1(a), these methods utilize powerful machine learning models to implement future mobility prediction based on information propagated along empirical physical paths, e.g., along the historical trajectories [14], or directly from all historical mobility records in a fully connected manner [7]. However, these methods either fail to capture the long-range dependency of trajectories [14] or suffer from a large computational cost [7], whose reason is that the utilized information propagation paths in these data-driven methods are either ineffective or inefficient.

Fig. 1.

Meanwhile, the emerging techniques of knowledge graphs (KG) offer a promising approach for extracting structured “knowledge” from the massive spatio-temporal mobility data [16]. A variety of applications based on KG methods, including user profiling and recommendation systems, have achieved considerable success [18, 27, 33, 36]. Thus, one intriguing research question is whether we can solve the mobility prediction problem in the knowledge-driven paradigm. As shown in Figure 1(b), based on KG, we are able to describe the mobility behavior of users in the form of relational facts in the KG and utilize powerful knowledge graph embedding (KGE) techniques to extract features from massive trajectory data. Then, logical rules between relational facts can be further extracted to serve as the information propagation paths. For example, just as the second rule example, assuming that one user visited location \(p_{1}\) at time \(t_{1}\) and location \(p_{2}\) at time \(t_{2}\), there must be a potential transfer relationship between these two locations. By injecting this logical rule into our constructed knowledge graph, we can predict future user mobility behavior accurately, e.g., \(p_{5}\) in Figure 1(b). In this way, the structured “knowledge” in terms of KGEs and logic paths can predict future user mobility behavior by examining the plausibility of potential facts, which exhibits a promising way to achieve effective and efficient mobility prediction.

However, utilizing KG to model the spatio-temporal mobility behavior of users is also difficult due to a number of key challenges. First, there exists significant diversity between the mobility patterns of different users. For example, the home place of one user might be the workplace of another user, leading to their different mobility patterns across these two places. How to model the diverse spatio-temporal mobility patterns of different users in a uniform manner based on KG techniques is the first challenge. Second, the mobility patterns of users can be represented by multiple types of relational facts describing the different aspects of interactions between the location and time. For example, the fact of visiting some location at some time characterizes the interaction between location entities and time entities, while the fact of moving from one location to another location characterizes the interactions between different locations. Overall, both these complicated relational facts and the dependency between them reflect important aspects of human mobility, while the latter can be well captured via logical rules, which describe the logical relationship between relations in KG. How to extract and utilize useful underlying logical rules to model these multiple types of relational facts is another challenge.

In this article, we focus on utilizing the knowledge graph technique to model the spatio-temporal mobility behavior of users. Specifically, the spatio-temporal mobility patterns of users are modeled by the facts belonging to elaborately designed relation types reflecting different aspects of human mobility. Further, in order to characterize the diversity of mobility patterns across different users, we introduce the concept of “user hyperplane.” The plausibility of fact is evaluated based on the projected embedding vectors of entities on different “user hyperplane,” which has a strong ability to characterize their difference. What’s more, in order to collaboratively learn different types of relational facts, we utilize the structure of logical rules and paths, which capture interpretable relationships and dependencies between differential relational facts. Through dedicated design on the logical rules extraction, more efficient logical rules are captured to characterize users’ complex mobility patterns. Besides, by using the logical rules to regularize the embedding vector of a fact and the path that can derive the fact based on the logic rules, different types of relational facts can be modeled in a cohesive manner by extracting the rich semantic structure between them.

In summary, our article makes the following contributions:

—

We adopt a novel framework to model the user’s spatio-temporal mobility patterns as multi-type relational facts in the knowledge graph. Specifically, the mobility patterns are characterized by the relation between temporal entities and location entities under different “user hyperplanes.” Then, the mobility prediction task is solved by estimating the plausibility of the corresponding facts.

—

We apply a dedicated logical rule extraction method to capture the valid logical rules, after that, creatively inject these rules and corresponding paths into the knowledge graph to collaboratively learn different types of relational facts in a cohesive manner, which is able to extract the rich semantic structure between them and better characterize the different aspects of mobility patterns.

—

Extensive evaluations show that our proposed model beats state-of-the-art mobility prediction algorithms. Specifically, the mean reciprocal rank (MRR) is relatively improved by 7.78% on average. What’s more, logical rules have been verified to be beneficial, i.e., the MRR is relatively improved by 6.31% on average by introducing logical rules, demonstrating the effectiveness of our proposed model.

The structure of this article is as follows. We begin by briefly reviewing the related works systematically and underlining the difference of our work. Further, we introduce a mathematical model and formulate the problem. We then present details of our methodology and propose the mobility prediction. Following our methodology, we evaluate and validate the prediction performance of our proposed approach. Finally, we conclude with a summary and possible future work.

2 Related Work

2.1 Mobility Prediction

In the past decade, mobility prediction has been widely investigated in spatio-temporal data mining [34], which aims to predict users’ future mobility based on historical trajectories. Especially, the early methods focus on Markov model and its variations, representing locations as states while leveraging the transition matrix for prediction. For example, based on the clustered historical trajectories, hidden Markov models (HMMs) are firstly utilized to predict the user’s next location [21]. Further, GMove [42] characterizes group-level mobility regularity via the ensemble of HMMs. Besides, Multi-Chain [32] builds several interconnected Markov chains to model various mobility patterns for prediction. On the other hand, several recent works explore deep learning for mobility prediction. Long- and Short-Term Preference Modeling [26] focuses on the temporal and spatial correlations between historical and current trajectories, and modifies RNN for robust prediction. DeepMove [7] combines recurrent neural network (RNN) and attention mechanism for interpretable mobility prediction. Following the similar design, DeepJMT [3] further performs joint mobility and time prediction. GETNext [40] incorporates users’ preference, spatio-temporal context, and other relevant features together into a transformer model to better capture the mobility characteristics. SSDL [8] introduces a novel disentangled representation learning architecture to understand human time-independent and time-dependent mobility patterns. GaGASAN [35] utilizes a multi-scale time encoding technique and a self-attention mechanism to model different temporal patterns and capture long and short range contexts of sequence transitions, respectively. However, most existing mobility prediction models focus on sequence modeling, while ignore the semantic relationships among locations, time points as well as user behaviors, which is exactly what our proposed model achieves.

2.2 Knowledge Graph

Due to the limited observations, KG is incomplete and thus needs to be completed, which motivates several important related works [23]. For instance, TransE [1] completes missing links by measuring the distance between head and tail entities via relation translation. DistMult [37] and ComplEx [28] extend Canonical Polyadic decomposition for completion with symmetric and complex-valued embedding adopted, respectively. Recently, more works have begun to develop dynamic knowledge graph models instead of static KG. HyTE [5] achieves the temporal KG completion through projecting entities and relations to the timestamp space. DE-SimplE [9] put forward diachronic entity embeddings to characterize the features timely with high expressive ability. TNTComplEx extends ComplEx to temporal KG, and achieves the state-of-the-art performance. Such works above only focus on the direct link in KG, while recent works further consider the logic rule in KG. Specifically, PTransE [13] extracts rule-based relation paths, which are combined with TransE to infer missing links between entities. RJPE [22] uses Horn rules to associate relations at the semantic level, and provides an interpretable KG completion framework. Moreover, both Neural LP [39] and DRUM [24] combine first-order logic with RNN to learn logic rules in KG for completion. Hence, such methods motivate our rule-enhanced KG for mobility prediction. Besides, there is also some research applying KG for user behavior modeling. For example, KGPMF [44] learns an urban movement KG via a multi-view learning framework. IMUP [33] constructs a spatial KG for mobile user profiling in a reinforcement learning framework, while KG-MUP [18] combines both KG and feature engineering for mobile user profiling. However, such constructed KG largely ignores the temporal information, thus not applicable in our scenario. Other approaches incorporate temporal information in user behavior modeling. Specifically, STKG [31] constructs a spatio-temporal urban KG for user mobility prediction. STKGRec [2] builds a spatial–temporal KG from check-in sequences to implement the next point-of-interest (POI) recommendation. Although temporal information is considered in these approaches, they describe user behavior based on only a single type of relations, and thus are limited in describing the multiple aspects of user behavior. Different from them, we utilize multiple types of relations to describe the different aspects of user behavior, and further extract logical rules between relations to model their dependency.

3 System Overview

3.1 Knowledge Graph

The knowledge graph is a directed graph structure defined on the nodes of entities. The edges connecting different entities correspond to the relational facts, of which the types are defined to the relations. Specifically, each relational fact in the knowledge graph is represented by a three-tuple \((e_{h},r,e_{t})\) consisting of the head entity \(e_{h}\), tail entity \(e_{t}\), and relation \(r\). The set of possible entities and relation types are organized to be the schema of the knowledge graph. Due to the strong ability of the knowledge graph model in terms of extracting structured knowledge from data and modeling multiple types of relations in a uniform manner, it has shown great success in numerous applications including language understanding and recommendation systems [15, 27]. In this article, we aim to model the human mobility behavior based on the knowledge graph model.

3.2 System Overview

We show the workflow of our proposed system in Figure 2. As we can observe, our system first defines the mobility-based schema to point out the formats and types of entities and relations, which is then used to extract the corresponding relational facts in the constructed knowledge graph owing to the characteristics of multivariate relational facts in the spatio-temporal mobility data, we propose a KGE model based on the concept of user hyperplane. Besides, we extract and inject the logical rules into the knowledge graph to learn semantic information between paths and relations, which helps to capture efficient high-order information and diverse user. Finally, according to the learned embeddings of entities and relations in our knowledge graph, the obtained embedding model is used to predict user future movement by evaluating the plausibility of the corresponding relational facts.

Fig. 2.

4 Knowledge Graph Construction

In order to characterize the spatio-temporal mobility behavior based on the knowledge graph model, we must first design the mobility-based schema to define the entities and relations in the knowledge graph explicitly. Specifically, four different types of entities are considered, including the spatial venues, the regions, the categories, and the time units. The spatial venues represent the specific PoI, and we denote the set of spatial venues as \(\mathcal{L}\). Moreover, the regions and categories denote the pre-divided areas and pre-defined PoI categories, respectively, which are denoted as \(\mathcal{R}\) and \(\mathcal{C}\). As for the time units, following the extensively used timestamp preprocessing mechanism used in existing methods [36, 44], we divide one day into half-hour time intervals, and we further distinguish time bins belonging to working day and non-working day, i.e., totally 96 time units. We further represent the set of time units as \(\mathcal{T}\). In addition, we define the set of all entities as \(\mathcal{E}\). Then, between these entities, we define six different types of relations, including the spatio-temporal visitation relation, spatial transition relation, temporal closeness relation, spatial affiliation relation, spatial category relation, and similar visitation relation, to characterize the mobility behavior of users in the form of relational facts. The definitions of these relations are introduced in detail in the following.

The spatio-temporal visitation relation is defined to describe the visiting behavior to different locations at different times, which characterizes the interaction between locations and time units, where the locations can be PoIs, regions, or categories. Specifically, it can be formally defined as follows.

Definition 1

(Spatio-Temporal Visitation Relation). The spatio-temporal visitation relation \(r_{V}\) exists between the location \(l\) and time \(t\) if and only if a user visits the location \(l\) at time \(t\), of which the corresponding fact is represented by a three-tuple \((t,r_{V},l)\).

The spatial transition relation is defined to describe the transiting behavior of users between different locations, which characterizes the interaction between locations. Specifically, it can be formally defined as follows.

Definition 2

(Spatial Transition Relation). The spatial transition \(r_{T}\) exists between two different location \(l_{1}\) and \(l_{2}\) if and only if a user moves from the location \(l_{1}\) and \(l_{2}\) within a predefined temporal threshold \(T_{max}\), of which the corresponding fact is represented by a three-tuple \((l_{1},r_{T},l_{2})\).

In addition, we also define the temporal closeness relation to characterize the fine-grained structure between different time units. For example, two close time units should have similar embedding vectors. What’s more, the temporal closeness relation can be used as a bridge to build path structure based on which the rich semantic information can be transmitted and extracted. The details will be introduced in the methodology section. Overall, the temporal closeness relation is formally defined as follows.

Definition 3

(Temporal Closeness Relation). The temporal closeness relation \(r_{W}(\Delta t)\) exists between two time units \(t_{1}\) and \(t_{2}\) if and only if the time gap between \(t_{1}\) and \(t_{2}\) is less than \(T_{max}\) and \(\Delta t=|t_{1}-t_{2}|\). Without loss of generality, we assume \(t_{1}\leq t_{2}\), then the corresponding relational fact is denoted by a three-tuple \((t_{1},r_{W}(t_{2}-t_{1}),t_{2})\).

The spatial affiliation relation is defined to describe the affiliation property to different regions at different locations, which characterizes the interaction between spatial venues and regions. Specifically, it can be formally defined as follows.

Definition 4

(Spatial Affiliation Relation). The spatial affiliation relation \(r_{L}\) exists between the location \(l_{1}\) and the region \(r_{1}\) if and only if an arbitrary location \(l_{1}\) is located at the region \(r_{1}\), of which the corresponding fact is represented by a three-tuple \((l_{1},r_{L},r_{1})\).

The spatial category relation is defined as describing the category property to different categories at different locations, which characterizes the interaction between spatial venues and categories. Specifically, it can be formally defined as follows.

Definition 5

(Spatial Category Relation). The spatial category relation \(r_{C}\) exists between the location \(l_{1}\) and the category \(c_{1}\) if and only if the category of an arbitrary location \(l_{1}\) is the category \(c_{1}\), of which the corresponding fact is represented by a three-tuple \((l_{1},r_{C},c_{1})\).

The similar visitation relation is defined to describe the similar visitation property for two users who visit the same PoIs or have a higher cosine similarity of PoI category sequence, if and only if one user owns much historical data and the other user has scarce data. Generally speaking, two users sharing the same PoI visitation record or similar PoI category distribution should have similar mobility characteristics. Additionally, the defined relationship plays a significant role in assisting in embedding learning of users with scarce data. Overall, it can be formally defined as follows.

Definition 6

(Similar Visitation Relation). The similar visitation relation \(r_{S}\) exists between the user \(u_{1}\) and the other \(u_{2}\) if and only if the user \(u_{1}\) and user \(u_{2}\) have visited more than T identical PoIs or the cosine similarity of the visited PoI category sequence is greater than \(\theta\) and the number of one user’ records is less than N, of which the corresponding fact is represented by a three-tuple \((u_{1},r_{S},u_{2})\).

Based on the above-defined relations, we can extract relational facts from the observed spatio-temporal user mobility data. Further, users’ future movements can also be represented by unobserved relational facts. Then, predicting users’ future mobility can be converted to evaluating the plausibility of possible relational facts.

5 Methodology

5.1 Embedding Model Based on User Hyperplane

In order to predict the plausibility of unobserved relational facts, we follow the common practices of knowledge graph completion methods and utilize an embedding model to map each entity or relation type into a low-dimensional vector. Then, a scoring function of relational facts is defined based on the embedding vectors of entities and relations to measure the plausibility of relational facts. By training the embedding model on the observed facts, the scoring function is able to evaluate the plausibility of arbitrary unobserved relational facts.

Further, in order to characterize the diverse mobility patterns of different users in the framework of the knowledge graph, we model different users as different hyperplanes [5]. Then, the knowledge subgraph composed of relational facts belonging to each user is projected to the hyperplane to learn the personalized mobility patterns of the corresponding user.

For an arbitrary relational fact \((e_{h},r,e_{t})\) belonging to the subgraph of user \(u\), its plausibility is also evaluated on the hyperplane of user \(u\). Specifically, we denote their embedding vectors as \(\boldsymbol{e}_{h}\), \(\boldsymbol{e}_{t}\), and \(\boldsymbol{r}\), respectively, and the subgraph of user \(u\) is denoted by \(\mathcal{G}^{u}\). The user hyperplane is modeled by the normal vector denoted by \(\boldsymbol{u}\). Based on the above definitions, we first compute their projected embedding vectors on the user hyperplane \(\boldsymbol{u}\) as Equation (1)

\begin{align}\begin{cases}\boldsymbol{e}_{h}^{(u)}=\boldsymbol{e}_{h}-(\boldsymbol{u}^{T} \boldsymbol{e}_{h})\boldsymbol{u}, \\\boldsymbol{e}_{t}^{(u)}=\boldsymbol{e}_{t}-(\boldsymbol{u}^{T}\boldsymbol{e}_ {t})\boldsymbol{u}, \\\boldsymbol{r}^{(u)}=\boldsymbol{r}-(\boldsymbol{u}^{T}\boldsymbol{r}) \boldsymbol{u}.\end{cases}\end{align}

(1)

Further, we adopt the complex-domain scoring function to evaluate the plausibility of relational facts, which have been proven to have a strong ability in capturing antisymmetric relations [12, 28]. Specifically, we convert each real-valued embedding vector \(\boldsymbol{e}_{h}\) into the complex-valued vector by splitting it into two equal-sized parts, i.e., \(\boldsymbol{e}^{(u)}_{h}=[\hat{\boldsymbol{e}}^{(u)}_{h},\check{\boldsymbol{e} }^{(u)}_{h}]=\hat{\boldsymbol{e}}^{(u)}_{h}+i\check{\boldsymbol{e}}^{(u)}_{h}\). Further, \(\hat{\boldsymbol{e}}^{(u)}_{h}\) is used as the real component, and \(\check{\boldsymbol{e}}^{(u)}_{h}\) is used as the imaginary component to jointly form a complex vector in calculating the scoring function. Similar operations are also implemented on the relation \(\boldsymbol{r}\) and entity \(\boldsymbol{e}_{t}\). Then, the scoring function of each tuple \((e_{h},r,e_{t})\) under the user hyperplane \(\boldsymbol{u}\) is computed by Equation (2)

\begin{align}f^{(u)}(e_{h},r,e_{t}) & = {\rm Re}(\langle\hat{\boldsymbol{e}}^{(u)}_{h}+i\check{ \boldsymbol{e}}^{(u)}_{h},\hat{\boldsymbol{r}}^{(u)}+i\check{\boldsymbol{r}}^{ (u)},\overline{\hat{\boldsymbol{e}}^{(u)}_{t}+i\check{\boldsymbol{e}}^{(u)}_{t }}\rangle) \\& = \langle\hat{\boldsymbol{e}}^{(u)}_{h},\hat{\boldsymbol{r}}^{(u)}, \hat{\boldsymbol{e}}^{(u)}_{t}\rangle+\langle\hat{\boldsymbol{e}}^{(u)}_{h}, \check{\boldsymbol{r}}^{(u)},\check{\boldsymbol{e}}^{(u)}_{t}\rangle \\&\quad{} +\langle\check{\boldsymbol{e}}^{(u)}_{h},\hat{\boldsymbol{r}}^{(u) },\check{\boldsymbol{e}}^{(u)}_{t}\rangle-\langle\check{\boldsymbol{e}}^{(u)} _{h},\check{\boldsymbol{r}}^{(u)},\hat{\boldsymbol{e}}^{(u)}_{t}\rangle, \end{align}

(2)

where \(\langle\cdot\rangle\) is the multi-linear dot product. For an arbitrary complex number \(z\), \(\bar{z}\) is its complex conjugate, and \({\rm Re}(z)\) is its real component. Furthermore, the plausibility of whether the fact \((e_{h},r,e_{t})\) exists is modeled to be in proportion to its score \(f^{(u)}(e_{h},r,e_{t})\), i.e., Equation (2). The cross-entropy loss function can evaluate the difference between the probability distribution output by the model and the true probability distribution. The larger the difference, the more punishment. By minimizing the gap between model predictions and labels, the prediction results are more accurate. Thus, the cross-entropy loss function is widely used in multi-classification problems [12, 29]. Then, the embedding model is trained by using the cross entropy of the tail entity as its loss, which can be calculated as Equation (3)

\begin{align}\ell^{(u)}(e_{h},r,e_{t})=-f^{(u)}(e_{h},r,e_{t})+{\rm log}\left(\sum_{e^{ \prime}_{t}\in\mathcal{E}}{\rm e}^{f^{(u)}(e_{h},r,e^{\prime}_{t})}\right).\end{align}

(3)

Then, based on Equation (3) to calculate the relational fact loss, the total loss for all relational facts can be expressed by:

\begin{align}\mathcal{L}_{C}=\sum_{u\in\mathcal{U}}\sum_{(e_{h},r,e_{t})\in \mathcal{G}^{u}}\ell(e_{h},r,e_{t}).\end{align}

(4)

5.2 Logical Rules Extraction

Learning the embedding model based on the loss function Equation (4) cannot well capture the dependence between relational facts. For example, the combination of several relational facts is possible to derive another relational fact, which is ignored by the optimization only based on Equation (4). On the other hand, the dependence between relational facts can serve as the efficient information propagation path. For example, we can infer whether the relationship exists in the KG by integrating high-order information from other relational facts. Thus, the dependence between relational facts should be learned to extract rich semantics and high-order information. In order to achieve this goal, we utilize the structure of logical rules and paths [13, 22].

For example, the structure of two entities connected by a certain multi-step path can be used to derive another relational fact between these two entities. Specifically, the multi-step paths in the knowledge graph contain rich semantic information. Thus, by injecting logic rules in the knowledge graph, interpretable relationships and dependencies between differential relational facts can be captured [22].

To obtain the explainable and implicit rules from the knowledge graph, we utilize Neural LP [39] to extract logical rules and corresponding confidence automatically, which overcomes the disadvantages of utilizing fixed rules in previous papers [17]. Specifically, the relation path extraction is to retrieve a ranked list of the relational facts by maximizing the score of the query. For example, as shown in Equation (5), suppose given the query \((l_{1},r_{T},l_{2})\), i.e., a user visits two different spatial venues \(l_{1}\) and \(l_{2}\) at time units \(t_{1}\) and \(t_{2}\) within the temporal threshold \(T_{max}\), we can obtain a weighted relation path \((l_{1},r^{-1}_{V},t_{1})\land(t_{1},r_{W},t_{2})\land(t_{2},r_{V},l_{2})\), where \(r^{-1}_{V}\) is the inverse relation of \(r_{V}\) and the weight denotes the confidence correlated with this path

\begin{align}\quad query(l_{1},r_{T},l_{2})\leftarrow(l_{1},r^{-1}_{V},t_{1})\land(t_{1},r_ {W},t_{2})\land(t_{2},r_{V},l_{2}).\end{align}

(5)

Thus, our purpose is to obtain the rules and corresponding confidences \(\alpha\). Specifically, followed the operator \(M_{R}\) representing each relation \(R\) in TensorLog [4], what we want to learn is shown as follows,

\begin{align}\sum_{l}\alpha_{l}\prod_{k\in\beta_{l}}M_{R_{k}},\end{align}

(6)

where \(l\) contains all possible rules, \(\alpha_{l}\) is the confidence of the rule \(l\) and \(\beta_{l}\) is an ordered list of all relations in rule \(l\). Then, the score candidate entity \(e_{t}\) given an arbitrary entity \(e_{h}\) equals the inner products between the logical rules and head entity embedding \(\boldsymbol{e}_{h}\), just as Equation (7)

\begin{align}score(e_{t}|e_{h})=\boldsymbol{e}_{t}^{T}\left(\sum_{l}\left(\alpha_{l}\left(\prod_{k\in\beta _{l}}M_{R_{k}}\boldsymbol{e}_{h}\right)\right)\right),\end{align}

(7)

where \(\boldsymbol{e}_{h}\) and \(\boldsymbol{e}_{t}\) denote the embeddings of \(e_{h}\) and \(e_{t}\), respectively. Owing to the prediction task, we only focus on the two types of queries, i.e., relational facts including spatio-temporal visitation relation \(r_{T}\) and spatial transition relation \(r_{V}\). Thus, we can optimize the model by maximizing the score of the query in our constructed knowledge graph, as shown in Equation (8)

\begin{align}\mathop{max}\limits_{(\alpha_{l},\beta_{l})}\sum_{\left\{(e_{h},e_{t}) \right\}}score(e_{t}|e_{h})=\mathop{max}\limits_{(\alpha_{l},\beta_{l}) }\sum_{\left\{(e_{h},e_{t})\right\}}\boldsymbol{e}_{t}^{T}\left(\sum_{l}\left(\alpha_{l}\left(\prod_{k\in\beta_{l}}M_{R_{k}}\boldsymbol{e}_{h}\right)\right)\right).\end{align}

(8)

Finally, we can recover the logical rules and confidences according to the optimized parameters \(\alpha_{l}\) and \(\beta_{l}\).

5.3 Injecting Logic Rules

In order to utilize the rule structure, we first learn the representation of paths corresponding to different rules. Different from the common path representation method adopted in [13, 22], which only utilizes the embedding of the relations of facts along the path, we argue that the entities along the path are also important. For example, in the second rule shown in Figure 1(b), if \(l_{1}\) and \(l_{2}\) are the home and workplace of a user, respectively, the spatial transition relational facts between them occur with larger probability with certain time gaps, e.g., 8 hours. The intuition here is that the entities \(t_{1}\) and \(t_{2}\) in this path play a critical role in determining the plausibility of the corresponding facts. Thus, the entities along the paths should also be considered in calculating the path representations. Thus, we utilize separate neural networks to extract the embedding of paths with different lengths. By defining \(\boldsymbol{p}\) as the concatenated vector of all entities and relations along the path \(p\) except the head entity of the first fact and the tail entity of the last fact, the path representation vector \(\boldsymbol{C}(p)\) can be calculated as follows:

\begin{align}\boldsymbol{C}(p)={\rm MLP}(\boldsymbol{p}).\end{align}

(9)

Then, based on the obtained path representation, the similarity between paths and relations is evaluated by the following energy function [22]

\begin{align}E(p,r)=||\boldsymbol{C}^{(u)}(p)-\boldsymbol{r}^{(u)}||.\end{align}

(10)

We further denote the set of paths that satisfy the form listed in Table 4 and link entities \(e_{h}\) and \(e_{t}\) as \(P(e_{h},e_{t})\). Then, the loss function derived from the rule structure can be represented as follows

\begin{align}\mathcal{L}_{R}=\sum_{(e_{h},r,e_{t})\in\mathcal{G}\land p\in P(e_{h},e_{t})} \alpha*E(p,r),\end{align}

(11)

where \(\alpha\) represents the confidence of the corresponding relation path.

Due to the above equation of the loss regarding reconstructing the knowledge graph and modeling the dependence between relational facts based on rules, the total loss can be presented as follows:

\begin{align}\mathcal{L}=\mathcal{L}_{C}+\lambda\mathcal{L}_{R},\end{align}

(12)

where \(\lambda \gt 0\) is used to balance the loss function of reconstruction loss and logical rule loss. Overall, we can train our model by minimizing the total loss Equation (12), and obtain the embedding vectors of all entities in the knowledge graph, which will be further used in the mobility prediction task.

We present the pseudocode describing the process of learning embedding vectors of all entities and relations in the constructed knowledge graph in Algorithm 1. As we can observe, in each epoch, this algorithm samples a mini-batch of facts in the training set \(\mathcal{S}\). For each fact in the mini-batch, its loss is calculated based on reconstruction loss and logical rule loss. First, the proposed model evaluates the plausibility of the existence of the relational fact and calculates the reconstruction loss based on Equations (2) and (3), respectively. Then, according to the given head entity \(e_{h}\) and tail entity \(e_{t}\), judge whether its relational path set \(P\left(e_{h},e_{t}\right)\) is empty. If not empty, traverse all the paths to calculate the logical rule loss derived from the rule structure according to Equation (11). Finally, this algorithm updates embeddings of entities and relations based on the gradient of the total loss.

5.4 Mobility Prediction

Similarly, the pseudocode describing the process of implementing mobility prediction based on the knowledge graph model is shown in Algorithm 2. As we can observe, given user \(u\) and the corresponding target time \(t\) or the previously visited location \(l_{0}\), this algorithm will implement mobility prediction based on different types of relational facts. Specifically, if only the target time is given, the future movement is predicted by evaluating the plausibility of the corresponding spatio-temporal visitation relational fact. The approach is similar when only the previously visited spatial venue is given. Differently, if both the target time and the previously visited spatial venue are given, the mobility prediction result is given by maximizing the plausibility of the two different relational facts simultaneously, which is derived from the total loss function Equation (12). Note the loss in terms of the utilized rules is not calculated in the mobility prediction task, since it is only influenced by the embedding vectors of the relations and the passing entities of the paths, which is not the function of the tail entities of the paths. Thus, it can be ignored in Algorithm 2.

6 Experiments

6.1 Experiment Settings

6.1.1 Datasets.

In our experiment, two representative real-world mobility datasets are processed to evaluate the performance of our proposed model. The first one is a real-world mobile application dataset collected when users request location service through their applications in Beijing. The mobile application dataset contains users’ trajectories between 17 September and 31 October in 2016. Each record consists of the user ID, timestamp, and PoI, i.e., the user requests location service of WeChat like search, check-in, etc. Table 1 illustrates two example mobility records of user \(u_{1}\) and user \(u_{2}\), where \(l_{1},l_{2},l_{3}\) are different spatial venues and time is defined in like manner. We can observe that the different users hold distinguishing granularity and length and various mobility patterns. The other is the public Foursquare check-in data from New York, collected from Foursquare API from 3 April 2012, to 16 February 2013. This dataset spans almost one year and contains various mobility patterns, collected from a previous work [38]. In Foursquare, the location information is collected when users actively share their corresponding locations with their friends and the public, resulting in the sparsity of the Foursquare dataset and few records for some users.

Table 1.

User	Trajectories
user \(u_{1}\)	\((t_{1},l_{1})\)	\((t_{2},l_{3})\)	\((t_{3},l_{2})\)	-	-
user \(u_{2}\)	\((t_{1},l_{3})\)	\((t_{2},l_{1})\)	\((t_{3},l_{2})\)	\((t_{4},l_{1})\)	\((t_{5},l_{3})\)

Table 1. Two Simple Example Trajectories of Users

Finally, the experiments are conducted in terms of the 7:1:2 division mode. Specifically, the recorded trajectories of each user are ranked according to the time series first, then, the first 70% of each user’s trajectory data are trained on our proposed model, and the last 20% are leveraged to evaluate the performance of our method, i.e., predicting the users’ future location based on their historical trajectories.

The statistics of datasets are shown in Table 2.

Table 2.

Dataset	#Users	#PoIs	#Regions	#categories	#Records
Beijing	1,027	8,168	256	14	206,551
Foursquare	1,082	16,945	1,611	11	46,041

Table 2. Statistics of Datasets Used in the Experiments

6.1.2 Evaluation Protocols.

Four principle evaluation metrics are introduced to assess the performance on mobility prediction of our proposed model\(\colon\) the MRR and Accuracy@k (k = 1, 5, 10). The mathematical formula is shown as follows

\begin{align}MRR=\frac{1}{U\cdot N}\sum_{u=1}^{U}\sum_{i=1}^{N}\frac{1}{rank_{u,i}},\end{align}

(13)

\begin{align}Accuracy@k=\frac{1}{U\cdot N}\sum_{u=1}^{U}\sum_{i=1}^{N}{rank_{u,i}\leq k},\end{align}

(14)

where \(U\) represents the number of users, and \(N\) denotes the length of user \(u\)’ mobile records, \(rank_{u,i}\) represents the rank of \(i_{th}\) ground truth in the whole candidate locations of user \(u\)’ trajectories. We evaluate the performance by computing the mean results of all the users in terms of all metrics. MRR is the average reciprocal of ranks of predicting results, ranging from 0 to 1. Specifically, achieving high MRR means the effectiveness of our proposed method. Accuracy@k demonstrates the average probability of the ground truth appearing in the top-k candidate locations. Similarly, higher Accuracy@k can verify the superiority of the proposed method. In general, MRR and Accuracy@k metrics are utilized widely in prediction problems like link prediction [22], destination prediction [26], etc. Furthermore, the accurate prediction is meaningful significantly not only for the location recommendations for business [41] and the government’s regulation on transportation [20].

6.1.3 Baselines for Comparison.

To testify the superiority of our proposed approach, we compare it with the following two types of baselines, where the first three baselines are widely used mobility prediction algorithms, and the others are representative knowledge graph completion models.

RNN-Based Models. DeepMove [7] considers the attention module into the RNN to capture well the long-term dependencies between the future mobility and the historical trajectories. APHMP [6] applies an attention module to capture useful information in historical mobility records and integrates it into the RNN to model users’ mobility patterns.

Self Attention-Based Model. STAN [19] aggregates all relevant visits in human mobility records and recalls the most plausible location candidates through weighted representations to achieve location prediction.

GNN-Based Models. SACN [25] utilizes a weighted graph convolutional network as an encoder to learn the graph connectivity structure, and learned weights to represent the information propagation from neighbors. CompGCN [29] jointly embeds nodes and relations in a graph to learn embeddings by leveraging composition operations and demonstrates superiority on the link prediction task.

KG-Based Models. TNTComplEx [12] proposes a ComplEx [28] decomposition of four-order tensor characterizing the temporal scopes of each triple and incorporates a non-temporal component to represent the static component, which improves accuracy greatly on the temporal knowledge graph completion task. DE-Distmult [9] leverages a diachronic entity embedding function to characterize the entities in the corresponding time for temporal knowledge graph completion. xERTE [11] iteratively samples relevant edges of entities in the subgraph and applies an attention mechanism on the sampled edges to achieve temporal knowledge graph completion.

6.1.4 Qualitative Analysis.

Before conducting the experiments, we first analyze the differences between our proposed model and the baselines from a qualitative perspective, containing complexity and methodology.

Considering the complexity, the models of APHMP, DeepMove, and STAN all have about a quadratic growth of the time complexity with the length of the historical trajectories due to the quadratic computational complexity of the attention mechanism, while GNN-based models and KG-based models do not have this issue. These models only require to iterate over each record without the computational time influenced by the historical trajectories, indicating linear complexity with the problem size.

As for methodology, Table 3 illustrates the ability of our proposed model and baselines in terms of four aspects, containing Structured “Knowledge,” Long-Range Dependency, User Diversity, and Data Imbalance. Structured “Knowledge” means incorporating “knowledge” from multiple sources in a cohesive manner to better model user mobility behaviors. Long-range dependency and User Diversity mean capturing the long-range dependency and user-personalized mobility patterns of trajectories, respectively. The algorithm with a check mark in the column Data Imbalance indicates that it can alleviate the data imbalance issue and better model users with scarce data.

Table 3.

Models	Structured ”Knowledge”	Long-Range Dependency	User Diversity	Data Imbalance
APHMP	✘	✔	✘	✘
DeepMove	✘	✔	✔	✘
STAN	✘	✔	✘	✘
SACN	✔	✘	✔	✘
CompGCN	✔	✘	✘	✘
De-Distmult	✔	✘	✘	✘
xERTE	✔	✔	✔	✘
TNTComplEx	✔	✘	✘	✘
Our	✔	✔	✔	✔

Table 3. Comparison on Methodology

6.1.5 Parameter Setting.

In our experiments, the default embedding dimension of users, locations, and relations are all set as 40. In these two datasets, the embedding regularization is 0.01, and the learning rate ranges from 0.0005 to 0.01. Besides, the predefined temporal threshold \(T_{max}\) is set as 24. In addition, we apply grid search to select the hyper-parameter \(\lambda\). We manually finetune the hyper-parameter \(\lambda\) in 0, 0.5, 1, 2 to balance the loss in terms of reconstructing the knowledge graph and modeling the dependence between relational facts based on rules. The hyper-parameter \(\lambda\) is assigned as 0.5, where our proposed algorithm achieves the optimal results on validation sets.

6.2 Experiment Results

In this section, we demonstrate the extracted rules and evaluate the performance of our proposed approach and baselines on the two datasets in terms of four metrics and two scenarios, including the comparative experiments, ablation study, case study, etc.

6.2.1 Extracted Rules.

Table 4 in Appendix demonstrates the extracted five rules in our experiments, where the first and second rows present the confidences of the corresponding rules on the two datasets, respectively. From the extracted rules, we can observe that there exist complicated and implicit interactions between times and locations as well as locations and locations, which are stored in the relation paths. Besides, we select the top-k (k \({=}\) 0,3,5,10) rules from the extracted rules for mobility prediction to analyze the performance variation with the number of rules. From the evaluation results displayed in Figure 3, we can observe two interesting insights. (1) our model based on rules acquires much semantic information from the extracted rules and achieves better performance than without rules in terms of four metrics; (2) the extracted rules with higher confidence are more efficient on mobility prediction than lower confidence. For example, our model with top-5 rules obtains better performance than with top-10 rules. The reason may be that the lower-confidence rules have side effects on embedding learning and then influence prediction performance. Thus, the following experiments are all conducted on the situation using top-5 rules to evaluate the performance of mobility prediction.

Table 4.

Confidence	Confidence	Paths	Derived Relations
Beijing	Foursquare
0.974	0.561	\((l_{1},r^{-1}_{V},t_{1})\land(t_{1},r_{W}(\Delta t),t_{2})\land(t_{2},r_{V},l_ {2})\)	\((l_{1},r_{T},l_{2})\)
0.887	0.926	\((t_{1},r_{V},l_{1})\land(l_{1},r_{L},r_{1})\)	\((t_{1},r_{V},r_{1})\)
0.396	0.474	\((t_{1},r_{V},l_{1})\land(l_{1},r_{C},c_{1})\)	\((t_{1},r_{V},c_{1})\)
0.087	0.278	\((l_{1},r_{T},l_{2})\land(l_{2},r_{L},r_{2})\)	\((l_{1},r_{T},r_{2})\)
0.670	0.456	\((l_{1},r_{T},l_{2})\land(l_{2},r_{C},c_{2})\)	\((l_{1},r_{T},c_{2})\)

Table 4. The Extracted Top-5 Rules on Two Datasets

Fig. 3.

6.2.2 Overall Performance.

Considering different prediction methods, the Location-based prediction represents predicting users’ future location given a previously visited location, and the Time-based prediction means predicting users’ location at the given time. From the comparative results in Table 5, we can draw several significant conclusions. First, compared with the RNN-based model, the KG embedding models most achieve better performance on the Beijing dataset. Compared with the RNN-based and Self attention-based baselines, our proposed method obtains an improvement of 6.11% and 4.38% in terms of MRR on the Location-based prediction. This demonstrates that KG embedding models could better learn user embeddings, time, and locations by integrating the structured “knowledge” to achieve better mobility prediction. In addition, our proposed model consistently outperforms other KG embedding models in terms of all metrics. In particular, our model achieves a 4.9% performance gain on the Location-based Prediction compared with De-Distmult. It demonstrates that the relation paths indeed supply an excellent supplement for representation learning and capture the rich semantic information of mobile trajectories. Besides, owing to the restrictions on the symmetric relations of Distmult [37], TNTComplEx achieves better performance than De-Distmult, which demonstrates the effectiveness of the complex vectors. GNN-based models apply GCNs as the encoder to characterize the graph structure and KGE models to capture the interactions between entities and relations. CompGCN jointly generates expressive entity and relation embeddings, contributing to capturing the interactions between entities and relations and achieving higher accuracy than SACN, but still less effective than our model, which verifies the superiority of simultaneously capturing the semantic information of paths and entities and relations properties of our proposed model. Generally, the results illustrate that our proposed model outperforms all the baselines in terms of all metrics, which verifies the superiority of the KGE methods. Besides, the results also illustrate the rules indeed extract rich semantic information from the paths and construct the semantic association between relations, assisting learning relation embeddings and achieving better performance.

Table 5.

		Accuracy@k			MRR	Accuracy@k			MRR
		Location-based Prediction				Time-based Prediction
		\(k=1\)	\(k=5\)	\(k=10\)	MRR	\(k=1\)	\(k=5\)	\(k=10\)	MRR
RNN-based	APHMP	47.20%	67.69%	70.48%	56.45%	-	-	-	-
RNN-based	DeepMove	46.83%	68.38%	70.89%	56.48%	-	-	-	-
Self attention-based	STAN	48.33%	68.67%	72.24%	58.21%	-	-	-	-
GNN-based	SACN	49.91%	71.2%	74.9%	59.38%	48.40%	70.20%	74.47%	58.30%
GNN-based	CompGCN	51.73%	68.86%	72.61%	59.44%	49.65%	67.63%	71.0%	57.56%
KG-based	De-Distmult	48.4%	69.18%	71.74%	57.69%	48.71%	69.01%	71.16%	58.47%
	xERTE	49.79%	70.66%	72.78%	59.14%	50.19%	71.63%	73.88%	59.78%
	TNTComplEx	49.86%	73.67%	76.53%	60.15%	49.93%	72.22%	74.89%	59.87%
	Our	52.15%	77.32%	80.96%	63.09%	51.41%	77.24%	79.41%	62.87%

Table 5. Performance on the Beijing Dataset

Bold denotes best (highest) results, and italic indicates different metrics.

Furthermore, we implement the experiments on the Foursquare dataset, where the mobility records are sparser. As shown in Table 6, the superiority of KG embedding is not apparent compared with the Beijing dataset. The reason may be that RNN-based and Self attention-based models capture users’ mobility patterns more effectively on the long-term user behavior modeling problem. Even so, our model achieves 4.07% and 4.03% performance gain in terms of Accuracy@5 and Accuracy@10 compared with STAN. Generally, as can be expected, our proposed method could learn better embeddings and achieve more accurate prediction than baselines without the limitation of the dataset’s characteristics, which further demonstrates the effectiveness of employing rules for integrating paths to solve the mobility prediction problem.

Table 6.

		Accuracy@k			MRR	Accuracy@k			MRR
		Location-based Prediction				Time-based Prediction
		\(k=1\)	\(k=5\)	\(k=10\)	MRR	\(k=1\)	\(k=5\)	\(k=10\)	MRR
RNN-based	APHMP	6.11%	13.14%	16.21%	9.49%	-	-	-	-
RNN-based	DeepMove	8.36%	14.99%	16.76%	11.28%	-	-	-	-
Self attention-based	STAN	7.89%	15.65%	19.83%	12.72%	-	-	-	-
GNN-based	SACN	8.13%16.84%	19.59%	12.22%	9.52%	22.48%	26.85%	15.39%
GNN-based	CompGCN	8.66%	14.15%	16.71%	11.45%	11.57%	17.72%	19.45%	14.43%
KG-based	De-Distmult	8.83%	15.0%	16.81%	11.29%	9.35%	17.11%	22.61%	14.07%
	xERTE	8.18%	18.62%	20.88%	12.76%	10.94%	19.67%	22.85%	15.28%
	TNTComplEx	8.31%	17.97%	21.69%	12.85%	12.76%	19.81%	22.73%	16.23%
	Our	9.43%	21.43%	25.12%	15.09%	13.87%	25.28%	29.37%	19.24%

Table 6. Performance on the Foursquare Dataset

Bold denotes best (highest) results, and italic indicates different metrics.

6.2.3 Ablation Study.

To verify the effectiveness of different components of our proposed model, we implement the ablation study on the Beijing dataset. Specifically, -User hyperplane means removing the module of the user hyperplane, i.e., without user information. In the experiment, we define \(\boldsymbol{p}\) as the concatenated vector of all the entities and relations along the path \(p\) to obtain the path representation vector except the head entity and tail entity of two sides. -Multilayer Perceptron (MLP) means characterizing the path representation with only the relations along the path \(p\), abandoning the middle entities within the path. Finally, -Rule means removing the extracted rules and corresponding relation paths and modeling the defined relationships without injecting logical rules.

As shown in Table 7, we can observe that removing the user hyperplane will lead to significant performance degradation with 4.7% and 5.78% in terms of MRR on the Location-based Prediction and Time-based Prediction. The user hyperplane module is proposed to characterize the diverse mobility patterns of different users. From the results, we can conclude that our proposed method captures the characteristics of personalized mobility patterns and consistently achieves better performance in terms of four metrics by constructing the subgraph of the corresponding user.

Table 7.

Models	Location-based Prediction				Time-based Prediction
	Accuracy@k			MRR	Accuracy@k			MRR
	\(k=1\)	\(k=5\)	\(k=10\)	MRR	\(k=1\)	\(k=5\)	\(k=10\)	MRR
Our	52.15%	77.32%	80.96%	63.09%	51.41%	77.24%	79.41%	62.87%
-User hyperplane	49.17%	68.46%	71.55%	57.89%	48.32%	66.35%	68.72%	56.57%
-MLP	50.65%	75.67%	78.49%	61.67%	50.34%	75.5%	78.33%	61.36%
-Rule	49.49%	72.8%	74.76%	58.80%	48.42%	70.73%	73.56%	58.50%

Table 7. Prediction Performance of Ablation Study

Bold denotes best (highest) results, and italic indicates different metrics. MLP, multilayer perceptron.

Furthermore, the performance variation of the model discarding the MLP operator demonstrates the entities of the middle facts in the path contain important semantic information of mobility behaviors. Last, we attempt to delete the extracted rules and corresponding relation paths to evaluate the performance. As we can find, our proposed model obtains an improvement of 3.79% and 3.85% in terms of MRR on Location-based Prediction and Time-based Prediction, respectively, compared to the models without rules. This demonstrates that employing rules for extracting more semantic information and integrating paths facilitates relation embeddings and mobility prediction.

6.2.4 Imbalance Study.

Considering the situation that some users have abundant data in the mobile records, while some have scarce records, directly training the embedding model easily leads to the data imbalance issue. Our proposed model defines the similar visitation relation in the constructed knowledge graph to alleviate the above problem. To verify the efficiency of our model, we filter the users whose records are less than 30 and observe the model performance variation with or without the similar visitation relation. Figure 4 shows the comparison results of the prediction performance in terms of four evaluation metrics, where w/ sim denotes to include the similar visitation relation when constructing the knowledge graph, and w/o sim means not to include. It can be seen that after including this relation, users with scarce data have obvious performance gains in the four metrics, and achieve 2.03% and 3.07% in terms of Acc@5 and Acc@10, respectively. This experiment demonstrates that our model indeed captures the dependencies between users efficiently and assists the embedding learning of users with scarce data. Further, it also verifies that our model has indeed alleviated the problem of data imbalance to a certain extent by utilizing the similar visitation relation to capture the dependencies between users with similar visitation properties.

Fig. 4.

6.3 Case Study

To explore and analyze the performance variation of different users and PoIs, we utilize two rules to select parts of the mobility records and conduct case study experiments. Specifically, the users are clustered into different groups by record number and the radius of gyration [10]. Besides, each user constructs their own knowledge graph independently from one another to demonstrate the capability of capturing user mobility characteristics; similarly, the PoIs are divided by PoI visit frequency and \(\Delta t\) between the adjacent mobility records. The number of records indicates the length of the corresponding user’s mobility records. Specifically, a larger number of records means a denser trajectory during the same period, reflecting the sparsity level of the user’s mobility on the temporal dimension. The radius of gyration reflects the spatial range of user mobility, representing the entropy of locations in the spatial dimension. For the group of PoIs, PoI visit frequency denotes the visit frequency of certain PoI, representing the visitation popularity of users, while the \(\Delta t\) records the time intervals between adjacent mobility records, characterizing the temporal closeness level. The units of \(\Delta t\) and the radius of gyration are hours and kilometers, respectively.

From the evaluation performance in terms of four metrics in Figure 5, we can observe that our model achieves better performance as the number of records and PoI visit frequency increase. Conversely, the prediction accuracy decreases gradually as the radius of gyration and \(\Delta t\) increase. Specifically, from Figure 5(a) and (c), we can infer that entity and relations in the constructed knowledge graph with denser trajectories and higher PoI visit frequency can learn better embeddings and more mobility characteristics, which helps to achieve higher mobility prediction accuracy and also verifies the necessity of defining the similar mobility relation introduced before. Figure 5(b) demonstrates that the expansion of the spatial range of mobility brings more noises and bigger entropy, which influences the performance of mobility prediction, that is, the user tending to visit distant spatial venues has a bigger radius of gyration, making their mobility hard to predict. From Figure 5(d), we can draw the conclusion that the performance variation of \(\Delta t\) reflects that the spatial transition relation does not work with longer intervals between mobility records. Overall, the current mobility is mostly not affected by long-ago trajectories, but only related to the needs of the corresponding time, which is also consistent with two patterns about mobility prediction. Generally speaking, the experiments analyze the prediction performance through clustering the mobility records based on users and PoIs, which is the basis of model design.

Fig. 5.

7 Conclusions and Future Work

In this article, we focus on modeling human mobility behavior based on the knowledge graph technique. By proposing six different types of relations characterizing various aspects of human mobility in terms of temporal and spatial dimensions, we further propose an embedding model based on user hyperplane to model the diversity of mobility patterns of different users. Additionally, the logic rules are injected into the knowledge graph to model the semantic information between multi-step paths and relations. Then, the obtained embedding vectors of entities and relations are used to predict users’ future movement by evaluating the plausibility of corresponding facts. Extensive experimental results show that our proposed model beats the state-of-the-art mobility prediction algorithm by improving the prediction accuracy for around 7.78% relatively on average in terms of MRR, demonstrating its effectiveness. In future work, we will consider combining much more information about the locations, such as the latitude and longitude. On the other hand, considering more fine-grained user features to derive a better user hyperplane model is one research direction.

References

[1]

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS ’13), Vol. 26, 2787–2795.

Abstract

1 Introduction

2 Related Work

2.1 Mobility Prediction

2.2 Knowledge Graph

3 System Overview

3.1 Knowledge Graph

3.2 System Overview

4 Knowledge Graph Construction

5 Methodology

5.1 Embedding Model Based on User Hyperplane

5.2 Logical Rules Extraction

5.3 Injecting Logic Rules

5.4 Mobility Prediction

6 Experiments

6.1 Experiment Settings

6.1.1 Datasets.

6.1.2 Evaluation Protocols.

6.1.3 Baselines for Comparison.

6.1.4 Qualitative Analysis.

6.1.5 Parameter Setting.

6.2 Experiment Results

6.2.1 Extracted Rules.

6.2.2 Overall Performance.

6.2.3 Ablation Study.

6.2.4 Imbalance Study.

6.3 Case Study

7 Conclusions and Future Work

References

Cited By

Index Terms

Recommendations

Evaluating Next-Cell Predictors with Extensive Wi-Fi Mobility Data

Mobility prediction using future knowledge

Spatio-Temporal Urban Knowledge Graph Enabled Mobility Prediction

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations