research-article

Open access

Context-Aware Semantic Annotation of Mobility Records

Authors:

Huandong Wang,

Yong Li,

Junjie Lin,

Hancheng Cao,

Depeng JinAuthors Info & Claims

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 16, Issue 3

Article No.: 47, Pages 1 - 20

https://doi.org/10.1145/3477048

Published: 22 October 2021 Publication History

All formats PDF

Abstract

The wide adoption of mobile devices has provided us with a massive volume of human mobility records. However, a large portion of these records is unlabeled, i.e., only have GPS coordinates without semantic information (e.g., Point of Interest (POI)). To make those unlabeled records associate with more information for further applications, it is of great importance to annotate the original data with POIs information based on the external context. Nevertheless, semantic annotation of mobility records is challenging due to three aspects: the complex relationship among multiple domains of context, the sparsity of mobility records, and difficulties in balancing personal preference and crowd preference. To address these challenges, we propose CAP, a context-aware personalized semantic annotation model, where we use a Bayesian mixture model to model the complex relationship among five domains of context—location, time, POI category, personal preference, and crowd preference. We evaluate our model on two real-world datasets, and demonstrate that our proposed method significantly outperforms the state-of-the-art algorithms by over 11.8%.

1 Introduction

With the popularization of mobile devices and the booming of location-based social networks (LBSNs), a massive volume of human mobility records becomes available to service providers. Among them, records labeled with semantic information (e.g., Point of Interest (POI)) are of the greatest importance because they indicate where users actually visit (e.g., a particular restaurant), which acts as a powerful context for accurate user behavior understanding and analysis [2, 4]. For example, using mobility records with semantic information, service providers can make more accurate location prediction [5, 21], better understand human mobility patterns [1, 36], know the user preference better and recommend corresponding products or POIs for them [24], and better facilitate other applications (e.g., healthcare [14]). Also, local businesses and governments can use the information to better understand region demands and support their decision making process [3, 12, 16].

Although labeled records with semantic information are useful, such data are limited in number since they can only be obtained when LBSN users actively post their status. Under most circumstances, however, only unlabeled GPS records are available to service providers when users use related applications. Moreover, these GPS records are often noisy and not perfectly located on certain POIs. Thus, it is important and challenging to annotate unlabeled records with correct POIs to make them as informative as labeled records.

To achieve this goal, a number of methods have been proposed in the literature. Existing studies consider the distance between the GPS coordinates and the location of POIs [35]. Gong et al. [7] further take the temporal context and POI category into consideration. Wu et al. [34] explore the history of a user’s mobility records, yet they model each user independently without considering the crowd preference. Zhang et al. [48] further leverage user grouping to annotate mobility records with POIs. These studies are still far from satisfactory since they did not properly address the following three challenges:

–Complexity of Context. Semantic annotations are based on external context, including the distance between the user and POIs, the popularity of POIs at different times of the day, user’s history, and other users’ history. It is challenging to consider all these factors simultaneously and combine them in one model of annotation.

–Sparsity of Records. It is challenging to model POIs and users before annotation due to the skew and sparsity of mobility records. Since a small number of POIs contribute to the majority of mobility records, it is challenging to model POIs that are rarely visited. Furthermore, it is difficult to model the mobility pattern of users who leave only a few records.

–Personal and Crowd Preference. To achieve personalized annotation, we should balance personal preference and crowd preference. When considering personal preference, most existing works ignore crowd preference, which is important information especially when the data are highly sparse.

To tackle the three challenges, in this article, we propose CAP, a context-aware personalized (CAP) semantic annotation model based on the complex context and user preference. More specifically, we propose a Bayesian mixture model by considering five domains of context simultaneously—location, time, POI category, personal preference, and crowd preference. In order to alleviate data skew, we assume POIs in the same category share the same temporal pattern, which is captured by a Gaussian mixture distribution [7]. To overcome the problem of data sparsity, we use the semi-supervised learning method by adding unlabeled records to the training set. We model user preference with the multinomial-Dirichlet distribution. By tactfully choosing the Dirichlet prior, we successfully balance user preference and common behavior patterns. Our contributions can be summarized as follows:

– We propose a novel Bayesian mixture model, which is the first work to simultaneously model five domains of context in human mobility. By taking location, time, POI category, personal preference, and crowd preference into consideration, we present by far the most systematic approach, which achieves considerable performance gain in the task of semantic annotation.

– We use a semi-supervised learning method to alleviate data sparsity by using both labeled and unlabeled mobility records. Specifically, we first model crowd preference using labeled records and further capture personal preference using unlabeled records.

– We evaluate our proposed model with real-world datasets of two cities through extensive experiments. The results show that our method significantly outperforms the state-of-the-art algorithms on all datasets by over 11.8%.

2 System Model and Overview

2.1 Mathematical Setup and Problem Description

We denote a human mobility record as a 4-tuple \(\mathbf {d}_{i}=\lbrace u_{i},\mathbf {l}_{i},t_{i},p_{i}\rbrace\), where \(u_{i}\) represents the user ID, \(\mathbf {l}_{i}\) represents the geographic coordinates, and \(t_{i}\) represents the timestamp. If the record is labeled, \(p_{i}\) is the associated POI. For unlabeled records without POI, \(p_{i}\) is null. Given any user \(u\in \mathcal {U}\), we define the set of all the labeled and unlabeled records as \(\mathcal {D}^{u}_{L}\) and \(\mathcal {D}^{u}_{U}\), respectively. The set of all labeled records of all users is denoted as \(\mathcal {D}_{L}=\bigcup _{u\in \mathcal {U}}\mathcal {D}^{u}_{L}\).

We further denote a POI as a 2-tuple \(p=\lbrace \mathbf {l}, c\rbrace\), where \(\mathbf {l}\) represents the geographic coordinates and \(c\) represents the category. The set of POIs and POI categories are denoted as \(\mathcal {P}\) and \(\mathcal {C}\), respectively. Given a location \(\mathbf {l}\), we further define a set containing candidate POIs within a distance \(\delta\) from location \(\mathbf {l}\) as \(\mathcal {P}_{\mathbf {l}}\), i.e., \(\mathcal {P}_{\mathbf {l}}=\lbrace p|p\in \mathcal {P}\wedge dist(p.\mathbf {l}, \mathbf {l})\lt \delta \rbrace\). Overall, we summarize major notations used in this article in Table 1. Then, the problem of this article can be defined as follows:

Table 1.

Notation	Description
\(\mathcal {U},\mathcal {C},\mathcal {P}\)	The set of all users, POI categories, and POIs, respectively.
\(\mathcal {P}_{\mathbf {l}}\)	The set of candidate POIs at location \(\mathbf {l}\).
\(\mathcal {D}^{u}_{L},\mathcal {D}^{u}_{U}\)	The set of labeled and unlabeled records for user \(u\), respectively.
\(\mathbf {d}_i,u_{i},\mathbf {l}_{i},t_{i},p_{i}\)	The \(i\)th record and its user, location, timestamp and POI.
\(r_i,c_i,z_i\)	The region, POI category and temporal Gaussian component of the \(i\)th record.
\(\mathbf {\mu }_{r},\mathbf {\Sigma }_{r}\)	The mean and covariance matrix corresponding to region \(r\).
\(\nu _{c,z},\sigma _{c,z}\)	The mean and standard deviation of temporal Gaussian component \(z\) for category \(c\).
\(\mathbf {\theta }\)	Parameters of multinomial region distribution.
\(\mathbf {\psi }_{c}\)	Parameters of multinomial temporal Gaussian component distribution for category \(c\).
\(\mathbf {\Phi }_{r,u}\)	Parameters of multinomial category distribution for region \(r\), user \(u\).
\(b\cdot \mathbf {\Phi }_{r}\)	Parameters of Dirichlet prior for \(\mathbf {\Phi }_{r,u}\) in region \(r\), where \(b\) is a hyper-parameter to adjust the influence of \(\mathbf {\Phi }_{r}\).
\(\mathbf {\mu }_{0},\kappa _{0},\mathbf {\Psi }_{0},\rho _{0}\)	Hyper-parameters of NIW prior for \((\mathbf {\mu }_{r},\mathbf {\Sigma }_{r})\).
\(\nu _{0},\lambda _{0},\epsilon _{0},\tau _{0}\)	Hyper-parameters of NIG prior for \((\nu _{c,z},\sigma _{c,z})\).
\(\alpha ,\beta ,\gamma\)	Hyper-parameters of Dirichlet prior for \(\mathbf {\theta },\mathbf {\Phi _{r}},\mathbf {\psi _{c}}\), respectively.

Table 1. Notations and Description

Semantic Annotation Problem:

Given: POIs \(\mathcal {P}\), labeled records \(\mathcal {D}_{L}\), a target user \(u\) with the corresponding unlabeled records \(\mathcal {D}^{u}_{U}\).

Problem: Find the associated POI \(p_{i}\in \mathcal {P}\) for each unlabeled record \(\mathbf {d}_{i} \in \mathcal {D}^{u}_{U}\).

2.2 System Overview

Figure 1 shows the framework of our POI annotation system that models users’ mobility behavior based on human mobility records. Our system takes labeled and unlabeled mobility records of all users as input. The goal is to annotate each unlabeled mobility record with the correct POI.

Fig. 1.

When annotating an unlabeled record with POI, our system takes the following context into consideration: the distance between the user’s location and candidate POIs, POI category obtained from map services, the popularity of POIs at different times of the day, crowd preference in different functional regions, and personal preference inferred based on user history.

As we can observe from Figure 1, for better modeling users’ mobility in terms of the spatial dimension, our system first automatically divides the city into functional regions, where most users follow similar crowd preference to POIs of different categories [45]. The motivation of using such automatically divided regions rather than existing static regions (e.g., administrative regions) is that users’ behavior may be not consistent with existing static regions. Furthermore, POIs and users are evolving over time. Thus, the boundaries of functional regions are ambiguous, and we should use regions automatically divided by our system.

At the same time, for better modeling users’ mobility in terms of the temporal dimension, based on timestamps and POI categories from the mobility records of all users, our system seeks to estimate the temporal patterns of visitations to different POIs. However, due to data skew, i.e., some POIs are popular while others are rarely visited, it is hard to model the temporal visitation pattern of each POI accurately. Thus, we utilize the fact that POIs of the same category share the same temporal visitation pattern, e.g., restaurants are popular during lunchtime and dinnertime. Then, they share the same parameters describing the temporal visitation pattern in our model, and our system only needs to estimate the temporal patterns of visitations to POIs of each category.

Our system also takes personal preference into consideration. By utilizing the crowd preference of automatically divided functional regions as prior, our model further learns the personal preference of each user based on his/her historical labeled mobility records.

Finally, our system combines the contexts of location, time, POI category, crowd preference, and personal preference in a cohesive manner to estimate the associated POI of each unlabeled mobility record.

3 Semantic Annotation Model

As introduced in the system overview, we design our system based on the following three observations: First, different areas of a city show distinct functions. In different functional regions (e.g., financial district, airport, uptown), people tend to visit POIs of corresponding categories. Second, different categories of POI have unique temporal visiting patterns, i.e., their popularity varies at different times of the day. Third, although users usually follow crowd preference when visiting an area, they also demonstrate personal preference.

3.1 Bayesian Mixture Model

Given user \(u\in \mathcal {U}\), we propose a Bayesian mixture model to describe the mobility patterns based on labeled records of all users \(\mathcal {D}_{L}\) and unlabeled records \(\mathcal {D}^{u}_{U}\). The \(i\)th record is denoted as \(\mathbf {d}_{i} \in \mathcal {D}_{L}\bigcup \mathcal {D}^{u}_{U}\). Our model jointly utilizes spatial context, temporal context, and user preference, and can be decomposed into three components: location, time, and POI category modeling.

3.1.1 Location.

It has been found that different areas of a city have distinct functions in a number of existing studies [41, 42]. Thus, our model divides a city into functional regions based on users’ mobility records. Given record \(\mathbf {d}_{i}\) with location \(\mathbf {l_{i}}\) characterized by geographic coordinates, we introduce a latent discrete random variable \(r_{i}\) to indicate which region it belongs to. The set of all regions is denoted as \(R\). We use multinomial distribution \(\rm {Multi}(\mathbf {\theta })\) to model \(r_{i}\), where \(\mathbf {\theta }\) is a \(|R|\)-sized vector and \(p(r_{i}=r)=\theta _{r}\) for each \(r\in R\). In order to model the randomness of human behavior and the inherent GPS error, for records belonging to each \(r\in R\), we use bivariate Gaussian distribution to model their location by following existing approaches [24, 45, 46]. In addition, in order to avoid too much parameters, the location \(l_i\) is modeled to be independent with the category \(c_i\) conditioned on the region \(r_i\). Therefore, given record \(\mathbf {d}_{i}\) belonging to region \(r\), the probability density of its geographic coordinate \(\mathbf {l_{i}}\) can be represented as follows:

\begin{equation} \begin{aligned}p(\mathbf {l_{i}}|r)=\mathcal {N}(\mathbf {l_{i}}|\mathbf {\mu }_{r},\mathbf {\Sigma }_{r})=\frac{\exp \left(\frac{-(\mathbf {l_{i}}-\mathbf {\mu }_{r})^{T}\mathbf {\Sigma }_{r}^{-1}(\mathbf {l_{i}}-\mathbf {\mu }_{r})}{2}\right)}{2\pi \sqrt {|\mathbf {\Sigma }_{r}|}}, \end{aligned} \end{equation}

(1)

where \(\mathbf {\mu }_{r}\) and \(\mathbf {\Sigma }_{r}\) are the mean vector and covariance matrix of region \(r\), respectively.

3.1.2 Time.

To alleviate data skew, i.e., some POIs are popular while others are rarely visited, we assume POIs of the same category share the same temporal visitation pattern. For example, restaurants are popular during lunchtime and dinnertime, whereas nightclubs are popular in the evening. We model the temporal pattern as Gaussian mixture distribution with \(H\) temporal Gaussian components [49]. Suppose that record \(\mathbf {d}_{i}\) belongs to category \(c\), the probability density of its time \(t_{i}\) can be written as

\begin{equation} \begin{aligned}p(t_{i}|c)\!=\!\!\sum _{z=1}^H \mathbf {\psi }_{c,z}\mathcal {N}(t_{i}|\nu _{c,z},\sigma _{c,z}) \!=\!\!\sum _{z=1}^H \mathbf {\psi }_{c,z} \frac{\exp \left(\frac{-(t_{i}-\nu _{c,z})^{2}}{2\sigma _{c,z}^{2}}\right)}{\sqrt {2\pi \sigma _{c,z}^{2}}}, \end{aligned} \end{equation}

(2)

where \(\mathcal {N}(t_{i}|\nu _{c,z},\sigma _{c,z})\) is a Gaussian component, \(z\) is a latent discrete random variable indicating which component it belongs to, and \(\nu _{c,z}\) and \(\sigma _{c,z}\) are mean and standard deviation of component \(z\) for category \(c\). Based on observations from existing studies [11, 20], most POIs have one or two peak hours in a day, we set \(H=2\) in our model.

3.1.3 POI Category.

To leverage personal preference, we assume that different users tend to visit different categories of POI in a certain region. For example, a user may visit office buildings and fast food restaurants in one region frequently if he works there, while another user who resides there may always visit an apartment instead. Given record \(\mathbf {d}_{i}\), we use variable \(c_{i}\) to indicate its category. Given region \(r\) and user \(u\), we use multinomial distribution \({\rm Multi}(\mathbf {\Phi }_{r,u})\) to model the category, where \(\mathbf {\Phi }_{r,u}\) is a \(|\mathcal {C}|\)-sized vector and \(p(c_{i}=c|r,u)=\Phi _{r,u,c}\) for each \(c\in \mathcal {C}\). Note that the correlation between \(c_i\) and \(\boldsymbol {l}_i\) are modeled by combining \(p(\boldsymbol {l}_i|r)\) and \(p(c_i|r,u)\) with the joint correlated variable \(r_i\). In addition, when the record \(\mathbf {d}_{i}\) is labeled, \(c_{i}\) is an observable variable, but it becomes a latent variable when \(\mathbf {d}_{i}\) is unlabeled.

However, the labeled mobility records of users are usually sparse [9], which leads to overfitting if we directly train \(\mathbf {\Phi }_{r,u}\) for each region and each user using such limited data. To overcome this problem, we further consider crowd preference by first replacing \(\mathbf {\Phi }_{r,u}\) with a universal vector \(\mathbf {\Phi }_{r}\) for all users, i.e., supposing every user has the same interest in a region. We use labeled records of all users \(\mathcal {D}_{L}\) to train the model and learn \(\mathbf {\Phi }_{r}\) for each region. After that, for each user \(u \in \mathcal {U}\), we add the corresponding unlabeled records \(D^{u}_{U}\) to the training set and use \(b\cdot \mathbf {\Phi }_{r}\) as a prior of \(\mathbf {\Phi }_{r,u}\), i.e., \(\mathbf {\Phi }_{r,u}\sim {\rm Dirichlet}(b\cdot \mathbf {\Phi }_{r})\) where \(b\) is a hyper-parameter to adjust the influence of \(\mathbf {\Phi }_{r}\).

Further, we employ common conjugate prior distributions as prior distribution for parameters in our model to simplify the parameter estimation. For \(\mathbf {\theta }\), \(\mathbf {\psi }_{c}\) and \(\mathbf {\Phi }_{r,u}\), which are the parameters of multinomial distribution, we employ Dirichlet distribution as their prior distribution. For \(\mathbf {\mu }_{r}\) and \(\mathbf {\Sigma }_{r}\), which are the parameters of bivariant Gaussian distribution, we employ Normal-inverse-Wishart (NIW) distribution as their prior. For \(\nu _{c,z}\) and \(\sigma _{c,z}\), which are the parameters of Gaussian distribution, we use Normal-inverse-gamma (NIG) distribution.

In summary, we divide a city into several regions and apply a Gaussian distribution for each region. For each POI category, we use the Gaussian mixture distribution to model its temporal pattern. Also, we adopt a multinomial distribution to model user preference in each region. Figure 2 plots the graphical model and the generative progress is shown in Algorithm 1.

Fig. 2.

3.2 Parameter Estimation

We use collapsed Gibbs sampling to approximate parameters in the model. For a record \(\mathbf {d}_{i}\) belonging to region \(r\), category \(c\), temporal Gaussian component \(z\), and user \(u\), its collapsed probability density can be represented as follows:

\begin{equation} \begin{aligned}&p(\mathbf {d}_{i}|\mathbf {d}_{-i},r,c,z,u) \\ =&\underbrace{\iint p(\mathbf {l}_{i}|\mathbf {\mu }_{r},\mathbf {\Sigma }_{r})p(\mathbf {\mu }_{r},\mathbf {\Sigma }_{r}|\mathbf {d}_{-i})d\mathbf {\mu }_{r}d\mathbf {\Sigma }_{r}}_{(1)} \underbrace{\int p(r|\mathbf {\theta })p(\mathbf {\theta }|\mathbf {d}_{-i})d\mathbf {\theta }}_{(2)} \underbrace{\int p(c|\mathbf {\Phi }_{r,u})p(\mathbf {\Phi }_{r,u}|\mathbf {d}_{-i})d\mathbf {\Phi }_{r,u}}_{(3)} \\ &\underbrace{\iint p(t_{i}|\nu _{c,z},\sigma _{c,z})p(\nu _{c,z},\sigma _{c,z}|\mathbf {d}_{-i})d\nu _{c,z}d\sigma _{c,z}}_{(4)} \underbrace{\int p(z|\mathbf {\psi }_{c})p(\mathbf {\psi }_{c}|\mathbf {d}_{-i})d\mathbf {\psi }_{c}}_{(5)}, \end{aligned} \end{equation}

(3)

where, we omit the hyper-parameters {\(\mathbf {\mu }_{0}\), \(\kappa _{0}\), \(\mathbf {\Psi }_{0}\), \(\rho _{0}\), \(\nu _{0}\), \(\lambda _{0}\), \(\epsilon _{0}\), \(\tau _{0}\), \(\alpha\), \(\beta\), \(\gamma\)} for simplicity.

For term (1) in Equation (3), since we use the conjugate prior for parameters \(\mathbf {\mu }_{r}\) and \(\mathbf {\Sigma }_{r}\), (\(\mathbf {\mu }_{r},\mathbf {\Sigma }_{r}|\mathbf {d}_{-i}\)) follows NIW distribution with parameters (\(\mathbf {\mu }_{r},\kappa _{r},\mathbf {\Psi }_{r},\rho _{r}\)), which can be calculated as follows:

\begin{equation} \left\lbrace \begin{aligned}\!\!\kappa _{r} & = \kappa _{0}+n_{r}, \quad \mathbf {\mu }_{r} = \frac{\kappa _{0}\mathbf {\mu }_{0}+n_{r}\bar{\mathbf {l}}_{r}}{\kappa _{r}}, \quad \rho _{r} = \rho _{0}+n_{r},\\ \!\!\mathbf {\Psi }_{r} & = \mathbf {\Psi }_{0}\!+\!\!\!\!\!\!\!\sum _{r_{j}=r,j\ne i}\!\!\!\!(\mathbf {l}_{j}\!-\bar{\mathbf {l}}_{r})(\mathbf {l}_{j}\!-\bar{\mathbf {l}}_{r})^{T} \!\!\!\!+\!\frac{\kappa _{0}n_{r}}{\kappa _{r}}(\bar{\mathbf {l}}_{r}\!-\!\mathbf {\mu }_{0})(\bar{\mathbf {l}}_{r}\!-\!\mathbf {\mu }_{0})^{T}\!\!\!, \end{aligned} \right. \end{equation}

(4)

where \(n_{r}\) is the number of records belonging to region \(r\) and \(\bar{\mathbf {l}}_{r}\) is the mean vector of these records, i.e., \(\bar{\mathbf {l}}_{r}=\frac{1}{n_{r}}\sum _{r_{j}=r,j\ne i}\mathbf {l}_{j}\). Therefore, the first term of the collapsed probability density can be calculated as

\begin{equation*} \begin{aligned}\!\!\int \!\!\!\!\!\int \!\!\!p(\mathbf {l}_{i}|\mathbf {\mu }_{r},\!\mathbf {\Sigma }_{r})p(\mathbf {\mu }_{r},\mathbf {\Sigma }_{r}|\mathbf {d}_{-i})d\mathbf {\mu }_{r}d\mathbf {\Sigma }_{r} \!=\! t_{\rho _{r}-1}\left(\mathbf {l}_{i}|\mathbf {\mu }_{r},\!\frac{\mathbf {\Psi }_{r}(\kappa _{r}+1)}{\kappa _{r}(\rho _{r}-1)}\right)\!, \end{aligned} \end{equation*}

where \(t_{\rho _{r}-1}(\cdot)\) is the probability density function of bivariate \(t\)-distribution.

Similarly, for term (4), (\(\nu _{c,z},\sigma _{c,z}|\mathbf {d}_{-i}\)) follows NIG distribution with parameters (\(\nu _{c,z},\lambda _{c,z},\epsilon _{c,z},\tau _{c,z}\)), which can be calculated as follows:

\begin{equation} \left\lbrace \begin{aligned}\lambda _{c,z}\! & =\! \lambda _{0}\!+\!n_{c,z}, \nu _{c,z}\!=\!\frac{\lambda _{0}\nu _{0}\!+\!n_{c,z}\bar{t}_{c,z}}{\lambda _{c,z}}\!, \tau _{c,z}\!=\!\tau _{0}\!+\!\frac{n_{c,z}}{2}\!, \\ \epsilon _{c,z}\! & =\! \epsilon _{0}\!+\!\!\!\!\!\!\sum _{c_{j}=c,z_{j}=z,j\ne i}\!\!\!\!\frac{(t_{j}-\bar{t}_{c,z})^{2}}{2} \!+\! \frac{\lambda _{0}n_{c,z}}{2\lambda _{c,z}}(\bar{t}_{c,z}\!-\!\nu _{0})^{2}, \end{aligned} \right. \end{equation}

(5)

where \(n_{c,z}\) is the number of records belonging to category \(c\) and temporal Gaussian component \(z\); \(\bar{t}_{c,z}\) is the mean value of these records, i.e., \(\bar{t}_{c,z}=\frac{1}{n_{c,z}}\sum _{c_{j}=c,z_{j}=z,j\ne i}t_{j}\). Therefore, term (4) can be calculated as

\begin{equation*} \begin{aligned}\!\!\int \!\!\!\!\!\int \!\!\!p(\!t_{i}|\nu _{c,z},\!\sigma _{\!c,z})p(\!\nu _{c,z},\!\sigma _{\!c,z}\!|\mathbf {d}_{\!-\!i}\!)d\nu _{c,z}d\sigma _{c,z} \!\!=\!t_{2\tau _{c,z}}\left(\!t_{i}|\nu _{c,z},\!\!\frac{\epsilon _{c,z}(\lambda _{c,z}\!\!\!+\!\!1)}{\lambda _{c,z}\tau _{c,z}}\right)\!,, \end{aligned} \end{equation*}

where \(t_{\tau _{c,z}}(\cdot)\) is the probability density function of t-distribution.

For term (2), \(\mathbf {\theta }|\mathbf {d}_{-i}\) follows Dirichlet distribution and we have

\begin{equation} \begin{aligned}\int p(r|\mathbf {\theta })p(\mathbf {\theta }|\mathbf {d}_{-i})d\mathbf {\theta } = \theta _{r} = \frac{n_{r}+\alpha }{\sum _{r^{\prime }=1}^{|R|} (n_{r^{\prime }}+\alpha)}. \end{aligned} \end{equation}

(6)

For term (5), \(\mathbf {\psi }_{c}|\mathbf {d}_{-i}\) follows Dirichlet distribution and we have

\begin{equation} \begin{aligned}\int p(z|\mathbf {\psi }_{c})p(\mathbf {\psi }_{c}|\mathbf {d}_{-i})d\mathbf {\psi }_{c} = \psi _{c,z} = \frac{n_{c,z}+\gamma }{\sum _{z^{\prime }=1}^{2} (n_{c,z^{\prime }}+\gamma)}. \end{aligned} \end{equation}

(7)

As for term (3), we first replace \(\mathbf {\Phi }_{r,u}\) with \(\mathbf {\Phi }_{r}\). \(\mathbf {\Phi }_{r}|\mathbf {d}_{-i}\) follows Dirichlet distribution and we have

\begin{equation} \begin{aligned}\!\!\int \! p(c|\mathbf {\Phi }_{r,u})p(\mathbf {\Phi }_{r,u}|\mathbf {d}_{-i})d\mathbf {\Phi }_{r,u} = \Phi _{r,c} = \frac{n_{r,c}+\beta }{\sum _{c^{\prime }=1}^{|\mathcal {C}|} (n_{r,c^{\prime }}+\beta)}, \end{aligned} \end{equation}

(8)

where \(n_{r,c}\) is the number of records belonging to region \(r\) and category \(c\).

After that, we consider each user’s personal preference and \(\mathbf {\Phi }_{r,u}|\mathbf {d}_{-i}\) follows Dirichlet distribution with prior \(b\cdot \mathbf {\Phi }_{r}\) and we have

\begin{equation} \begin{aligned}\!\!\int \!p(c|\mathbf {\Phi }_{r,u}\!)p(\mathbf {\Phi }_{r,u}\!|\mathbf {d}_{-i})d\mathbf {\Phi }_{r,u} \!=\! \Phi _{r,u,c} \!=\! \frac{n_{r,u,c}+b\Phi _{r,c}}{\!\!\sum \limits _{c^{\prime }=1}^{|\mathcal {C}|} \!(n_{r,u,c^{\prime }}\!+\!b\Phi _{r,c^{\prime }}\!)}\!, \end{aligned} \end{equation}

(9)

where \(n_{r,u,c}\) is the number of records belonging to region \(r\), user \(u\), and category \(c\).

Then, for every record \(\mathbf {d}_{i}\), we can simultaneously sample region \(r_{i}\), category \(c_{i}\), and temporal Gaussian component \(z_{i}\) according to the following posterior probability:

\begin{equation} \begin{aligned}&p(r_{i}=r,c_{i}=c,z_{i}=z|\mathbf {d}_{-i}) \propto t_{\rho _{r}-1}\left(\mathbf {l}_{i}|\mathbf {\mu }_{r},\frac{\mathbf {\Psi }_{r}(\kappa _{r}+1)}{\kappa _{r}(\rho _{r}-1)}\right) \\ & \cdot \theta _{r} \cdot \Phi _{r,u,c} \cdot t_{2\tau _{c,z}}\left(t_{i}|\nu _{c,z},\frac{\epsilon _{c,z}(\lambda _{c,z}+1)}{\lambda _{c,z}\tau _{c,z}}\right) \cdot \psi _{c,z}. \end{aligned} \end{equation}

(10)

However, if we directly sample \(r_{i},c_{i},z_{i}\) based on Equation (10), the computational complexity will be \(\mathcal {O}(M|R||\mathcal {C}||\mathcal {D}|)\), which is tough. To reduce the computation cost, we find that when \(\mathbf {d}_{i}\) is labeled, i.e., category \(c_{i}\) is fixed, \(r_{i}\) and \(z_{i}\) will become independent. Therefore, if we sample \(r_{i}\) and \(z_{i}\) separately, the computational complexity can be reduced to \(\mathcal {O}(M|R||\mathcal {D}|)\). Their posterior probability distributions are as follows:

\begin{equation} \begin{aligned}&p(r_{i}=r|\mathbf {d}_{-i}) \propto t_{\rho _{r}-1}\left(\mathbf {l}_{i}|\mathbf {\mu }_{r},\frac{\mathbf {\Psi }_{r}(\kappa _{r}+1)}{\kappa _{r}(\rho _{r}-1)}\right) \theta _{r} \Phi _{r,c}, \\ &p(z_{i}=z|\mathbf {d}_{-i}) \propto t_{2\tau _{c,z}}\left(t_{i}|\nu _{c,z},\frac{\epsilon _{c,z}(\lambda _{c,z}+1)}{\lambda _{c,z}\tau _{c,z}}\right) \psi _{c,z}. \end{aligned} \end{equation}

(11)

The detailed process of collapsed Gibbs sampling is shown in Algorithm 2. It takes the number of iterations \(M\), region number \(|R|\), category number \(|\mathcal {C}|\), human mobility records \(\mathcal {D}=\mathcal {D}_{L}\cup \mathcal {D}^{u}_{U}\), and hyper-parameters as the input. First, it randomly initializes \(r_{i}\), \(z_{i}\) for all records and \(c_{i}\) for unlabeled records. Then, it iterates sampling them for each labeled records \(\mathbf {d}_{i}\in \mathcal {D}_{L}\). The statistics are updated after each time of sampling. After \(M\) iterations, we use crowd preference \(\mathbf {\Phi }_{r}\) as a prior of user preference \(\mathbf {\Phi }_{r,u}\) and further learn \(\mathbf {\Phi }_{r,u}\) using unlabeled records of user \(u\). Similarly, it iterates sampling \(r_{i}\), \(z_{i}\), \(c_{i}\) for each unlabeled records \(\mathbf {d}_{i}\in \mathcal {D}^{u}_{U}\) and updates the statistics after each time of sampling. After \(M\) iterations, this algorithm outputs the statistics \(\lbrace \mathbf {\mu }_{r},\!\kappa _{r},\!\mathbf {\Psi }_{r},\!\rho _{r}, \!\nu _{c,z},\!\lambda _{c,z},\!\epsilon _{c,z},\!\tau _{c,z},\!\theta _{r},\!\psi _{c,z},\!\Phi _{r,u,c}\rbrace\) as the final results.

The computational complexity of collapsed Gibbs sampling shown in Algorithm 2 is \(\mathcal {O}(M(|R||\mathcal {C}||\mathcal {D}^{u}_{U}|+|R||\mathcal {D}_{L}|))\). Since the number of all labeled records is much more than unlabeled records of user \(u\), i.e., \(|\mathcal {D}_{L}|\gg |\mathcal {D}^{u}_{U}|\), the complexity can be approximated as \(\mathcal {O}(M|R||\mathcal {D}|)\). Thus, the computation cost roughly grows linearly with the number of iterations, number of regions, and number of records, which is feasible in practice.

3.3 POI Annotation

Given a record \(\mathbf {d}_{i}\) with geographical location \(\mathbf {l}_{i}\), time \(t_{i}\), user \(u_{i}\), and the set of candidate POI \(\mathcal {P}_{\mathbf {l}_{i}}\), we aim at finding the real POI \(p_{i}\in \mathcal {P}_{\mathbf {l}_{i}}\) the user visits. To achieve this goal, we calculate the probability of user visiting each candidate POI, i.e., \(p(p_{i}=p|\mathbf {l}_{i},t_{i})\) and choose the POI with maximum probability. We first calculate the probability that record \(\mathbf {d}_{i}\) appears in region \(r\) as \(p(r_{i}=r) \propto \theta _{r} \mathcal {N}(\mathbf {l_{i}}|\mathbf {\mu }_{r},\mathbf {\Sigma }_{r})\). Then, the probability of each candidate POI \(p\in \mathcal {P}_{\mathbf {l}_{i}}\), whose category is \(c\), is given by

\begin{equation} \begin{aligned}&p(p_{i}=p|\mathbf {l}_{i},t_{i}) \propto \left(\sum _{r=1}^{|R|} p(r_{i}=r) \Phi _{r,u,c}\right), \\ &\left(\sum _{z=1}^2 \mathbf {\psi }_{c,z}\mathcal {N}(t_{i}|\nu _{c,z},\sigma _{c,z})\right) \exp \left(-\frac{dist(p.\mathbf {l},\mathbf {l}_{i})^{2}}{\zeta ^{2}}\right), \end{aligned} \end{equation}

(12)

where \(\zeta\) is a hyper-parameter and the last term models distance decay.

To sum up, for semantic annotation of mobility records, we develop a Bayesian mixture model based on five domains of context—location, time, POI category, personal preference, and crowd preference. We further adopt collapsed Gibbs sampling to approximate parameters using both labeled and unlabeled records. Finally, we annotate each record with POI that has maximum probability based on Equation (12).

4 Performance Evaluation

4.1 Datasets

Our experiments are conducted on two human mobility datasets provided by WeChat, the most popular mobile instant messenger in China. They were collected between April 17\(-\)May 17, 2018, and cover the entire metropolitan area of Beijing and Guangzhou, which are two of the largest cities in China. The datasets contain both labeled and unlabeled human mobility records: labeled records are posted by users when they actively share POIs with friends and unlabeled records are collected by WeChat when users use other location services. The labeled records are check-ins containing anonymized user ID, GPS coordinates, time, POI, and category, while the unlabeled records do not contain POI and category. Since a visit may generate more than one unlabeled record, we combine them through stopping point detection methods [51]. On both datasets, we focus on users with more than two check-ins as our target users. Figures 3(a) and (b) show the distribution of the number of labeled and unlabeled records for each user. On the Beijing dataset, 80% of the users have less than three labeled records and 63 unlabeled records; on the Guangzhou dataset, 80% of the users have less than four labeled records and 79 unlabeled records.

Fig. 3.

Our POI dataset is also provided by WeChat. We only consider POI with at least one check-in record during April 17\(-\)May 17, 2018. In Guangzhou and Beijing, the average number of candidate POIs for each annotation is 19.4 and 8.1, respectively. For each POI, we obtain two levels of categories. The lower level has 16 coarse-grained categories, while the higher level has 54 fine-grained categories. For example, the category “restaurant” in the lower level is subdivided into seven categories in the higher level, including “Chinese restaurant”, “western restaurant”, and so on. We use both levels of the POI category to evaluate our model. Table 2 summarizes the statistics of two datasets.

Table 2.

Dataset	Guangzhou	Beijing
# users	8,493	4,465
# target users	427	390
# labeled records	12,953	6,397
# unlabeled records	59,172	20,752
# POIs	5,773	3,506

Table 2. Statistics of Two Datasets

Table 3.

	Guangzhou				Beijing
	Accuracy@k			Relative	Accuracy@k			Relative
	k=1	k=2	k=3	Improvement	k=1	k=2	k=3	Improvement
DIST	33.7%	46.1%	55.7%	–	42.6%	58.7%	68.7%	–
VPPM	34.0%	46.4%	55.7%	0.9%	42.6%	58.7%	68.7%	0.0%
Bayes(16)	35.4%	49.2%	55.5%	5.0%	43.3%	61.0%	70.5%	1.6%
Bayes(54)	34.4%	48.0%	55.3%	2.1%	43.6%	57.9%	68.7%	2.3%
STR(16)	34.7%	–	–	3.0%	44.9%	–	–	5.4%
STR(54)	34.4%	–	–	2.1%	44.6%	–	–	4.7%
STR+(16)	37.9%	–	–	12.5%	47.4%	–	–	11.3%
STR+(54)	38.6%	–	–	14.5%	46.7%	–	–	9.6%
CA(16)	43.3%	57.8%	64.9%	28.5%	48.2%	64.9%	72.6%	13.1%
CA(54)	44.0%	57.6%	65.1%	30.6%	47.2%	64.4%	70.3%	10.8%
CA+(16)	47.1%	62.8%	69.1%	39.8%	49.0%	67.7%	73.1%	15.0%
CA+(54)	49.4%	64.9%	69.1%	46.6%	50.0%	56.7%	74.1%	17.4%
CAP(16)	48.5%	63.0%	69.8%	43.9%	51.8%	69.0%	74.6%	21.6%
CAP(54)	50.4%	65.3%	70.0%	49.6%	52.1%	69.7%	76.4%	22.3%

Table 3. Performance on Two Datasets

4.2 Baselines and Metrics

We denote our proposed model as CAP, and compare it with several state-of-the-art POI annotation algorithms as follows:

4.2.1 Distance-Based Method (DIST).

This method only considers the geographic distance between user and candidates POIs. It simply annotates the geographically closest POI to each record.

4.2.2 Bayesian Activity Inference Model (Bayes).

Gong et al. [7] propose this method to infer the trip purpose of taxi passengers by taking both location and time into consideration. It formulates the visit probability function of a candidate POI \(p \in \mathcal {P}_{\mathbf {l}}\), based on Bayes’ rules. The function is given by

\begin{equation} p(p|\mathbf {l},t)\propto A_{p}\cdot dist(p.\mathbf {l}, \mathbf {l})^{-\lambda }\cdot pop(p|t). \end{equation}

(13)

\(A_{p}\) is the attractiveness of \(p\), which is not available in our datasets, so we omit this term. \(pop(p|t)\) is the popularity of \(p\) at time \(t\).

4.2.3 Visit Probability Propagating Model (VPPM).

Wang et al. [32] model the probability of POIs visited by passengers as the following parametric function:

\begin{equation} p(p|\mathbf {l})= \frac{\beta _1}{\beta _2} dist(p.\mathbf {l}, \mathbf {l})\cdot {\rm exp}(1-dist(p.\mathbf {l}, \mathbf {l})/\beta _2). \end{equation}

(14)

The probability distribution achieves the maximum value with the distance of \(\beta _2\) and has an exponential heavy tail, which is used to propagate the visit probability of passengers from the drop-off point to their target POIs.

4.2.4 Spatial-Temporal Regularity-Based Model (STR).

Wu and Li [34] propose a Markov random field model by utilizing spatial and temporal regularity of human mobility. Based on the observation that a user tends to visit similar POIs (e.g., POIs with the same category) when observed at close locations or similar time of the day, the POIs visited by a user can be inferred through minimizing pairwise potentials between spatial-temporal related records. Spatial-temporal Regularity (STR) captures the personal preference of each user independently by using all the location records of each user, i.e., it does not take crowd preference into consideration.

4.2.5 Semi-Supervised Version of STR (STR+).

The basic version of STR only utilizes time and location of human mobility records, ignoring the ground truth labels that are available in labeled records. The semi-supervised version of STR (STR+) further leverages those ground truth labels by setting the node potential of labeled records with a fixed value [34].

4.2.6 Context-Aware Annotation Model (CA).

This model is a simplified version of our method, which does not consider personal preference and only use labeled records. In this model, we approximate parameters in the Bayesian mixture model using only labeled records and replace personal preference vector \(\mathbf {\Phi }_{r,u}\) with a universal vector \(\mathbf {\Phi }_{r}\) for all users.

4.2.7 Supervised Version of CAP (CA+).

Compared with Context-aware Annotation (CA), this model further considers personal preference without using unlabeled records, which is a supervised version of CAP (CA+). It utilizes labeled records to approximate \(\mathbf {\Phi }_{r,u}\) for each user in the Bayesian mixture model.

In order to measure the correctness of the annotation result, we use a widely-used metric: accuracy@k, which is defined as \(\frac{N^{k}}{N}\), where \(N^{k}\) is the number of records whose associated POI is correctly predicted in the top-k results, and \(N\) is the total number of records. Note that STR only has accuracy@1 due to its model.

We also compare accuracy@1 of each method with DIST to calculate the relative improvement. Let \(N_{DIST}^{1}\) be the number of records being correctly annotated using DIST, we define the relative improvement as \(\frac{N^{1}-N_{DIST}^{1}}{N_{DIST}^{1}}\).

4.3 Parameter Settings

On both datasets, we use the last check-in of target users as the test set. All the other records are used as the training set. We set the parameters of our model as follows to achieve best performance: \(\delta =500\) \(m\), \(\zeta =100\) \(m\), \(M=20\), \(\kappa _{0}=1\), \(\rho _{0}=10\), \(\lambda _{0}=1\), \(\tau _{0}=2\), \(\alpha =10000\), \(\beta =10\), \(\gamma =10\), \(b=3\). \(\mathbf {\mu }_{0}\), and \(\mathbf {\Psi }_{0}\) are mean vector and covariance matrix of locations of all records. \(\nu _{0}\) and \(\epsilon _{0}\) are the mean and standard deviation of timestamps of all records. \(|R|\) is set to be 225 and 125 for Guangzhou and Beijing, respectively. We further discuss these parameters in Parameter Study. As for the baselines, parameters are tuned to achieve the best performance regarding the accuracy@1 metric.

4.4 Experimental Results

4.4.1 Case Study.

To evaluate the performance of our model, we first visualize the spatial and temporal pattern generated by our model based on the training set. We conduct a case study on the Beijing dataset. Figure 4 shows the temporal patterns of four most representative categories among 54 fine-grained categories. As we can observe, each category has a unique temporal pattern: fast food restaurants are popular during lunchtime and suppertime, which coincides with our prior knowledge. Most people visit shopping malls from 10AM to 10PM, corresponding with the malls’ opening time. Obviously, places of entertainment are more popular in the evening, and people tend to see a doctor in the daytime. Our model learns those temporal patterns from the training set without any prior knowledge, which demonstrates its effectiveness.

Fig. 4.

Figure 5 visualizes the functional regions that our model generates. Basically, the function of these regions is consistent with our prior knowledge. To further demonstrate the effectiveness of our model, we select two typical regions and plot their category distribution in Figure 5. Region A represents Beijing Capital International Airport Region, and 94% mobility records in this region are related to transportation; Region B represents the Imperial Palace Region, and 84% mobility records there are related to tourist spots. Note that for brevity and readability, here, we present results for coarse-grained POI categories. Comparing with administrative regions that most map services use, the functional regions that our model learns and generates are more informative. This is because functional regions in a city are generally not consistent with administrative regions since their boundaries are ambiguous and evolving over time. By revealing distinct functions of different areas, we can better infer the POIs that users visit.

Fig. 5.

In summary, both the temporal and spatial observations demonstrate the strong ability of our model to capture human mobility patterns.

4.4.2 Performance Comparison.

Table 3 shows the annotation performance in terms of accuracy@k with relative improvement. For methods other than DIST, we use both levels of the POI category and denote them as (16) and (54), respectively. As we can observe, our method outperforms all baselines on both datasets. Specifically, on the Guangzhou dataset, CAP surpasses DIST by 16.7%, with 49.6% relative improvement, and it performs better than VPPM, Bayes, STR, and STR+ by 16.4%, 15.0%, 16.4%, and 11.8%, respectively. Similarly, on the Beijing dataset, CAP outperforms DIST, VPPM, Bayes, STR, and STR+ by 9.5%, 9.5%, 8.5%, 9.0%, and 3.9%, respectively.

Note that our method achieves a very large improvement on the Guangzhou dataset. We believe that this performance gain is due to the fact that Guangzhou has a higher POI density than Beijing, i.e., there are more candidate POIs for each annotation, which makes the POI annotation problem more challenging. In addition, we can observe that VPPM outperforms DIST on the Guangzhou dataset, while they achieve similar performance on the Beijing dataset, which also indicates that Guangzhou has a higher POI density than Beijing. Since with a larger distance to POIs, the probability distribution (14) degrades into being inversely proportional to the distance, similar to the DIST algorithm. Thus, they achieve similar performance on the Beijing dataset.

Further, in Figure 6(b), we rank the records based on the number of candidate POIs and divide them into four groups. We can observe that as the number of candidate POI increases, the advantage of our model is more evident. For records with less than three candidate POIs, our method surpasses the baseline with the highest accuracy by only 1.3%, while for records with more than nine candidate POIs, the performance gain of our method increases to 3.9%.

Fig. 6.

We also rank the users based on the number of mobility records and divide them into four groups. As shown in Figure 6(a), our method achieves greater performance gain for users with more mobility records. For users with less than 10 records, the performance gain of our method is 1.6%, while for users with 70+ records, it comes to 6.8%.

In addition, as we can observe from Table 3, the performances are different when using different levels of category. While the coarse-grained categories cannot fully leverage information of all categories, it alleviates data skew and avoids overfitting. In comparison, the fine-grained POI categories illustrate mobility patterns more detailedly, while suffering from difficulties in training with limited data.

4.4.3 Parameter Study.

In our model, region number \(|R|\) is an important hyper-parameter. For each dataset and each level of categories, we plot the accuracy as a function of \(|R|\) in Figure 7. Generally, a larger region number leads to better performance, since it can better capture user preference in different areas of a city. However, due to data sparsity, an excessively large region number may reduce the number of records in each region and therefore cause overfitting. According to Figure 7, We set \(|R|\) as 225 and 125 for Guangzhou and Beijing, respectively.

Fig. 7.

Next, we examine how the parameters of NIW and NIG prior influence the performance. Figures 8(a) and (b) show the impact of parameters \(\kappa _{0}\) and \(\rho _{0}\) in the prior of NIG distribution. We can observe that they achieve the best performance near our default settings. Figures 8(c) and (d) show the impact of parameters \(\lambda _{0}\) and \(\tau _{0}\) in the prior of NIW distribution. Compared with \(\kappa _{0}\) and \(\rho _{0}\), they show less impact on the performance. Figures 8(e), (f), and (g) plot the impact of parameter \(\alpha\), \(\beta\), \(\gamma\) in the Dirichlet process (DP). Since all these curves demonstrate a moderate variance, we conclude that our model is not sensitive to the hyper-parameters, indicating the robustness of our method. Figure 8(h) plots the impact of parameter \(b\), which is used to adjust the influence of crowd preference, where \(b=0\) means only personal preference is considered in the model. By choosing a proper \(b\) value, we can balance personal preference and crowd preference so as to achieve the best performance.

Fig. 8.

4.4.4 Computational Time.

Finally, we evaluate the performance of our proposed model in terms of computational time. Without loss of generality, we only show the performance of our proposed CAP algorithm compared with STR+, DIST, Bayes, and VPPM algorithms on the Beijing dataset. In addition, the computational time of all algorithms is evaluated on a 16-Core 2.10 GHz Linux server. We first show the average computational time for POI annotation of each user in Figure 9(a), where the region number is set to be 125. We can observe that DIST, Bayes, and VPPM have short computational time, since they annotate mobility records directly based on probability statistics of POI, where there are not many parameters to be estimated. Different from them, CAP and STR+ algorithms require relatively large computational time to estimate their parameters. Specifically, we can observe that the computational time of CAP is a little larger than that of STR+. Compared with the performance improvement, the additional computational time is acceptable. Further, we show the computational time as the function of the region number in Figure 9(b). We can observe an obvious linear relationship between the computational time and region number, which is consistent with our theoretical analysis in Section 3.2 and indicates that our proposed algorithm is feasible in practice.

Fig. 9.

5 Related Work

5.1 Semantic Annotation of Mobility Records

Researchers have studied the semantic annotation problem using various methods [7, 8, 13, 34, 35, 37, 38]. Guc et al. [8] introduce a conceptual annotation model based on the notion of episodes. Gong et al.[7] formulate the probability that people visit certain POI based on Bayes’ rule, in order to infer the trip purpose of taxi passengers. Yan et al. [37, 38] propose a hidden Markov model (HMM) to infer the POI based on the previous place the user visits. Lian et al. [13] leverage the learning to rank method by considering four features—user history, distance, review, and web popularity. Wu et al. [35] use Kernel Density Estimation (KDE) methods to annotate mobility data with dynamic semantics based on external contextual data. To achieve personalized annotation, Wu et al. [34] propose a Markov random field model by utilizing spatial and temporal regularity of human mobility. Zhang et al. [48] further leverage user grouping to annotate mobility records with POIs. Different from these methods, our work is the first to consider location, time, category, user preference, and similarity among users simultaneously using the probabilistic graphical model.

5.2 Mobility Modeling

Mobility modeling is a classical task in urban computing [1, 4, 5, 21, 36]. A number of approaches model users’ mobility behavior by using the Markov-based model. Lu et al. [18] use the Markov model, where each state corresponds to a location. Mathew et al. [19] use a variant of the Markov model, i.e., HMM, where each hidden state corresponds to a probabilistic distribution over locations. Zhang et al. [47] further enhance HMM by using a user grouping scheme. In addition, some approaches use the DP to model user mobility. Jeong et al. [10] use DP to cluster users based on their mobility patterns. McInerney et al. [20] use DP to share parameters describing the periodicity of mobility among the users. Wang et al. [25, 27] use DP to model users’ app usage and mobility simultaneously. Wang et al. [31] develop a probabilistic model to infer the purposes of taxi passengers. Zheng et al. [49] develop a Bayesian-based model to cluster users based on their characteristics behavior patterns. Further, a number of studies consider context information, such as social media or social networks, in mobility modeling. Specifically, Zhang et al. [46] leverage the information from the geo-tagged tweet. Wang et al. [23] use the information of social networks. Huai et al. [9] propose a sequential model by extending the Bayesian hidden Markov model for modeling the personalized contextual information of mobile users. Wang et al. [26] integrate the neural attention mechanism into the extended Markov model to predict future movements of users. Wang et al. [29] investigate the mobile user profiling problem based on POI check-in data, and they propose an adversarial substructured learning method to solve this problem. Liu et al. [17] predict the bus travel demand based on mobility patterns obtained from the location traces of taxicabs and the mobility records in bus transactions, which is used for bus routing optimization. Wang et al. [30] propose a collective embedding framework to learn the community structure based on periodic spatial-temporal mobility graphs of humans. Fu et al. [6] focus on identifying and quantifying the urban forms of residential communities, and they propose a collective learning approach based on individual-level human mobility data to solve this problem. Yao et al. [39] focus on utilizing the “co-occurrence” of origin–destination zones in mobility trajectories to learn their embeddings.

Wang et al. [28] consider the continuity of human mobility in terms of temporal dimension by utilizing von Mises distribution in the Bayesian mixture model. However, this model mainly considers the spatial and temporal mobility patterns of users, and factors including the POI category and the temporal patterns of different POI categories are not considered, which is different from our proposed CAP algorithm. Different with the goal of predicting users’ future movement in these approaches, we seek to annotate unlabeled mobility records with correct POIs, i.e., extracting the semantic information of these mobility records.

5.3 Location Recommendation

Location context plays an important role in location-based services. Ye et al. [40] consider the social and geographical characteristics of users and locations in their location recommendation system. Zheng et al. [50] recommend locations to users based on their accurate locations and comments. Sun et al. [22] focus on inferring users’ intent from app usage behavior and other contextual information to provide personalized recommendations. Yuan et al. [43] recommend POIs to users by using a collaborative filtering method and incorporating temporal information. Liu et al. [15] propose a geographical probabilistic factor analysis framework for POI recommendations, which strategically takes multiple factors, which influence the user check-in decision process, into consideration. Wang et al. [33] recommend spatial items by modeling and fusing the sequential influence, cyclic patterns, and personal interests. A nonparametric Bayesian approach is further developed in [44] to capture spatio-temporal contextual information and users’ topical interests and intentions for accurate recommendation and retrieval. These approaches seek to find out new locations or POIs that users will visit in the future. Different from them, our goal is to annotate unlabeled historical user trajectories with POIs.

6 Conclusions

In this article, we propose a context-aware personalized semantic annotation model. We use a Bayesian mixture model to capture human mobility patterns by considering five domains of information simultaneously—location, time, POI category, personal preference, and crowd preference. Extensive results on two real-world datasets demonstrate that our method significantly outperforms the state-of-the-art algorithms by over 11.8%. In the future, we will take periodic mobility patterns into consideration so as to develop better models for semantic annotation of mobility records. Also, we will further model temporal patterns of POI in a similar way as user preference, i.e., modeling each POI separately using category-level distribution as prior. Since the results of CAP can facilitate further applications, including POI recommendation, population estimation, and public security, we will study related problems based on our proposed method in future work.

References

[1]

Hancheng Cao, Zhilong Chen, Fengli Xu, Yong Li, and Vassilis Kostakos. 2018. Revisitation in urban space vs. online: A comparison across pois, websites, and smartphone apps. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 4 (2018), 1–24.

Abstract

1 Introduction

2 System Model and Overview

2.1 Mathematical Setup and Problem Description

2.2 System Overview

3 Semantic Annotation Model

3.1 Bayesian Mixture Model

3.1.1 Location.

3.1.2 Time.

3.1.3 POI Category.

3.2 Parameter Estimation

3.3 POI Annotation

4 Performance Evaluation

4.1 Datasets

4.2 Baselines and Metrics

4.2.1 Distance-Based Method (DIST).

4.2.2 Bayesian Activity Inference Model (Bayes).

4.2.3 Visit Probability Propagating Model (VPPM).

4.2.4 Spatial-Temporal Regularity-Based Model (STR).

4.2.5 Semi-Supervised Version of STR (STR+).

4.2.6 Context-Aware Annotation Model (CA).

4.2.7 Supervised Version of CAP (CA+).

4.3 Parameter Settings

4.4 Experimental Results

4.4.1 Case Study.

4.4.2 Performance Comparison.

4.4.3 Parameter Study.

4.4.4 Computational Time.

5 Related Work

5.1 Semantic Annotation of Mobility Records

5.2 Mobility Modeling

5.3 Location Recommendation

6 Conclusions

References

Cited By

Index Terms

Recommendations

Context-Aware Location Annotation on Mobility Records Through User Grouping

Point of interest mining with proper semantic annotation

Survey of semantic annotation platforms

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

HTML Format

Login options

Full Access

Share

Share this Publication link

Share on social media

Affiliations