You Are How You Use Apps: User Profiling Based on Spatiotemporal App Usage Behavior

Published: 21 July 2023

Abstract

Mobile apps have become an indispensable part of people’s daily lives. Users decide which apps to use, and when and where to use them, according to their tastes, interests, and personal demands, which are in turn shaped by their personality traits. This article aims to infer user profiles from their spatiotemporal mobile app usage behavior. Specifically, we first transform mobile app usage records into a heterogeneous graph. On the graph, nodes represent users, apps, locations, and time slots. Edges describe the co-occurrence of entities in usage records. We then develop a multi-relational heterogeneous graph attention network (MRel-HGAN), an end-to-end system for user profiling. MRel-HGAN first adopts a bootstrapping-based neighbor sampling strategy to select a fixed-size set of strongly connected neighbors for each node. Next, we design a relational graph convolutional operation and a multi-relational attention operation. Through such modules, MRel-HGAN can generate node embeddings by sufficiently leveraging the rich semantic information of the multi-relational structure in the mobile app usage graph. Experimental results on real-world mobile app usage datasets show the effectiveness and superiority of our MRel-HGAN in the user profiling task for the attributes of gender and age.

1 Introduction

Accurate and large-scale user profiles are required for providing personalized services [1, 2, 3], such as customized search, recommendations, and advertisements, among others. However, in practice, user profiles are usually unknown and hard to obtain because of privacy settings. Therefore, user profiling, which aims to infer individual personality traits from user-generated data, is significant for real-world applications. In this article, we focus on inferring user profiles based on their spatiotemporal mobile app usage behavior. In particular, compared with other data sources, mobile app usage data has the following three advantages for the task of user profiling. First, the prevalence of smartphones makes it possible to automatically collect large-scale and fine-grained mobile app usage data for service providers [4, 5, 6]. The large-scale dataset allows us to adopt more advanced models, such as deep neural networks, to improve the robustness and accuracy of user profiling. Second, users choose which apps to use based on their individual needs and preferences, which are heavily influenced by their personality attributes [7, 8], including gender, income, age, and occupation, among others. Hence, users’ mobile app usage behavior can correspondingly reveal their profiles. Third, the mobile app usage behavior also contains rich spatiotemporal features of users—that is, location and time information of app usage records. Such spatiotemporal features are also helpful for inferring user personality traits.
Previous studies in the scope of mobile app usage behavior analysis for user profiling can be categorized into two types: (1) descriptive analysis, where researchers apply statistical methods to describe how user profile traits, such as gender, affect app usage behavior [9], and (2) predictive analysis, where researchers recognize distinct patterns from mobile app usage traces and use classification models, such as Support Vector Machine (SVM), to predict users’ profile labels [10]. Nevertheless, previous studies have the following two limitations. First, previous studies principally rely on handcrafted features [10, 11]. They empirically defined descriptive rules based on small-scale datasets, which lack generalization capability when dealing with large-scale and noisy mobile app usage datasets. Second, previous studies did not explore the spatiotemporal features of mobile app usage behavior [7, 9]. They only considered the app adoptions of mobile users (i.e., what apps were used) while ignoring where and when the apps were used. Such a single type of input data limits the performance of existing app usage behavior-based user profiling models.
Alternatively, in recent years, graph-based representation learning, such as Graph Convolutional Networks (GCNs), shows great potential for automatic behavior profiling [12]. Recent studies have shown that a graph data structure can provide a general representation to integrate multiple types of data [13, 14, 15]. Using a graph structure to represent spatiotemporal mobile app usage behavior can overcome the limits of previous studies. Introducing user, app, location, and time nodes into a graph can encode the spatiotemporal features of mobile users’ app usage behavior. Then, by using the embedding of user nodes, we can provide individual-level user profiling.
Consequently, the combination of spatiotemporal mobile app usage data and graph-based representation learning is promising for the task of user profiling. However, three unique challenges arise in achieving this goal:
(1) Each mobile app usage record involves four types of entities. Thus, the app usage graph is heterogeneous and has four node types: users, apps, locations, and time. Because different node types have different semantics, the graph model must be able to distinguish neighbor node types and select informative neighbors.
(2) Generally, mobile users use many apps and visit a large number of locations. Therefore, app and location nodes have dense connections with user nodes in the app usage graph. Such dense connections among nodes cause severe neighborhood expansion and oversmoothing issues for graph-based representation learning methods [16].
(3) Spatiotemporal mobile app usage data carry various relations among users, apps, locations, and time. Therefore, the app usage graph has multiple relational edges: user-app, user-location, user-time, app-time, app-location, and location-time edges, all of which are undirected. Different relational edges have different semantics, so fusing this diverse semantic information into node representations is also challenging.
To overcome the preceding challenges, we propose a new framework, named Multi-Relational Heterogeneous Graph Attention Network (MRel-HGAN), to infer user profiles from their spatiotemporal mobile app usage behavior. First, to cope with the heterogeneity of the mobile app usage graph, we leverage a relational graph convolutional operation consisting of relation-specific propagation and aggregation phases, which can distinguish the types of neighbors during operations and learn multiple relation-specific representations (with different semantics) for a single node. Second, to address the high density of the app usage graph, we design a neighbor sampling strategy that samples strongly correlated neighbors of a fixed size for each node. The sampling operation makes the graph sparse and mitigates the issues of neighborhood expansion and oversmoothing. Third, to fuse the different semantic information from multiple relational edges, we leverage a multi-relational attention operation to learn the importance of each relation-specific representation and assign proper weights to them. By doing so, for each node in the mobile app usage graph, we fuse its multiple relation-specific representations into one feature vector.
In summary, we present the main contributions as follows:
We introduce a promising graph learning based framework to the problem of user profiling based on spatiotemporal mobile app usage data. By exploring the co-occurrence of users, locations, time, and apps in usage records, we construct a multi-relational heterogeneous mobile app usage graph. We also extract node features by utilizing side information, such as app category and Point of Interest (POI).
We develop MRel-HGAN to learn the node embeddings of the mobile app usage graph. By employing a relational graph convolutional operation and a multi-relational attention operation, MRel-HGAN can adequately leverage the multi-relational graph structure and heterogeneous node features to label user profiles.
We conduct extensive experiments based on large-scale real-world mobile app usage datasets. The experimental results exhibit the superiority of MRel-HGAN over the State-of-the-Art (SOTA) models for the task of user profiling for attributes of gender and age.
We present the preliminaries of user profiling from users’ spatiotemporal app usage behavior in Section 2. In Section 3, we detail how to construct the heterogeneous app usage graph and determine the initial features of vertexes. In Section 4, we elaborate on the network design of our proposed MRel-HGAN. We then evaluate MRel-HGAN by comparing it with other SOTA models in Section 5. Related work and study limitations are presented in Section 6. We conclude the article briefly in Section 7.

2 Preliminaries

This section formally introduces the mobile app usage behavior and the problem of user profiling. We then provide an overview of our system framework.

2.1 Mobile App Usage Behavior

Mobile app usage behavior is defined as a set of activity records generated by mobile users using mobile apps on smartphones [17]. Generally, when one user launches an app in the foreground of a smartphone, that launch will be regarded as a use of that app. In particular, an app usage record involves four key features: who, what, where, and when. In other words, a usage record is aggregated into a tetrad—that is, \(r = \langle u, a, l, t\rangle\), where u denotes the user, a denotes the app used, l denotes the user’s location, and t indicates the timestamp of that record. The location information of mobile users can be gathered from the GPS sensors of mobile phones. Alternatively, network operators can infer the coarse location from network-level data, such as the location of associated base stations. As a result, app usage behavior, as a kind of spatiotemporal data, contains the interior relationships among users, apps, locations, and time [18]. By investigating such relationships, we can infer user attributes, improve app usability, and provide context-aware app recommendations and usage predictions [19, 20, 21]. This article works on the problem of automatic user profiling from spatiotemporal app usage behavior by exploring hidden relations.
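For concreteness, such a tetrad can be held in a small record structure. The following sketch is illustrative only; the field names are our own and are not part of the datasets described later.

```python
from dataclasses import dataclass

# Illustrative container for one app usage record r = <u, a, l, t>.
# Field names are hypothetical and not taken from the datasets used later.
@dataclass(frozen=True)
class UsageRecord:
    user_id: str       # who: anonymized user identifier u
    app_id: str        # what: identifier of the app a launched in the foreground
    location_id: str   # where: e.g., ID of the associated base station l
    timestamp: int     # when: timestamp t of the usage record
```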

2.2 User Profiling Problem

The user profiling problem we address is to infer user attributes from app usage behavior. Specifically, existing studies have demonstrated relations between user attributes and the apps they use [10, 22]. For example, Malmi and Weber [10] showed that the presence of period-tracking apps is a good predictor of gender. However, existing works did not take full advantage of the hidden relationships in usage behavior, as they ignored spatiotemporal features [7, 23]. Therefore, motivated by the limitations of existing studies, in this article, instead of only considering app adoptions, we tackle the user profiling problem by thoroughly leveraging users’ spatiotemporal app usage behavior.
Definition 2.1.
User profiling from spatiotemporal app usage behavior. Given a set of app usage records \(R = \lbrace \langle u, a, l, t\rangle \rbrace\) and user profile labels Y, the aim is to design a framework \(f: R \rightarrow Y\) that takes the app usage records as inputs and outputs profile labels.
We formulate this problem as a task of multi-label classification. Essentially, we first construct a heterogeneous graph from spatiotemporal mobile app usage records, representing co-occurrence relations between users, apps, locations, and time. We then infer the labels of user nodes from both the local multi-relational graph structure and features of heterogeneous nodes.

2.3 System Overview

Figure 1 presents an overview of our proposed system. Specifically, there are three critical steps illustrated as follows.
Fig. 1. System overview, where U refers to user nodes, A refers to app nodes, L refers to location nodes, and T refers to time nodes.
Construction of the Mobile App Usage Graph. In this step, we detect the co-occurrence of users, apps, locations, and time from mobile app usage datasets and construct a normalized weighted app usage graph. Figure 1 exhibits a schematic diagram of the heterogeneous app usage graph, where U represents user nodes, A represents app nodes, L represents location nodes, and T represents time nodes.
Automatic Representation Learning of User Nodes. In this step, we incorporate the heterogeneous neighbor sampling, relational graph convolutional operation, and multi-relational attention operation into the graph learning framework to develop a representation learning model that can leverage both multi-relational graph structure and heterogeneous node features.
User Profiling. In this step, we exploit the learned representations of user nodes to infer user profile labels. Explicitly, we set the dimension of a user’s representation vector as the number of labels in the user profiling task. We then predict user profile labels by applying the softmax function to users’ representations.
Notably, our proposed system is an end-to-end solution. The notations we will use throughout the article are summarized in Table 1.
Notation | Explanation
U | Set of user nodes
A | Set of app nodes
L | Set of location nodes
T | Set of time nodes
\(O_V\) | Set of types of nodes
\(R_E\) | Set of relations of edges
\(\hat{\omega }\) | Normalized weight of edge
\(\boldsymbol {h}\) | Initial node feature
\(\hat{\boldsymbol {h}}\) | Projected node feature
\(\hat{\mathcal {N}}(v)\) | Sampled fixed-size neighbors of node v
\(\boldsymbol {h}^{r_e}\) | Relation-specific representation
\(\boldsymbol {h}^{\prime }\) | The fused representation
Table 1. Notations and Explanations

3 Construction of the App Usage Graph

This section presents a framework to transform spatiotemporal app usage records with side information (i.e., app categories and POIs) into a graph. In particular, the graph construction framework consists of two steps: constructing the graph structure and extracting vertex features.

3.1 Graph Structure Construction

To represent the interior relations between users, apps, locations, and time, we construct a multi-relational heterogeneous graph that encodes the associations between different entities in mobile app usage records. The structure of the graph is shown in Figure 2, where vertexes stand for the four types of entities and edges reflect the co-occurrence of different entities in app usage records. Specifically, since time is a continuous variable, we evenly divide 1 day into T time slots and use time slots to represent time instead. For simplicity, we still use the term time in the following explanations. We define the multi-relational heterogeneous graph as \(G=(V, E, O_V, R_E, H)\), where V and E denote the vertexes and edges in the graph, respectively. \(O_V\) and \(R_E\) represent the set of vertex types and relation types, respectively. H is the set of feature vectors of the vertexes. In our case, the vertex types \(O_V\) include user, app, time, and location. The edge types \(R_E\) include user-app edges \(E_{ua}\) that reflect the app adoptions of users, user-location edges \(E_{ul}\) that describe the trajectories of users, user-time edges \(E_{ut}\) that reveal the active time of users, app-time edges \(E_{at}\) that represent the temporal features of apps, app-location edges \(E_{al}\) that indicate the spatial features of apps, and location-time edges \(E_{lt}\) that express the temporal dynamics of locations. Notably, the heterogeneous app usage graph is undirected.
Fig. 2. Heterogeneous spatiotemporal app usage graph.
To model the strength of connections between vertexes, we assign each edge a weight. In practice, the method to compute the weight of edges is described as follows. Initially, the weight of each edge is set to 0. For an app usage record \(r = \langle u, a, l, t\rangle\), the weights of edges \(e(u, a)\), \(e(u, t)\), \(e(u, l)\), \(e(a, t)\), \(e(a, l)\), and \(e(l, t)\) are each increased by 1. After traversing all records, we obtain the weights of all edges. However, as app usage records are unevenly distributed in the user, location, time, and app dimensions, we need to normalize the edge weights across different edge types. In practice, we normalize the weight for each edge type separately. For example, for user-app edges \(E_{ua}\) , we set the normalized edge weight as
\begin{equation} \hat{\omega }(u, a) = \frac{\omega (u, a)-\min \limits {\left(\omega (u, a)\right)}}{\max \limits {\left(\omega (u, a)\right)}-\min \limits {\left(\omega (u, a)\right)}}, \forall u \in U, a \in A, \end{equation}
(1)
where \(\hat{\omega }(u, a)\) denotes the normalized weight of edge \(e(u,a)\) , U is the set of user vertexes, and A is the set of app vertexes. We then apply a similar computation to other edge types (i.e., \(e(u, t)\) , \(e(u, l)\) , \(e(a, t)\) , \(e(a, l)\) , and \(e(l, t)\) ) and finally obtain the normalized weights for all types of edges.
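As a minimal sketch of this procedure (assuming records are already parsed into (u, a, l, t) tuples, and with function and variable names of our own choosing), the co-occurrence counting and the per-edge-type min-max normalization of Equation (1) could look as follows:

```python
from collections import defaultdict
from itertools import combinations

def build_normalized_edge_weights(records):
    """Count co-occurrences per edge type, then min-max normalize each type.

    records: iterable of (u, a, l, t) tetrads with hashable entity IDs.
    Returns {edge_type: {(node_i, node_j): normalized_weight}}.
    """
    raw = defaultdict(lambda: defaultdict(float))
    for u, a, l, t in records:
        entities = [("user", u), ("app", a), ("location", l), ("time", t)]
        # Each record increments the weights of its six pairwise edges by 1.
        for (type_i, i), (type_j, j) in combinations(entities, 2):
            raw[(type_i, type_j)][(i, j)] += 1.0

    normalized = {}
    for edge_type, edges in raw.items():
        w_min, w_max = min(edges.values()), max(edges.values())
        span = (w_max - w_min) or 1.0  # guard against identical weights
        normalized[edge_type] = {pair: (w - w_min) / span
                                 for pair, w in edges.items()}
    return normalized
```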
The mobile app usage graph can be partitioned into six subgraphs based on the types of edges: user-app subgraph \(G_{ua}\) , user-location subgraph \(G_{ul}\) , user-time subgraph \(G_{ut}\) , app-time subgraph \(G_{at}\) , app-location subgraph \(G_{al}\) , and location-time subgraph \(G_{lt}\) . Each subgraph is a normalized weighted bipartite graph, and different subgraphs describe different semantic relations between heterogeneous vertexes.

3.2 Vertex Feature Extraction

To leverage the rich side information of vertexes, we assign a feature vector \(\boldsymbol {h}\) to each vertex \(v \in V\) .
For app vertexes, we use the app category information as app features. The apps in each category usually have similar functionality and an inherent semantic meaning [24]. The apps’ categorical information is available in app stores, such as Google Play for Android apps and the App Store for iOS apps. Assuming the number of app categories is C, for an app vertex a, we use a one-hot vector \(\boldsymbol {h}_a \in \mathbb {R} ^{1\times C}\) to encode its categorical information,
\begin{equation} h_{ai} = \begin{cases} 1 & \text{if}~~c_a = i, \\ 0 & \text{otherwise}, \end{cases} \qquad \forall a \in A, \end{equation}
(2)
where \(c_a\) is the category label of app a.
For location vertexes, we leverage the density and category information of nearby POIs of that location. A POI refers to a point location with specific functions, such as residence, workplace, and theater. Previous studies have shown that locations with similar POI distributions have similar urban functions [25, 26]. Hence, we use the POI distribution within the location as the semantic feature of the location vertex. In practice, the POI information can be crawled from map service providers like Tencent Maps and Google Maps. Generally, a POI is recorded with a tuple consisting of a POI category, a name, and a geo-position (i.e., latitude and longitude). For each location, we count the number of POIs in each category accordingly. Assuming the number of POI categories is P, for an arbitrary location \(l \in L,\) where L is the set of location vertexes, we have a POI distribution vector \(\boldsymbol {p}_l = [p_{l1}, p_{l2}, \ldots , p_{lP}],\) where \(p_{li}\) stands for the number of POIs of POI category i in location l. In particular, the popularity of POI categories varies a lot. For instance, restaurant POIs are far more popular than tourist spot POIs. Therefore, we normalize the POI features to eliminate the influence of the uneven popularity distribution across different POI categories. In this work, we apply TF-IDF [27]. For a location vertex \(v_l\) , its feature vector \(\boldsymbol {h}_l \in \mathbb {R} ^{1\times P}\) can be calculated as
\begin{equation} h_{li} = \frac{p_{li}}{\sum _{i=1}^P{p_{li}}}\cdot \log \frac{|L|}{|\lbrace \boldsymbol {p}_l:p_{li}> 0\rbrace |+1}, \quad \forall l \in L, \; i = 1, \ldots ,P, \end{equation}
(3)
where \(\frac{p_{li}}{\sum _{i=1}^P{p_{li}}}\) is the term frequency and \(\log \frac{|L|}{|\lbrace \boldsymbol {p}_l:p_{li}> 0\rbrace |+1}\) represents the inverse document frequency.
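For illustration, the TF-IDF feature of Equation (3) admits a compact NumPy sketch; the array layout and names below are our own assumptions.

```python
import numpy as np

def location_features(poi_counts):
    """TF-IDF location features following Equation (3).

    poi_counts: |L| x P array, where poi_counts[l, i] is the number of POIs
    of category i inside location l (layout is our own assumption).
    """
    poi_counts = np.asarray(poi_counts, dtype=float)
    num_locations = poi_counts.shape[0]

    # Term frequency: share of each POI category within the location.
    totals = poi_counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard for locations without any POI
    tf = poi_counts / totals

    # Inverse document frequency: down-weight categories found almost everywhere.
    df = (poi_counts > 0).sum(axis=0)          # locations containing category i
    idf = np.log(num_locations / (df + 1.0))   # +1 as in Equation (3)

    return tf * idf                            # |L| x P feature matrix
```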
For time vertexes, we use one-hot vectors to express their features. For a time vertex \(t \in T\) , where T is the set of time vertexes, the feature vector is \(\boldsymbol {h}_t \in \mathbb {R}^{1 \times |T|}\) ,
\begin{equation} h_{ti} = \begin{cases} 1 & \text{if}~~t = i, \\ 0 & \text{otherwise}, \end{cases} \qquad \forall t \in T. \end{equation}
(4)
Similarly, we use one-hot vectors to express user vertexes’ features as well, and the feature vector of a user vertex \(u \in U\) is \(\boldsymbol {h}_u \in \mathbb {R}^{1 \times |U|}\) ,
\begin{equation} h_{ui} = \begin{cases} 1 & \text{if}~~u = i, \\ 0 & \text{otherwise}, \end{cases} \qquad \forall u \in U. \end{equation}
(5)

4 Representation Learning for Users

This section formally presents the design of MRel-HGAN and shows how to employ MRel-HGAN to learn representations of user nodes. Precisely, MRel-HGAN consists of three parts: (1) heterogeneous neighbor sampling, (2) relational graph convolutional operation, and (3) multi-relational attention operation.

4.1 Neighbor Sampling

The critical idea of Graph Neural Networks (GNNs) is to aggregate features from a node’s neighbors [28]. Typically, the computation of GNN is carried out in two phases: (1) the message passing phase and (2) the aggregating and updating phase. Specifically, in the message passing phase, a node passes its representation vector to its first-order neighbors. In the aggregating and updating phase, a node first aggregates the received representation vectors with its own representation. Then, the node updates its own representation vector with the aggregated one. By increasing the number of network layers, each node can incorporate information from higher-order neighbors and thus learn richer node features. For example, as shown in Figure 3(b), in layer 1, nodes C, D, F will pass their representation vectors to their first-order neighbors (e.g., node E). In layer 2, node E will then pass the aggregated representation to node D. In this way, node D can incorporate the feature information from its second-order neighbors (i.e., nodes C and F). However, applying this approach to a mobile app usage graph may raise several issues:
Fig. 3. An example of neighbor sampling.
Neighborhood expansion: For a given node, computing its hidden representation requires considering its first-order neighbors. In turn, its first-order neighbors must consider their own first-order neighbors, and so on. Such a process causes recursive neighborhood expansion that grows with each additional layer. Since mobile users usually use a large number of apps and visit many locations, the mobile app usage graph is large scale and dense. Therefore, the issue of neighborhood expansion will be quite severe for the app usage graph.
Oversmoothing: As mentioned earlier, the app usage graph is dense, which could lead to another severe issue—that is, oversmoothing [16]. The dense connections between nodes make the learned representations indistinguishable, which hurts the profiling accuracy.
Various neighbor sizes: Apps and locations have varying popularity, and thus nodes have varying degrees. For example, as one of the most popular apps, Facebook is used by millions of people, whereas some apps only have a few users. Hence, the representations of nodes with high degrees could be impaired by weakly connected neighbors, and nodes with low degrees may not adequately learn their representations.
To solve these issues, we apply a heterogeneous neighbor sampling strategy based on a bootstrapping approach. Specifically, for each node, we randomly sample a fixed-size set of neighbors with probability proportional to the edge weights. Mathematically, we use \(\hat{\mathcal {N}}(v)\) to denote the sampled fixed-size neighbors of node v, drawn from the set \(\lbrace u \in V: e(v,u) \in E \rbrace\). We then use the sampled neighbors to approximate the aggregation over all neighbors. For example, as depicted in Figure 3(c), we set the sample size as 2. Therefore, by dropping the edge \(e(D, A)\), node D only passes its representation vectors to nodes B and E in layer 1. Correspondingly, in layer 2, node D will only aggregate the representations from nodes B and E as well. We also note that the app usage graph is heterogeneous, having various node types. Moreover, the degree distribution of different node types varies greatly. Thus, for different types of nodes, we implement the sampling strategy separately. In other words, given a node v, we separately sample its user-node, app-node, location-node, and time-node neighbors into fixed-size sets \(\hat{\mathcal {N}_u}(v)\), \(\hat{\mathcal {N}_a}(v)\), \(\hat{\mathcal {N}_l}(v)\), and \(\hat{\mathcal {N}_t}(v)\), respectively.
The heterogeneous neighbor sampling strategy can avoid the issues mentioned earlier for two principal reasons. First, only a small set of the most relevant neighbors with large edge weights are selected, thus mitigating the issues of neighborhood expansion and oversmoothing. Second, for each node, all types of neighbors are collected, and the sample size is fixed by leveraging bootstrapping, which solves the issue of varying node degrees.
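The sampling step can be sketched as follows (a sketch under our own container layout, not the authors' implementation): neighbors are drawn per node type, with replacement, with probability proportional to the normalized edge weights.

```python
import numpy as np

def sample_neighbors(node, adj, sample_sizes, rng=None):
    """Fixed-size, per-type neighbor sampling for one node (Section 4.1).

    adj[node] maps a neighbor type ("user", "app", "location", "time") to a
    list of (neighbor_id, normalized_edge_weight) pairs; sample_sizes maps
    each type to its fixed sample size. Container layout is our assumption.
    """
    rng = rng or np.random.default_rng()
    sampled = {}
    for ntype, neighbors in adj[node].items():
        ids = np.array([nid for nid, _ in neighbors])
        weights = np.array([w for _, w in neighbors], dtype=float)
        probs = weights / weights.sum()  # probability proportional to edge weight
        # Bootstrapping: sample with replacement so that even low-degree nodes
        # yield a neighbor set of the required fixed size.
        sampled[ntype] = rng.choice(ids, size=sample_sizes[ntype],
                                    replace=True, p=probs)
    return sampled
```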

4.2 Relational Graph Convolutional Operation

Because of the heterogeneity of nodes, as illustrated in Section 3.2, the nodes with different types have different feature spaces. Hence, we first project the features of different node types into the same space using a type-specific transformation. Mathematically, the projection process can be expressed as follows:
\begin{equation} \hat{\boldsymbol {h}}_v = {\bf M}_{o_v} \cdot \boldsymbol {h}_v, \end{equation}
(6)
where \({\bf M}_{o_v}\) denotes the type-specific transformation matrix and \(o_v\) represents the type of node v. \(\boldsymbol {h}_v\) and \(\hat{\boldsymbol {h}}_v\) denote the original and projected feature vectors of node v, respectively. After the node type-specific projection operation, the downstream operations (e.g., graph convolution) can cope with arbitrary types of nodes.
The app usage graph has different edge types, revealing different relations between nodes and having different semantics. For example, user-app edges can describe the app co-using relationships between users, whereas user-location edges can depict the relationships between users visiting the same location. However, the conventional graph convolutional operation treats graph edges equally and cannot explore the different semantics of various edge types. As a result, it cannot be applied to the mobile app usage graph directly.
In this work, we leverage a relational graph convolutional operation consisting of relation-specific propagation and aggregation phases. For each edge type, we implement a corresponding layer. Therefore, the relational graph convolutional operation on the app usage graph exploits six layers: the user-app, user-location, user-time, app-location, app-time, and location-time layers. These layers use the edges of corresponding relations, which is equivalent to information propagation and aggregation in the respective bipartite subgraphs. Given a node i, by the relational graph convolutional operation of relation \(r_e \in R_E\) , we will obtain a learned relation-specific feature vector \(\boldsymbol {h}^{r_e}_i\) of node i, which can be calculated as follows:
\begin{equation} \boldsymbol {h}^{r_e}_i = \sigma \left(\sum \limits _{j \in \hat{\mathcal {N}}_{r_e}(i)} \hat{\omega }(j,i) \cdot {\bf W}_{r_e} \cdot \hat{\boldsymbol {h}}_j \right), \end{equation}
(7)
where \(\hat{\mathcal {N}}_{r_e}(i)\) denotes the set of sampled neighbors of node i under relation \(r_e \in R_E\) , \(\hat{\omega }(j,i)\) is the normalized edge weight of edge \(e(j, i)\) , \({\bf W}_{r_e}\) represents the relation-specific transformation weight matrix of relation \(r_e\) , \(\hat{\boldsymbol {h}}_j\) stands for the projected feature vector of node j, and \(\sigma (\cdot)\) is an activation function.
Intuitively, Equation (7) accumulates transformed feature vectors of neighbor nodes through a set of relation-specific edges, which have homogeneous semantics. Since each node type is connected to the other three node types in the app usage graph, every node has three types of incident edges. Given an arbitrary node i, after feeding the node feature into the relational graph convolutional layer, we can obtain a group of relation-specific node feature vectors, denoted as \(\lbrace \boldsymbol {h}^{r_e^0}_i, \boldsymbol {h}^{r_e^1}_i, \boldsymbol {h}^{r_e^2}_i\rbrace\). To better understand the relational graph convolutional operation, we briefly illustrate the process in Figure 4.
Fig. 4. Diagram of relational graph convolutional operation and multi-relational attention operation, taking a user node as an example. \(G_{ua}\) denotes the user-app bipartite subgraph. \(G_{ul}\) denotes the user-location bipartite subgraph. \(G_{ut}\) denotes the user-time bipartite subgraph. \(\boldsymbol {h}_u^{r_e}\) is the feature vector of node u under the relation \(r_e\) . \(\beta _{u, r_e}\) represents the importance of the relation-specific feature vector \(\boldsymbol {h}_u^{r_e}\) . \(\boldsymbol {h}^{\prime }_{u}\) denotes the updated feature vector of node u.
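To make the two phases concrete, the following PyTorch sketch mirrors Equations (6) and (7) for one target node and one relation; the module and argument names are our own, and batching is omitted for clarity.

```python
import torch
import torch.nn as nn

class RelationalGraphConv(nn.Module):
    """Sketch of the type-specific projection (Eq. (6)) followed by the
    relation-specific weighted aggregation over sampled neighbors (Eq. (7))."""

    def __init__(self, in_dims, hidden_dim, node_types, relations):
        super().__init__()
        # One projection matrix M_{o_v} per node type (Equation (6)).
        self.project = nn.ModuleDict(
            {t: nn.Linear(in_dims[t], hidden_dim, bias=False) for t in node_types})
        # One weight matrix W_{r_e} per relation (Equation (7)).
        self.rel_weight = nn.ModuleDict(
            {r: nn.Linear(hidden_dim, hidden_dim, bias=False) for r in relations})

    def forward(self, neigh_feats, neigh_types, edge_weights, relation):
        """neigh_feats: raw feature vectors of the sampled neighbors of node i;
        neigh_types: their node types; edge_weights: normalized weights of e(j, i)."""
        messages = []
        for h_j, type_j, w_ji in zip(neigh_feats, neigh_types, edge_weights):
            h_hat_j = self.project[type_j](h_j)                    # Equation (6)
            messages.append(w_ji * self.rel_weight[relation](h_hat_j))
        # Sum over sampled neighbors, then apply the nonlinearity (Eq. (7)).
        return torch.relu(torch.stack(messages).sum(dim=0))
```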

4.3 Multi-Relational Attention Operation

We next fuse the multiple relation-specific feature vectors to obtain the updated node features. Importantly, we need to ensure that the new feature vector of a node is also informed by its previous feature vector. Therefore, as depicted in Figure 4, we add a self-loop of a specific relation type to each node. Given a node i, we can calculate its self-relation feature vector \(\boldsymbol {h}^{s}_i\) as
\begin{equation} \boldsymbol {h}^{s}_i= \sigma ({\bf W}_{0} \cdot \hat{\boldsymbol {h}}_i), \end{equation}
(8)
where \({\bf W}_{0}\) is the weight matrix of self-relation and \(\hat{\boldsymbol {h}}_i\) is the projected feature vector of node i. In this way, for an arbitrary node i, it has four relation-specific feature vectors—that is, \({\bf H}_i = \lbrace \boldsymbol {h}^{r_e^0}_i, \boldsymbol {h}^{r_e^1}_i, \boldsymbol {h}^{r_e^2}_i, \boldsymbol {h}^{s}_i\rbrace\) .
Inspired by the vanilla attention approach [29], we first use a one-layer Multi-Layer Perceptron (MLP) with the activation function of \(\tanh\) to transform relation-specific feature vectors. We then compute the importance of different relation-specific feature vectors by multiplying an attention vector \(\boldsymbol {c}\) . Given a node i, the importance of relation \(r_e \in R_E\) can be calculated as
\begin{equation} \alpha _{i, r_e} = \boldsymbol {c}^{T} \cdot \tanh \left({\bf W} \cdot \boldsymbol {h}^{r_e}_i + \boldsymbol {b}\right),~~~~~~~~\forall ~~\boldsymbol {h}^{r_e}_i \in {\bf H}_i, \end{equation}
(9)
where \(\bf W\) is the weight matrix, \(\boldsymbol {b}\) is the bias vector, and \(\boldsymbol {c}^{T}\) is the attention vector. Note that we have added the self-relation into the set of \(R_E\) . Next, we normalize the importance across different relation-specific features with the softmax function. By denoting the normalized weight as \(\beta _{i, r_e}\) , we can compute \(\beta _{i, r_e}\) as
\begin{equation} \beta _{i, r_e} = \frac{\exp \left(\alpha _{i, r_e}\right)}{\sum _{r_e \in R_E} \exp \left(\alpha _{i, r_e}\right)}. \end{equation}
(10)
The higher the \(\beta _{i, r_e}\) , the higher the contribution of \(\boldsymbol {h}^{r_e}_i\) toward the new feature vector of node i. Therefore, by using the learned weights as coefficients, we can fuse these relation-specific feature vectors and update the new feature vector \(\boldsymbol {h}^{\prime }_i\) as follows:
\begin{equation} \boldsymbol {h}^{\prime }_i = \sum \limits _{r_e \in R_E} \beta _{i, r_e} \cdot \boldsymbol {h}^{r_e}_i. \end{equation}
(11)
In this way, the updated feature vectors of nodes aggregate all semantics hidden in multiple relations.
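A compact PyTorch sketch of Equations (8) to (11) is given below; the class and variable names are ours, and a single node's relation-specific vectors are fused at a time.

```python
import torch
import torch.nn as nn

class MultiRelationalAttention(nn.Module):
    """Sketch of the self-relation (Eq. (8)) and the attention-based fusion of
    relation-specific feature vectors (Eqs. (9)-(11))."""

    def __init__(self, hidden_dim, attn_dim=128):
        super().__init__()
        self.w0 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_0 in Eq. (8)
        self.mlp = nn.Linear(hidden_dim, attn_dim)                # W and b in Eq. (9)
        self.c = nn.Parameter(torch.randn(attn_dim))              # attention vector c

    def forward(self, rel_vectors, h_hat_i):
        """rel_vectors: list of relation-specific vectors of node i;
        h_hat_i: projected feature vector of node i for the self-relation."""
        h_self = torch.relu(self.w0(h_hat_i))                     # Equation (8)
        H = torch.stack(rel_vectors + [h_self])                   # |R_E| x hidden_dim
        alpha = torch.tanh(self.mlp(H)) @ self.c                  # Equation (9)
        beta = torch.softmax(alpha, dim=0)                        # Equation (10)
        return (beta.unsqueeze(1) * H).sum(dim=0)                 # Equation (11)
```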

4.4 User Profiling

Since we formulate user profiling as a multi-label classification task, the last layer of our model is responsible for predicting the labels of users based on the representations of user nodes. Given the set of user profile labels Y, we apply the softmax function to the users’ representation matrix and obtain \({\bf Z} \in \mathbb {R} ^{|U|\times |Y|}\) (i.e., the predicted probability distribution of users’ labels). We then adopt the cross entropy as the loss function to carry out end-to-end training of the model. Mathematically, the loss function over all labeled users is defined as
\begin{equation} \mathcal {L} = -\sum \limits _{u \in U} {\boldsymbol {y}_u}^{\top } \ln ({\boldsymbol {z}_u}), \end{equation}
(12)
where \(\boldsymbol {y}_u\) and \(\boldsymbol {z}_u\) are the ground truth and the predicted probability distribution of user u, respectively. Guided by the labeled data, we can optimize our proposed model through the back-propagation method.
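For reference, the prediction and loss computation can be sketched in PyTorch as follows; the tensor names are our own, and the labels are assumed to be integer class indices rather than one-hot vectors.

```python
import torch.nn.functional as F

def profiling_loss(user_repr, labels):
    """Sketch of Equation (12): softmax over user-node representations whose
    dimension equals |Y|, followed by cross entropy over labeled users.

    user_repr: |U_labeled| x |Y| representations of labeled user nodes.
    labels:    |U_labeled| ground-truth class indices (LongTensor).
    """
    log_z = F.log_softmax(user_repr, dim=1)  # log of the predicted distribution Z
    # Negative log-likelihood of the true label summed over labeled users,
    # which equals -sum_u y_u^T ln(z_u) for one-hot y_u.
    return F.nll_loss(log_z, labels, reduction="sum")
```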

5 Experiments

This section presents extensive experiments conducted on large-scale real-world app usage datasets. We first exhibit the experiment setup, including the datasets, compared baselines, evaluation metrics, and implementation details. We next compare the performance of our model with baselines and discuss the results. Last, to show the effectiveness of modules in our system, we compare several variants.

5.1 Experiment Setup

5.1.1 Data Collection.

To evaluate the system we proposed, we explore two real-world anonymized mobile app usage datasets. One dataset is collected by a Mobile Network Operator (MNO), and the other one is collected by the TalkingData platform.
Dataset Collected by an MNO. The MNO dataset was collected from one of the largest cities in the world, Shanghai, covering 1 week in April 2016. The dataset has more than 10,000 users. In practice, a systematic approach called SAMPLES is used to identify mobile app usage based on users’ network access data. SAMPLES can build conjunctive rules and detect more than 90% of applications based on a limited collection of manually labeled data samples, obtaining a 99% average accuracy [30]. To create conjunctive criteria, the operator crawled the 2,000 most popular apps from app stores and manually created data samples. The gathered network access records were then matched with particular applications. According to the ISP’s data, up to 90% of network traffic may be traced to individual apps. Each mobile app usage record includes the following fields: anonymized user ID u, app ID a, location ID l, and timestamp t. In particular, the locations refer to the associated base stations of users, as the dataset is collected from mobile networks. The profiles of users are gender labels provided by the ISP. The app category information is obtained from app stores [17].
We also crawled 782,528 POIs of Shanghai via the Baidu Map service to create a POI dataset. There are 15 POI categories: restaurant, hotel, entertainment, industry, residence, education, hospital, fitness center, shopping mall, scenic spot, transportation facility, financial service, life service, corporation & business, and government & organization. The statistics of the dataset are detailed in Table 2.
Table 2. Statistics of the MNO Dataset
Dataset Collected by TalkingData. The TalkingData software development kit (SDK),1 which is embedded into mobile apps and operates in the background, collects app usage data automatically. Individual users of such apps have acknowledged and consented to this collection, and necessary anonymization has been carried out during the collection process to safeguard their privacy. Each data sample contains a list of the applications being used, the time, the location (latitude and longitude), and a device-specific identifier. To maintain spatial consistency with the MNO dataset, we filter out the app usage traces from locations other than Shanghai. We then map each position to its nearest base station using the latitude and longitude. By doing so, for the TalkingData dataset, we also use base station IDs as locations, as in the MNO dataset. For TalkingData, the profiles of users are age labels provided by the platform. The statistics of the dataset are detailed in Table 3.
Table 3. Statistics of TalkingData

5.1.2 Baselines.

We select 10 models as baselines to compare with our framework, MRel-HGAN. Specifically, they can be classified into two categories: classic models and graph-based models.
We first introduce four classic models for the user profiling task, which are commonly used in previous studies [7, 10, 22, 31]. These models take users’ feature vectors as input and output the profile labels of users, which are introduced as follows.
Logistic Regression [32]. Logistic Regression (LR) is widely used in user profiling. LR uses a logistic function to model the probability of a certain profile label for each user.
Support Vector Machine [33]. SVM is also widely used to solve classification problems. An SVM model is trained to find the maximum-margin hyperplane separating users of different labels, which is a non-probabilistic linear classifier.
Random Forest [34]. Random Forest (RF) is a classic ensemble learning method for classification. RF works by constructing a multitude of simple decision trees.
Multi-Layer Perceptron [35]. The MLP is trained using a supervised learning approach, back-propagation. In our experiments, the hidden layer size is set to 64.
As we use the graph structure to model mobile app usage behavior, we compare our method with six SOTA graph-based models. The graph-based models take a graph as input and output the embeddings learned of vertexes by exploring local graph structures and node features.
DeepWalk [36]. DeepWalk extends the word2vec [37] model to graph representation learning. DeepWalk uses truncated random walks to gather local structural information and then feeds the walk sequences into the skip-gram model to learn node embeddings. We set the number of random walks per node to 50, the embedding size to 64, the walk length to 30, and the window size to 10.
Node2vec [38]. Node2vec applies a biased random walk procedure, controlled by two parameters p and q, to produce embeddings of nodes. The parameters add flexibility in exploring neighborhoods. In the experiment, we set \(p=0.25\) and \(q=0.25\) .
Metapath2vec [39]. Metapath2vec employs meta-path-based random walks to cope with the heterogeneity of the graph and leverages the skip-gram model to perform node embeddings. In the experiments, the used meta-path schemes are UAU, UALAU, and UATAU.
Graph Convolutional Network [40]. GCN is an end-to-end supervised learning algorithm on homogeneous graph-structured data, which performs convolutional operations in the graph Fourier domain. In the experiment, we add a feature projection layer before the conventional GCN model to project the features of heterogeneous nodes into the same feature space.
Graph Attention Network [41]. The Graph Attention Network (GAT) is an end-to-end supervised learning algorithm on homogeneous graph-structured data, which uses the attention mechanism for the aggregation operation of node features. Similarly, we add a feature projection layer before the conventional GAT model to project the features of heterogeneous nodes into the same feature space. Additionally, we set the number of attention heads to 4.
Heterogeneous Graph Attention Network [42]. The Heterogeneous Graph Attention Network (HAN) performs graph attention operations on heterogeneous graph-structured data by leveraging meta-paths. It first learns meta-path-specific node embeddings from multiple meta-path-based homogeneous graphs and then employs the attention mechanism to combine them. In the experiments, the used meta-path schemes are UAU, UALAU, and UATAU. It is worth noting that the conventional HAN is hard to implement directly on the mobile app usage graph due to the high density of the graph. For example, for the meta-path UALAU, its meta-path-based homogeneous graph has 12,777 nodes but 110,031,748 edges. Hence, we add our proposed heterogeneous neighbor sampling module before the conventional HAN to make the graph sparse and decrease computational costs.

5.1.3 Implementation Details and Evaluation Metrics.

In the experiments, 80% of labeled users are randomly selected for training, 10% are selected for testing, and the remaining 10% compose the validation set used to determine the optimal hyperparameters. In the neighbor sampling procedure, the sizes of the sampled neighbor sets are 1,000, 50, 50, and 24 for user \(\hat{\mathcal {N}_u}\) , app \(\hat{\mathcal {N}_a}\) , location \(\hat{\mathcal {N}_l}\) , and time \(\hat{\mathcal {N}_t}\) neighbors, respectively. The dimension of the attention vector \(\boldsymbol {c}\) is 128. The dimension of node embeddings is 64. In the training procedure, we randomly initialize parameters and use Adam [43] to optimize the model with an initial learning rate of 0.01.
We adopt four metrics that are commonly used in user profiling to evaluate the performance of models: AUC (area under the curve), PRE (precision), Macro-F1, and Micro-F1.2 To reduce the variance of the results, we train each model 10 times and report the averaged evaluation metrics. Additionally, grid search is used to find the optimal hyperparameters of each model.
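As an illustration of how the reported metrics can be computed from predicted class probabilities, the following is a sketch using scikit-learn; the averaging mode for precision is our own assumption, as the article does not specify it.

```python
from sklearn.metrics import f1_score, precision_score, roc_auc_score

def evaluate(y_true, y_prob):
    """Compute PRE, Macro-F1, Micro-F1, and (for binary tasks) AUC.

    y_true: integer labels of shape (n_users,).
    y_prob: predicted class probabilities of shape (n_users, n_classes).
    """
    y_pred = y_prob.argmax(axis=1)
    metrics = {
        "PRE": precision_score(y_true, y_pred, average="macro"),
        "Macro-F1": f1_score(y_true, y_pred, average="macro"),
        "Micro-F1": f1_score(y_true, y_pred, average="micro"),
    }
    if y_prob.shape[1] == 2:   # AUC only for the binary gender task (footnote 2)
        metrics["AUC"] = roc_auc_score(y_true, y_prob[:, 1])
    return metrics
```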

5.2 Performance Comparisons with Baselines

We first evaluate the classic models on three scenarios: the app-based scenario, location-based scenario, and app-location-joint scenario. For the app-based scenario, we exploit the used apps for gender prediction on the MNO dataset and age group prediction on TalkingData. We treat each app as a dimension and represent the input feature of each user as an app-based vector. For each app, the corresponding value is the normalized frequency of usage. Similarly, for the location-based scenario, each user is represented as a location-based vector to indicate the normalized frequency that the user visits the corresponding location. For the app-location-joint scenario, we jointly explore users’ used apps and visited locations for user profiling. In detail, for each user, we concatenate her or his app-based feature vector and location-based vector together. The performance of classic models for user profiling under three different scenarios is presented in Tables 4 and 5. Additionally, we evaluate graph-based models on the heterogeneous app usage graph and depict the results in Tables 6 and 7. GCN-Sampling and GAT-Sampling refer to the GCN and GAT models with our proposed neighbor sampling operation. From the results, we have the following key observations.
Model | Feature | AUC | PRE | Macro-F1 | Micro-F1
LR | App (2,000) | 0.6035 | 0.6033 | 0.6034 | 0.6123
SVM | App (2,000) | 0.6468 | 0.6507 | 0.6480 | 0.6604
RF | App (2,000) | 0.6105 | 0.6136 | 0.6113 | 0.6252
MLP | App (2,000) | 0.6693 | 0.6696 | 0.6694 | 0.6776
LR | Location (9,800) | 0.5650 | 0.5657 | 0.5653 | 0.5779
SVM | Location (9,800) | 0.5001 | 0.2885 | 0.3659 | 0.5771
RF | Location (9,800) | 0.5590 | 0.5764 | 0.5481 | 0.5959
MLP | Location (9,800) | 0.5543 | 0.5542 | 0.5542 | 0.5646
LR | App+Location (11,800) | 0.6407 | 0.6418 | 0.6411 | 0.6510
SVM | App+Location (11,800) | 0.6267 | 0.6290 | 0.6274 | 0.6393
RF | App+Location (11,800) | 0.5891 | 0.6085 | 0.5839 | 0.6213
MLP | App+Location (11,800) | 0.6544 | 0.6547 | 0.6546 | 0.6631
Table 4. Performance of Classic Models on the MNO Dataset for Predicting Users’ Gender
Model | Feature | PRE | Macro-F1 | Micro-F1
LR | App (2,728) | 0.5349 | 0.4043 | 0.5
SVM | App (2,728) | 0.3908 | 0.3862 | 0.5279
RF | App (2,728) | 0.4765 | 0.4105 | 0.4902
MLP | App (2,728) | 0.5341 | 0.4182 | 0.5363
LR | Location (9,046) | 0.2810 | 0.2139 | 0.4012
SVM | Location (9,046) | 0.3025 | 0.2161 | 0.4053
RF | Location (9,046) | 0.1989 | 0.2846 | 0.4566
MLP | Location (9,046) | 0.2668 | 0.2203 | 0.3923
LR | App+Location (11,774) | 0.5267 | 0.3991 | 0.4939
SVM | App+Location (11,774) | 0.4022 | 0.3710 | 0.4809
RF | App+Location (11,774) | 0.3603 | 0.3408 | 0.5034
MLP | App+Location (11,774) | 0.5299 | 0.4049 | 0.4993
Table 5. Performance of Classic Models on TalkingData for Predicting Users’ Age Groups
Model | AUC | Gain on AUC | PRE | Gain on PRE | Macro-F1 | Gain on Macro-F1 | Micro-F1 | Gain on Micro-F1
DeepWalk | 0.6730 | 10.0446% | 0.6705 | 9.9329% | 0.6712 | 9.1330% | 0.6764 | 8.3678%
Node2vec | 0.6611 | 12.0254% | 0.6608 | 11.5466% | 0.6609 | 10.8337% | 0.6686 | 9.6321%
Metapath2vec | 0.6908 | 7.2090% | 0.6934 | 6.3023% | 0.6882 | 6.4371% | 0.6897 | 6.2781%
GCN | 0.5704 | 29.8387% | 0.6781 | 8.7008% | 0.5157 | 42.0400% | 0.6107 | 20.0262%
GCN-Sampling | 0.6790 | 9.0722% | 0.6847 | 7.6530% | 0.6741 | 8.6634% | 0.6737 | 8.8021%
GAT | 0.6465 | 14.5653% | 0.6552 | 12.5000% | 0.6473 | 8.6634% | 0.6605 | 10.9770%
GAT-Sampling | 0.6807 | 8.7998% | 0.6922 | 6.4866% | 0.6823 | 7.3575% | 0.6957 | 5.3615%
HAN | 0.7036 | 5.2587% | 0.7062 | 4.3755% | 0.7044 | 3.9892% | 0.7083 | 3.4872%
MRel-HGAN | 0.7406 | - | 0.7371 | - | 0.7325 | - | 0.7330 | -
Table 6. Performance of Graph-Based Models on the MNO Dataset for Predicting Users’ Gender
Model | PRE | Gain on PRE | Macro-F1 | Gain on Macro-F1 | Micro-F1 | Gain on Micro-F1
DeepWalk | 0.5517 | 15.1169% | 0.4394 | 18.6163% | 0.4789 | 34.9969%
Node2vec | 0.5681 | 11.7937% | 0.4213 | 23.7123% | 0.5129 | 26.0480%
Metapath2vec | 0.6034 | 5.2536% | 0.4956 | 5.1655% | 0.5253 | 23.0725%
GCN | 0.5636 | 12.6863% | 0.4390 | 18.7244% | 0.5591 | 15.6323%
GCN-Sampling | 0.5935 | 7.0093% | 0.4832 | 7.8642% | 0.5813 | 11.2162%
GAT | 0.5882 | 7.9735% | 0.4522 | 15.2587% | 0.6054 | 6.7889%
GAT-Sampling | 0.6040 | 5.1490% | 0.4851 | 7.4418% | 0.6135 | 5.3790%
HAN | 0.6094 | 4.2173% | 0.5086 | 2.4774% | 0.6278 | 2.9787%
MRel-HGAN | 0.6351 | - | 0.5212 | - | 0.6465 | -
Table 7. Performance of Graph-Based Models on TalkingData for Predicting Users’ Age Groups
First, MRel-HGAN performs best among all methods, including both classic models and graph-based models. Specifically, as shown in Table 6, MRel-HGAN outperforms the best baseline by 5.26%, 4.38%, 3.99%, and 3.49% in terms of AUC, PRE, Macro-F1, and Micro-F1 on the MNO dataset for predicting users’ gender. In addition, MRel-HGAN outperforms the best baseline by 4.22%, 2.48%, and 2.98% in terms of PRE, Macro-F1, and Micro-F1 on TalkingData for predicting users’ age groups.
Second, compared with visited location information, the used app information is more useful for both gender prediction and age group prediction. As shown in Tables 4 and 5, all classic models perform better in the app-based scenario than in the location-based scenario. Moreover, classic models perform poorly in the app-location-joint scenario, implying that a simple concatenation operation is insufficient for coping with heterogeneous features and cannot explore the hidden relationships between different types of features.
Third, graph-based models generally outperform classic models. The main reason is that the graph structure captures spatiotemporal app usage behavior across various users very well. By traversing the local graph structures, graph-based models can learn the hidden relations between users, apps, locations, and time, which are helpful for user profiling.
Fourth, GCN performs the worst for the task of gender prediction compared to other graph-based models. As stated in Section 4.1, the reason may be twofold: oversmoothing and varying node degrees, which can be solved by the neighbor sampling mechanism. By applying our proposed neighbor sampling operation, GCN-Sampling achieves satisfactory performance.
Fifth, GAT outperforms GCN because GAT applies the attention mechanism to automatically estimate the importance of neighbors, which will mitigate the issues of varying node degrees and oversmoothing. In addition, GAT still obtains performance gain from the neighbor sampling mechanism.
Sixth, HAN achieves the best performance among all baselines. This is because HAN formalizes meta-path-based homogeneous graphs. Such a meta-path-based structure can leverage the semantics of different types of edges to enhance the performance of learned embeddings of nodes. However, HAN discards intermediate nodes along the meta-path when constructing meta-path-based homogeneous graphs. Therefore, compared with MRel-HGAN, HAN cannot leverage the node features of intermediate nodes, which leads to information loss and performance degradation.

5.3 Ablation Study

In this section, we compare the performance of MRel-HGAN with the following four variants:
MRel-HGAN(L): MRel-HGAN(L) only uses the subgraph \(G_{ul}\) . The feature of a user is the average embeddings of locations the user has visited.
MRel-HGAN(A): MRel-HGAN(A) only uses the subgraph \(G_{ua}\) . The feature of a user is the average embeddings of apps used by that user.
MRel-HGAN(No-S.): MRel-HGAN(No-S.) does not use the heterogeneous neighbor sampling operation.
MRel-HGAN(No-A.): MRel-HGAN(No-A.) does not use the multi-relational attention operation. Instead, we fuse multiple relation-specific feature vectors of one node by averaging.
The results of MRel-HGAN(L), MRel-HGAN(A), MRel-HGAN(No-S.), MRel-HGAN(No-A.), and MRel-HGAN are presented in Figures 5 and 6. The following can be observed:
Fig. 5. Performance of MRel-HGAN and its variants on the MNO dataset for predicting users’ gender.
Fig. 6. Performance of MRel-HGAN and its variants on TalkingData for predicting users’ age groups.
(1) MRel-HGAN(L) performs slightly better than the classic models under the location-based scenario for predicting users’ age groups and gender. This is because we utilize POI information as the feature of location nodes in the subgraph \(G_{ul}\) , which improves performance.
(2) MRel-HGAN(A) outperforms MRel-HGAN(L), implying that the used app information is more valuable than visited location information for predicting gender and age groups, which is consistent with the results of the classic models.
(3) The prediction performance is improved when we introduce spatiotemporal features into the app usage graph by adding location and time nodes.
(4) MRel-HGAN(No-S.) performs better than GCN and GAT, demonstrating the effectiveness of the relational graph convolutional operation on the heterogeneous graph.
(5) MRel-HGAN(No-A.) performs better than MRel-HGAN(No-S.). The main reason is that the neighbor sampling operation can overcome the issues of oversmoothing and varying neighbor sizes in the app usage graph.
(6) MRel-HGAN performs better than MRel-HGAN(No-A.) because the multi-relational attention operation can automatically learn the importance of different relation-specific features for different types of nodes.
The results demonstrate that the modules we design, including heterogeneous neighbor sampling, relational graph convolutional operation, and multi-relational attention operation, are necessary to integrate the multi-relational data in the heterogeneous network.

6 Related Work and Study Limitations

6.1 User Profiling from App Usage Behavior

The relationships between various user profiles and variations in mobile app usage behavior have been the subject of several studies. Zhao et al. [44], for instance, conducted an analysis at the user-group level. They identified 382 distinct categories of users based on their behavior in mobile apps and then assigned the user groups semantic labels, such as night communicators, financial users, and evening learners. In a descriptive examination, Andone et al. [9] found that both age and gender influence app usage. Teenagers between the ages of 12 and 17 years are found to spend more than 40 minutes each day, on average, using communication, social networking, and gaming applications, whereas those older than 30 years use these applications for less than 10 minutes. Peltonen et al. [45] examined how consumers’ cultural backgrounds affect their usage patterns. Using app category usage as features in a hierarchical clustering technique, they identified three main categories of cultural affiliations with corresponding use patterns: European, English-speaking, and mixed cultures. Zhao et al. [46] conducted a comprehensive survey summarizing recent studies on user profiling from smartphone application usage.
Certain studies have performed predictive analytics, that is, the inference of user profiles from characteristics retrieved from app usage logs [47]. Seneviratne et al. [22], for instance, collected data on 200 users’ mobile app usage and used SVM to infer gender based on the users’ app adoptions. Additionally, Malmi and Weber [10] used a bigger dataset encompassing 3,760 individuals to validate the research of Seneviratne et al. [22] and forecast additional demographics, such as income and race. They found that gender is the easiest attribute to predict, whereas income is difficult to forecast. Zhao et al. [7] extracted topic characteristics from app descriptions and used SVM and MLP to estimate the gender of users. Additionally, personality qualities are a crucial aspect of users, and considerable work has been done in this area. For instance, Peltonen et al. [48] and Montjoye et al. [49] used metrics based on mobile phone usage and app usage traces to infer personality aspects of consumers. A federated learning framework was also suggested by Brandão et al. [50] to safeguard user privacy when performing user profiling tasks. These studies, however, do not examine the spatiotemporal characteristics of app usage and are restricted to the app dimension. Additionally, because these systems cannot automatically extract relationships between people, applications, places, and time, they must rely on handcrafted features or fusion methods, which restricts their applicability and efficiency. Notably, some earlier studies may attain competitive performance with carefully engineered features, such as app start and close events, mobile phone brands, and battery drain patterns. However, such features are challenging for service providers and network operators to gather on a broad scale.

6.2 Graph-Based Representation Learning

Graph-based representation learning aims to learn low-dimensional vectors to represent vertices in a graph by exploring the local structure of vertices. Traditional approaches are based on matrix factorization using an adjacency matrix [51] or Laplacian matrix [52]. Inspired by the word embedding model [37], some algorithms were proposed to learn vertex representations based on random walks [36, 38]. Generally, they applied a random walk algorithm to traverse the graph and generated a series of node sequences. They then obtained node embeddings by treating node sequences as the equivalent of sentences and feeding them into the skip-gram model. In particular, Perozzi et al. [36] employed the conventional random walk algorithm, whereas Grover and Leskovec [38] used a biased random walk procedure to explore different graph structures. Moreover, to cope with the heterogeneous graph structure, Dong et al. [39] proposed a meta-path-based random walk approach, which can leverage the semantics of different types of nodes to enhance the performance of node embeddings.
Many graph embedding approaches based on GNNs have recently emerged. The GNN, which learns vertex representations by taking into account both local graph structure and node attributes, has performed well in a variety of tasks, including recommendation [53] and node classification [40]. Kipf and Welling [40] presented a powerful model, GCN, a kind of GNN that uses the graph Laplacian matrix to conduct convolutional operations in the graph Fourier domain. Furthermore, by substituting the convolutional operation with an attention operation, Veličković et al. [41] introduced GAT. During the aggregation operation, the attention mechanism decides the relative relevance of neighbor information for the target node. GAT, however, discards edge weight information, limiting its effectiveness on weighted graphs. Furthermore, Wang et al. [42] suggested HAN to deal with heterogeneous graphs by integrating meta-paths and GAT. HAN first turns a heterogeneous graph into several meta-path-based homogeneous graphs and then employs an attention mechanism to merge the meta-path-specific embeddings of each node. However, HAN fails to account for intermediate nodes along the meta-path, resulting in information loss and performance limitations.

6.3 Study Limitations

This article studied the problem of user profiling from users’ spatiotemporal mobile app usage behavior based on real-world datasets. Notably, our proposed framework is a general model for various user profiles. However, limited by the available datasets, we can only evaluate it with gender labels and age groups, which is a major limitation of our work. Additionally, in this work, we only take app categories as the feature of apps, which may limit our model’s performance. Some previous studies suggested that app icons [7], app descriptions [54], and user comments on apps [55] are also helpful in deriving app features. Hence, enriching app features with cross-domain data is a future step of this work.

7 Conclusion

The problem of user profiling based on spatiotemporal app usage behavior was investigated in this article. We proposed MRel-HGAN, a graph learning based model that integrates user, app, location, and time entities into a single low-dimensional latent space. By applying a bootstrapping-based heterogeneous neighbor sampling strategy, MRel-HGAN can overcome the issue of oversmoothing caused by the high density of the mobile app usage graph. We then designed a relational graph convolutional operation and a multi-relational attention operation to explore the rich semantic information of the various relations among apps, users, locations, and time. MRel-HGAN outperforms SOTA baselines for user profiling in experiments conducted on large-scale real-world datasets. Additionally, we verified the effectiveness of the components in MRel-HGAN. This study opens the way for a range of app usage-related applications, including personalized app recommendations, app usage analysis, and app service optimization.

Footnotes

2. For the age group prediction problem, we report only PRE, Macro-F1, and Micro-F1 as evaluation metrics because it is a multi-class classification problem.

References

[1]
Joohyun Lee, Kyunghan Lee, Euijin Jeong, Jaemin Jo, and Ness B. Shroff. 2016. Context-aware application scheduling in mobile systems: What will users do and not do next? In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp’16). 1235–1246.
[2]
Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’19). 1358–1368.
[3]
Hong Cao and Miao Lin. 2017. Mining smartphone data for app usage prediction and recommendations: A survey. Pervasive and Mobile Computing 37 (2017), 1–22.
[4]
Deyu Tian, Yun Ma, Aruna Balasubramanian, Yunxin Liu, Gang Huang, and Xuanzhe Liu. 2021. Characterizing embedded web browsing in mobile apps. IEEE Transactions on Mobile Computing 21, 11 (2021), 3912–3925.
[5]
Hengshu Zhu, Enhong Chen, Hui Xiong, Huanhuan Cao, and Jilei Tian. 2013. Mobile app classification with enriched contextual information. IEEE Transactions on Mobile Computing 13, 7 (2013), 1550–1563.
[6]
Tong Li, Tong Xia, Huandong Wang, Zhen Tu, Sasu Tarkoma, Zhu Han, and Pan Hui. 2022. Smartphone app usage analysis: Datasets, methods, and applications. IEEE Communications Surveys & Tutorials 24, 2 (2022), 937–966.
[7]
Sha Zhao, Yizhi Xu, Xiaojuan Ma, Ziwen Jiang, Zhiling Luo, Shijian Li, Laurence Tianruo Yang, Anind Dey, and Gang Pan. 2020. Gender profiling from a single snapshot of apps installed on a smartphone: An empirical study. IEEE Transactions on Industrial Informatics 16, 2 (Feb. 2020), 1330–1342.
[8]
Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres Völkel, et al. 2020. Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences 117, 30 (2020), 17680–17687.
[9]
Ionut Andone, Konrad Błaszkiewicz, Mark Eibes, Boris Trendafilov, Christian Montag, and Alexander Markowetz. 2016. How age and gender affect smartphone usage. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct (UbiComp’16). 9–12.
[10]
Eric Malmi and Ingmar Weber. 2016. You are what apps you use: Demographic prediction based on user’s apps. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM’16).
[11]
Elizabeth L. Murnane, Saeed Abdullah, Mark Matthews, Matthew Kay, Julie A. Kientz, Tanzeem Choudhury, Geri Gay, and Dan Cosley. 2016. Mobile manifestations of alertness: Connecting biological rhythms with patterns of smartphone app use. In Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI’16). 465–477.
[12]
Afshin Rahimi, Trevor Cohn, and Timothy Baldwin. 2018. Semi-supervised user geolocation via graph convolutional networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2009–2019.
[13]
Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
[14]
Keiichi Ochiai, Naoki Yamamoto, Takashi Hamatani, Yusuke Fukazawa, and Takayasu Yamaguchi. 2019. Exploiting graph convolutional networks for representation learning of mobile app usage. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data’19). IEEE, Los Alamitos, CA, 5379–5383.
[15]
Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. 2018. Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18). 1416–1424.
[16]
Qimai Li, Zhichao Han, and Xiao Ming Wu. 2018. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
[17]
Tong Li, Yong Li, Mohammad A. Hoque, Tong Xia, Sasu Tarkoma, and Pan Hui. 2020. To what extent we repeat ourselves? Discovering daily activity patterns across mobile app usage. IEEE Transactions on Mobile Computing 21, 4 (2020), 1492–1507.
[18]
Tong Li, Yong Li, Tong Xia, and Pan Hui. 2021. Finding spatiotemporal patterns of mobile application usage. IEEE Transactions on Network Science and Engineering. Early access, November 30, 2021.
[19]
Tong Li, Mingyang Zhang, Hancheng Cao, Yong Li, Sasu Tarkoma, and Pan Hui. 2020. What apps did you use?: Understanding the long-term evolution of mobile app usage. In Proceedings of The Web Conference 2020. 66–76.
[20]
Tong Li, Yali Fan, Yong Li, Sasu Tarkoma, and Pan Hui. 2023. Understanding the long-term evolution of mobile app usage. IEEE Transactions on Mobile Computing 22, 2 (2023), 1213–1230.
[21]
Zhen Tu, Yong Li, Pan Hui, Li Su, and Depeng Jin. 2019. Personalized mobile app recommendation by learning user’s interest from social media. IEEE Transactions on Mobile Computing 19, 11 (2019), 2670–2683.
[22]
Suranga Seneviratne, Aruna Seneviratne, Prasant Mohapatra, and Anirban Mahanti. 2015. Your installed apps reveal your gender and more! ACM SIGMOBILE: Mobile Computing and Communications Review 18, 3 (Jan. 2015), 55–61.
[23]
Sha Zhao, Gang Pan, Jianrong Tao, Zhiling Luo, Shijian Li, and Zhaohui Wu. 2020. Understanding smartphone users from installed app lists using Boolean matrix factorization. IEEE Transactions on Cybernetics 52, 1 (2020), 384–397.
[24]
Bin Guo, Yixuan Zhang, Jiaqi Liu, Tong Guo, Yi Ouyang, and Zhiwen Yu. 2022. Which app is going to die? A framework for app survival prediction with multi-task learning. IEEE Transactions on Mobile Computing 21, 2 (2022), 728–739.
[25]
Shan Jiang, Joseph Ferreira, and Marta C. Gonzalez. 2012. Discovering urban spatial-temporal structure from human activity patterns. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing (UrbComp’12). 95–102.
[26]
Jing Yuan, Yu Zheng, and Xing Xie. 2012. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). 186–194.
[27]
Thomas Roelleke and Jun Wang. 2008. TF-IDF uncovered: A study of theories and probabilities. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08). 435–442.
[28]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks? In Proceedings of the International Conference on Learning Representations.
[29]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
[30]
Hongyi Yao, Gyan Ranjan, Alok Tongaonkar, Yong Liao, and Zhuoqing Morley Mao. 2015. SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for classifying mobile application traffic. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (MobiCom’15). ACM, New York, NY, 439–451.
[31]
Sha Zhao, Gang Pan, Yifan Zhao, Jianrong Tao, Jinlai Chen, Shijian Li, and Zhaohui Wu. 2016. Mining user attributes using large-scale app lists of smartphones. IEEE Systems Journal 11, 1 (2016), 315–323.
[32]
Sara Rosenthal and Kathleen McKeown. 2011. Age prediction in blogs: A study of style, content, and online behavior in pre- and post-social media generations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. 763–772.
[33]
Faiyaz Al Zamal, Wendy Liu, and Derek Ruths. 2012. Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM’12).
[34]
Aleksandr Farseev, Liqiang Nie, Mohammad Akbari, and Tat-Seng Chua. 2015. Harvesting multiple sources for user profile learning: A big data study. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval (ICMR’15). 235–242.
[35]
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 4 (1989), 541–551.
[36]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14). 701–710.
[37]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[38]
Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). 855–864.
[39]
Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. Metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’17). 135–144.
[40]
Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17): Conference Track Proceedings.
[41]
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of the International Conference on Learning Representations (ICLR'18).
[42]
Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference. 2022–2032.
[43]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15): Conference Track Proceedings.
[44]
Sha Zhao, Julian Ramos, Jianrong Tao, Ziwen Jiang, Shijian Li, Zhaohui Wu, Gang Pan, and Anind K. Dey. 2016. Discovering different kinds of smartphone users through their application usage behaviors. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp’16). 498–509.
[45]
Ella Peltonen, Eemil Lagerspetz, Jonatan Hamberg, Abhinav Mehrotra, Mirco Musolesi, Petteri Nurmi, and Sasu Tarkoma. 2018. The hidden image of mobile apps: Geographic, demographic, and cultural factors in mobile usage. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI’18). Article 10, 12 pages.
[46]
Sha Zhao, Shijian Li, Julian Ramos, Zhiling Luo, Ziwen Jiang, Anind K. Dey, and Gang Pan. 2019. User profiling from their use of smartphone applications: A survey. Pervasive and Mobile Computing 59 (2019), 101052.
[47]
Erheng Zhong, Ben Tan, Kaixiang Mo, and Qiang Yang. 2013. User demographics prediction based on mobile data. Pervasive and Mobile Computing 9, 6 (2013), 823–837.
[48]
Ella Peltonen, Parsa Sharmila, Kennedy Opoku Asare, Aku Visuri, Eemil Lagerspetz, and Denzil Ferreira. 2020. When phones get personal: Predicting big five personality traits from application usage. Pervasive and Mobile Computing 69 (2020), 101269.
[49]
Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic, and Alex Sandy Pentland. 2013. Predicting personality using novel mobile phone-based metrics. In Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. 48–55.
[50]
André Brandão, Ricardo Mendes, and João P. Vilela. 2022. Prediction of mobile app privacy preferences with user profiles via federated learning. In Proceedings of the 12th ACM Conference on Data and Application Security and Privacy. 89–100.
[51]
Sam T. Roweis and Lawrence K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323–2326.
[52]
Mikhail Belkin and Partha Niyogi. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems. 585–591.
[53]
Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). 165–174.
[54]
Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, and Guillaume Jourjon. 2020. A multi-modal neural embeddings approach for detecting mobile counterfeit apps: A case study on Google Play Store. arXiv:2006.02231v1 (2020).
[55]
Yi Ouyang, Bin Guo, Xinjiang Lu, Qi Han, Tong Guo, and Zhiwen Yu. 2018. CompetitiveBike: Competitive analysis and popularity prediction of bike-sharing apps using multi-source data. IEEE Transactions on Mobile Computing 18, 8 (2018), 1760–1773.


