Modern online content-sharing platforms host billions of items, such as music, videos, and products, uploaded by various providers for users to discover. To satisfy these information needs, effective item retrieval (or item search ranking) given user search queries has become one of the most fundamental problems for online content-sharing platforms. Moreover, the same query can represent different search intents for different users, so personalization is also essential for providing satisfactory search results. Unlike related research tasks, such as ad-hoc retrieval and product retrieval with copious words and reviews, items on content-sharing platforms usually lack sufficient descriptive information and related meta-data as features. In this paper, we propose the end-to-end deep attentive model (EDAM) for personalized item retrieval on online content-sharing platforms using only discrete personal item history and queries. Each discrete item in a user's personal item history, together with its content provider, is first mapped to an embedding vector as a continuous representation. A query-aware attention mechanism is then applied to identify the relevant contexts in the user history and construct the overall personal representation for a given query. Finally, an extreme multi-class softmax classifier aggregates the representations of both the query and the personal item history to produce personalized search results. We conduct extensive experiments on a large-scale real-world dataset with hundreds of millions of users from a large video media platform at Google. The experimental results demonstrate that our proposed approach significantly outperforms several competitive baseline methods. It is also worth mentioning that this work utilizes a massive dataset from a real-world commercial content-sharing platform for personalized item retrieval, providing insightful analysis from an industrial perspective.
ACM Reference Format:
Jyun-Yu Jiang, Tao Wu, Georgios Roumpos, Heng-Tze Cheng, Xinyang Yi, Ed Chi, Harish Ganapathy, Nitin Jindal, Pei Cao, and Wei Wang. 2020. End-to-End Deep Attentive Personalized Item Retrieval for Online Content-sharing Platforms. In Proceedings of The Web Conference 2020 (WWW '20), April 20–24, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3366423.3380051
Nowadays, online content-sharing platforms, such as music streaming services, photo and video sharing platforms, and online e-commerce sites, have become some of the most indispensable media in our lives [6]. However, their enormous user bases are accompanied by myriad uploaded items. To ease the burden of discovering suitable content from such copious corpora, item search has become one of the most essential functions for retrieving items relevant to a query and satisfying users' information needs.
Compared to ad-hoc search tasks [28], queries in item retrieval are usually short and vague, and the underlying user search intents can be even more ambiguous. Figure 1 illustrates an example of an ambiguous query and the distinct search intents of two users. More specifically, a query can be simultaneously relevant to multiple items, while users can have distinct search intents for the same query. Hence, search results should be personalized to each user's information needs. Moreover, the shortage of descriptive information for both users and items further increases the difficulty of personalization. Personalized item retrieval therefore remains an important research problem, especially for commercial platforms with millions of daily users.
One feasible solution for personalized retrieval is to exploit user history, since historical behaviors can reveal user interests; this idea has already been studied in related tasks such as product search and ad-hoc search. For example, previous studies [2, 3, 16] summarize the reviews of purchased products into continuous user features for personalized product search. The words in clicked documents can be utilized to recognize user search intents for personalized ad-hoc search [5, 9, 34, 36]. However, few existing studies address personalized item retrieval with user history; most previous works focus on utilizing descriptive document contents. Moreover, personalized ad-hoc and product search methods require descriptive information, such as reviews, that is usually unavailable for items on content-sharing platforms. Hence, although existing methods have demonstrated the effectiveness of descriptive information, personalized item retrieval needs to derive user features without any additional information about the items in user history.
Without any descriptive information, machine learning models need to learn continuous representations of items as inputs. To derive decent item representations, some previous studies [14] pre-train item embeddings and then learn retrieval models with the item representations fixed. However, pre-trained embeddings and multi-stage approaches suffer several drawbacks for item retrieval on real-world content-sharing platforms. First, a massive number of new items are uploaded to the system every day, so pre-training must frequently be restarted to avoid cold-start problems. Second, fixed item embeddings constrain the retrieval model because the item representations cannot be flexibly optimized with the queries and the objective of the retrieval task. Last but not least, multi-stage approaches can be too complicated to integrate into sophisticated real-world production pipelines with numerous components. Therefore, an end-to-end approach to item retrieval is required for industrial content-sharing platforms.
In this paper, we propose the end-to-end deep attentive model (EDAM) to address personalized item retrieval for online content-sharing platforms. Without any descriptive information, we learn a continuous representation for each item and each content provider so that a query-aware attention mechanism can derive historical item and content provider representations from the personal item history. In addition, we propose to utilize external key embeddings for estimating item attention weights in a separate latent space. The sequential knowledge in user history can also be learned by preserving item locality with context items. Experiments on a large-scale dataset from a real-world commercial content-sharing platform demonstrate that EDAM significantly outperforms competitive baseline methods from related personalized search tasks across different evaluation metrics and history lengths.
Table 1: Comparison between different personalized search tasks.

| Personalized Task | Descriptive Information | Meta Information |
| --- | --- | --- |
| Ad-hoc Search | ✓ (documents) | ✗ |
| Web Search | ✓ (web pages) | ✗ |
| Microblog Search | ✓ (tweets) | ✓ (hashtags) |
| Product Search | ✓ (product reviews) | ✓ (categories) |
| Item Search | ✗ | ✓ (content providers) |
In the literature, although no previous study focuses on personalized item retrieval for online content-sharing platforms, personalized product search [2, 3, 16] is one of the most related tasks; it considers descriptive information such as product reviews. More precisely, the descriptive information can interpret and link both users and products. Structured information [11, 12, 27, 32, 42] and context images [13] can also be applied to personalization. However, all of these approaches rely on descriptive information. Some studies [4, 20, 23, 41] conduct feature engineering and learn a separate ranking model. Personalized listing search [14, 17] is also related to our work, but it relies heavily on heterogeneous meta-data and pre-trained embeddings. In addition to personalized product search, personalization in ad-hoc search [5, 9, 18, 29, 34, 36, 37, 38] and microblog search [31, 40] is also relevant to personalized item retrieval. However, all of the existing methods require descriptive information, and some models need to be learned separately. Table 1 summarizes the comparison between different personalized search tasks. Item recommendation with queries [7, 8, 10, 25, 26, 33, 35] and neural information retrieval [15, 21, 30, 43] can also be treated as tasks related to this work.
Our contributions can be summarized as follows: (1) we propose EDAM, an end-to-end deep attentive model for personalized item retrieval on online content-sharing platforms that requires no descriptive information about items; (2) we introduce a query-aware attention mechanism with external key embeddings to identify relevant items and content providers in user history, together with locality preservation to capture sequential knowledge; (3) we conduct extensive experiments on a large-scale dataset from a real-world commercial content-sharing platform, demonstrating that EDAM significantly outperforms competitive baselines.
In this section, we first formally define the objective of this paper and then introduce our proposed approach, the end-to-end deep attentive model (EDAM), to address the task of personalized item retrieval for online content-sharing platforms.
In this paper, we focus on personalized item retrieval using only the query and the personal item history, such as watched videos and listened-to music. Suppose that V and C are the corpora of items and content providers, where the content provider c of an item v is denoted as C(v) ∈ C. Each query q is composed of a set of terms T(q) = {t1, ⋅⋅⋅, t|q|}, where ti is the i-th term of q and |q| is the number of terms in q. The profile of a user u can be represented by the personal item history as a set of accessed items HV(u) ⊂ V and the set of corresponding content providers HC(u) = {C(v) ∣ v ∈ HV(u)} ⊂ C. For a user u and a query q, R(q, u) ⊂ V denotes the set of items that are relevant to the query. Given a user u and a query q, our goal is to rank all of the items in V so that the relevant items R(q, u) are ranked as high as possible. Note that the task is extremely difficult because only the personal item history is available; no meta-data or descriptive information is granted for items and content providers.
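To make the formulation concrete, the following minimal Python sketch shows one way the inputs could be organized; the class and field names are illustrative, not from the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Query:
    term_ids: List[int]           # T(q) = {t_1, ..., t_|q|} as vocabulary indices

@dataclass
class UserProfile:
    item_ids: List[int]           # H_V(u): accessed items, in chronological order
    provider_ids: List[int]       # H_C(u): provider_ids[i] = C(item_ids[i])

@dataclass
class TrainingExample:
    query: Query
    user: UserProfile
    label_item_id: int            # a relevant item drawn from R(q, u)
```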
Figure 2 illustrates the proposed framework for personalized item retrieval with user history for online content-sharing platforms. Items and content providers in the user history are first mapped to item embeddings and provider embeddings, while the query embedding is derived by aggregating the embeddings of the query terms. With the query-aware attention mechanism, we compute the importance of each provider and each item, thereby obtaining the ultimate representations of historical items and content providers. In addition, we propose to utilize an external item key memory for better estimation of item importance. Finally, after aggregating the representations of the query and the user history, the personalized search results can be derived by a softmax function over the candidate items. Moreover, the item key embeddings can be improved by an auxiliary classification task with query embeddings, while the sequential knowledge in user history can be learned by locality preservation for item and provider embeddings.
To utilize the knowledge in the user history, we propose query-aware attention with external key memory to model user history. More precisely, an embedding-based model derives continuous representations in latent spaces for historical items and content providers.
Query Embedding. For a given query, we derive a continuous bag-of-terms representation as the query embedding by aggregating term embeddings, for the sake of production efficiency. Formally, the query embedding $\mathbf{q}$ of the query q can be computed as:

$$\mathbf{q} = \frac{1}{|q|} \sum_{i=1}^{|q|} \mathbf{t}_i,$$

where $\mathbf{t}_i \in \mathbb{R}^d$ is the embedding of the term $t_i$.
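As a rough illustration, the following NumPy sketch computes such a bag-of-terms query embedding, assuming the mean is used as the aggregation function (the paper only states that term embeddings are aggregated):

```python
import numpy as np

def query_embedding(term_ids, term_emb):
    """Bag-of-terms query embedding: the mean of the query's term embeddings.

    term_ids: list of term indices for the query q.
    term_emb: [vocab_size, d] term embedding matrix.
    """
    return term_emb[term_ids].mean(axis=0)  # shape [d]
```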
User History Modeling with Query-aware Attention. As shown in previous studies [2, 3], user history can be useful for personalization. However, many of the activities in user history can be irrelevant to the user's search intent. In the item retrieval task, the query plays one of the most essential roles and directly represents the search intent of the user. Hence, we utilize the query information to model user history with query-aware attention. More specifically, two continuous representations are derived to indicate relevant items and content providers in the user history.
Take the historical item representation as an example. We estimate the importance of each item in the user history with the scaled dot-product attention [39]. For each item v ∈ HV(u) in the user history, the attention weight α(v, q), i.e., its importance with respect to the query q, can be computed as follows:

$$\alpha(v, q) = \frac{\exp\left(\mathbf{v}^{\top}\mathbf{q}/\sqrt{d}\right)}{\sum_{v' \in H_V(u)} \exp\left(\mathbf{v}'^{\top}\mathbf{q}/\sqrt{d}\right)},$$

where $\mathbf{v}$ is the embedding of the item v and d is the embedding dimension. The historical item representation is then the weighted sum $\mathbf{h}_V = \sum_{v \in H_V(u)} \alpha(v, q)\,\mathbf{v}$; the historical content provider representation $\mathbf{h}_C$ is derived analogously over $H_C(u)$.
External Key Embeddings for Item Attention. Generally, the query-aware attention projects historical items and content providers onto the latent query embedding space so that the embedding similarity can be treated as an importance score. However, the query embedding space can be inappropriate for representing items and content providers. Moreover, the embedding spaces of different entities can differ. Although some studies [2, 3] apply non-linear transformations to cast embeddings into the same space for estimating attention weights, it can be better to model representations and estimate attention weights independently.
In this work, we propose to use additional external key embeddings for estimating item attention weights. For each item v ∈ V, instead of utilizing the item embedding $\mathbf{v}$, we independently learn an external key embedding $\mathbf{k}_v$ in the query embedding space to compute the attention weight as follows:

$$\alpha(v, q) = \frac{\exp\left(\mathbf{k}_v^{\top}\mathbf{q}/\sqrt{d}\right)}{\sum_{v' \in H_V(u)} \exp\left(\mathbf{k}_{v'}^{\top}\mathbf{q}/\sqrt{d}\right)}.$$
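A minimal NumPy sketch of this query-aware attention with external keys is shown below; it assumes the history representation is the attention-weighted sum of the item embeddings, with the external keys used only for scoring:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend_history(q, item_embs, key_embs):
    """Query-aware attention over the items in a user's history.

    q:         [d] query embedding.
    item_embs: [n, d] embeddings of the items in H_V(u) (attended values).
    key_embs:  [n, d] external key embeddings k_v (used only for scoring).
    """
    d = q.shape[0]
    scores = key_embs @ q / np.sqrt(d)   # scaled dot-product with the keys
    alpha = softmax(scores)              # attention weights alpha(v, q)
    return alpha @ item_embs             # h_V: weighted sum of item embeddings
```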
Finally, to capture the knowledge of both query and user history and model their interactions, the ultimate features $\mathbf{h}$ for deriving search results are computed with a fully-connected layer $\mathbf{h} = \mathrm{ReLU}(W_h \mathbf{h}_0 + \mathbf{b}_h)$, where $\mathbf{h}_0 = [\mathbf{q}; \mathbf{h}_V; \mathbf{h}_C]$ concatenates the query embedding and the representations of historical items and content providers; $W_h$ and $\mathbf{b}_h$ represent the layer weights and biases for $d_h$ hidden units; $\mathrm{ReLU}(\cdot)$ is the rectified linear unit used as the activation function.
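Continuing the NumPy sketch, the feature aggregation is a single fully-connected layer over the concatenated representations:

```python
import numpy as np

def aggregate_features(q, h_v, h_c, W_h, b_h):
    """h = ReLU(W_h [q; h_V; h_C] + b_h): the ultimate feature vector.

    W_h: [d_h, 3d] layer weights; b_h: [d_h] biases.
    """
    h0 = np.concatenate([q, h_v, h_c])       # [3d]
    return np.maximum(0.0, W_h @ h0 + b_h)   # [d_h]
```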
To derive the ranking results, we follow the previous industrial approach [8] to pose the ranking problem as a task of extreme multi-class classification with the ultimate features h. More precisely, given a query q and a user u, we aim to calculate a probabilistic score P(v∣q, u) for each candidate item v ∈ V as the estimated relevance to the query.
Given the ultimate features $\mathbf{h}$, we derive the logits for multi-class classification with a fully-connected layer as $\mathbf{x} = W_s \mathbf{h}$, where $W_s$ represents the weights for obtaining logits. Finally, the relevance scores P(v ∣ q, u) can be computed with a softmax function as:

$$P(v \mid q, u) = \frac{\exp(x_v)}{\sum_{v' \in V} \exp(x_{v'})},$$

where $x_v$ denotes the logit of the item v.
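For scoring at serving time, this amounts to a full softmax over the item corpus, as in the sketch below (training uses the sampled variant described later):

```python
import numpy as np

def relevance_scores(h, W_s):
    """P(v | q, u) for every candidate item via a softmax over logits x = W_s h.

    W_s: [|V|, d_h] classification weights; returns a [|V|] probability vector.
    """
    x = W_s @ h
    x = x - x.max()                      # numerical stability
    p = np.exp(x)
    return p / p.sum()
```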
Although item key embeddings are crucial for estimating the importance of each item in user history, they can only be optimized jointly and implicitly through the complicated computations of item attention. To learn better item key embeddings, we propose an additional auxiliary ranker for regularization.
Since item key embeddings share the same latent space as query embeddings, they can also be directly applied to estimating relevance. The auxiliary task aims to estimate the relevance scores P(v ∣ q, {ki}) with only the query q and the item key embeddings {ki} of all items. Here we propose the item key softmax to address the auxiliary task and sharpen the item key embeddings. Formally, the relevance score P(v ∣ q, {ki}) of the item v for the query q can be computed by replacing the weights of a softmax with the item key embeddings:

$$P(v \mid q, \{k_i\}) = \frac{\exp(\mathbf{k}_v^{\top}\mathbf{q})}{\sum_{v' \in V} \exp(\mathbf{k}_{v'}^{\top}\mathbf{q})}.$$
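The item key softmax can be sketched the same way as the main ranker, with the key embedding matrix playing the role of the softmax weights:

```python
import numpy as np

def auxiliary_scores(q, key_embs):
    """P(v | q, {k_i}): softmax over key-query similarities k_v^T q.

    key_embs: [|V|, d] external key embeddings for all items.
    """
    x = key_embs @ q                     # [|V|] logits
    x = x - x.max()
    p = np.exp(x)
    return p / p.sum()
```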
The sequential user behaviors can also indicate the relationships between items and content providers. In other words, the contexts of items in user history can also be beneficial to learn their embeddings. In this work, we conduct locality preservation for local patterns of items in user history with a continuous bag-of-words (CBOW) model as an additional regularization task.
Figure 3 shows an example of the CBOW model for items in user history. Given the L context items of a certain item vi, locality preservation assumes that the embeddings of the context items are capable of inferring the item vi. More formally, we aim to maximize the following objective computed by a softmax function:

$$P\left(v_i \mid \mathrm{ctx}(v_i)\right) = \frac{\exp\left(\bar{\mathbf{v}}_i^{\top}\mathbf{v}_i\right)}{\sum_{v' \in V} \exp\left(\bar{\mathbf{v}}_i^{\top}\mathbf{v}'\right)}, \qquad \bar{\mathbf{v}}_i = \frac{1}{L}\sum_{v_j \in \mathrm{ctx}(v_i)} \mathbf{v}_j,$$

where ctx(vi) denotes the set of L context items of vi and $\bar{\mathbf{v}}_i$ is their average embedding.
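A rough sketch of the CBOW step, under the assumption that the context summary is the mean of the context item embeddings:

```python
import numpy as np

def locality_logits(item_emb, history_ids, i, L):
    """CBOW-style locality preservation: the mean embedding of the L context
    items around position i should predict the item at position i.

    item_emb: [|V|, d] item embedding table; history_ids: item ids in order.
    Returns logits over all items; the training target is history_ids[i].
    """
    lo, hi = max(0, i - L // 2), min(len(history_ids), i + L // 2 + 1)
    ctx_ids = [history_ids[j] for j in range(lo, hi) if j != i]
    ctx_vec = item_emb[ctx_ids].mean(axis=0)  # context summary, shape [d]
    return item_emb @ ctx_vec                 # [|V|] logits for the softmax
```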
Multi-task learning is applied to simultaneously optimize the objectives of different components in EDAM, including (1) classification as ranking, (2) the auxiliary ranker, and (3) locality preservation. Each component has a corresponding loss jointly optimized with the losses of other components.
For classification as ranking and the auxiliary ranker, the tasks solve extreme multi-class classification problems with shared training data. Hence, we utilize the cross-entropy [19] between the predicted distributions and the gold standard y as the loss functions. Formally, the losses of the two tasks can be computed as:

$$\mathcal{L}_{\text{rank}} = -\sum_{v \in V} y_v \log P(v \mid q, u), \qquad \mathcal{L}_{\text{auxiliary}} = -\sum_{v \in V} y_v \log P(v \mid q, \{k_i\}).$$
For locality preservation, it can be treated as another extreme multi-class classification task for each item or content provider in user history. Hence, the locality preservation loss over the item and provider embeddings can be represented as:

$$\mathcal{L}_{\text{locality}} = -\log P\left(v_i \mid \mathrm{ctx}(v_i)\right) - \log P\left(c_i \mid \mathrm{ctx}(c_i)\right),$$

where vi and ci are an item and a content provider in the user history (in practice, sampled as described below).
Finally, the objective of multi-task learning combines the loss functions of the different components as $\mathcal{L} = \mathcal{L}_{\text{rank}} + \mathcal{L}_{\text{auxiliary}} + \mathcal{L}_{\text{locality}}$.
Efficient Optimization. To efficiently train the model with millions of items and content providers in the corpora, we sample negative classes as candidates from the background distribution to avoid exhaustive computation [22]. More specifically, for each training instance, the cross-entropy is minimized over the class of the true label and several thousand sampled negative classes. In practice, sampling several thousand negative classes leads to a more than 100x speedup over conventional optimization in production systems, as shown in previous studies [8].
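Since the model is implemented in TensorFlow, this sampled training loss can be sketched with the standard tf.nn.sampled_softmax_loss; the batch shapes and the choice of 4096 negatives here are illustrative, not the paper's exact configuration:

```python
import tensorflow as tf

def ranking_loss(h, labels, item_weights, item_biases, num_sampled=4096):
    """Sampled-softmax ranking loss: each true label competes against a few
    thousand sampled negative items instead of the full million-item corpus.

    h: [batch, d_h] ultimate features; labels: [batch] int64 true item ids;
    item_weights: [|V|, d_h] softmax weights; item_biases: [|V|] biases
    (can be zeros to match the bias-free softmax x = W_s h above).
    """
    per_example = tf.nn.sampled_softmax_loss(
        weights=item_weights,
        biases=item_biases,
        labels=tf.expand_dims(labels, -1),  # [batch, 1]
        inputs=h,
        num_sampled=num_sampled,
        num_classes=item_weights.shape[0])
    return tf.reduce_mean(per_example)
```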
For locality preservation, it is also time-consuming to enumerate all individual items and content providers in user history. Hence, in each training epoch, we stochastically sample one item and one content provider for optimizing the objectives. In other words, we manually conduct stochastic gradient descent for the locality preservation component, achieving a several-hundred-times speedup over enumerating all possible candidates.
In this section, we conduct extensive experiments and in-depth analysis to verify the performance of our proposed approach.
Experimental Dataset. The experiments in this paper are conducted on the user logs of a large video media platform at Google, with videos as items and channels as content providers. The dataset consists of the 400 most recently accessed items of 184M users, where some of the items were accessed after issuing queries. To alleviate the impact of rare items, an item is replaced with an out-of-vocabulary (OOV) item if it is not among the top 1M items. Similarly, a content provider is replaced by an OOV provider if it is not in the list of the top 400K providers.
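A small sketch of this vocabulary truncation (the function and id names are illustrative):

```python
def map_to_vocab(ids, kept_ids, oov_id=0):
    """Replace ids outside the retained vocabulary (e.g., the top 1M items
    or the top 400K providers) with a single OOV id."""
    kept = set(kept_ids)
    return [i if i in kept else oov_id for i in ids]
```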
Label Items and History Selection. The items associated with queries are treated as the label items that are relevant to the corresponding queries. Note that OOV items are never selected as labels, to prevent both training and evaluation from being affected by the noise caused by their ambiguity. To avoid temporal leakage in the logs, we follow previous work [8] to derive the contexts as personal item history. Figure 4 illustrates the label items and an example of selecting personal item history for a given query. For instance, vi is not a label item because it has no corresponding query. In contrast, vj is a label item because it is accessed after the query qj. Given the label item vj with the query qj, the selected item history consists of the items accessed before the query, i.e., {v1, v2, ⋅⋅⋅, vj−1}.
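The history selection can be sketched as a simple prefix slice of the user's chronological log, which avoids temporal leakage by excluding the label item and everything after it (names are illustrative):

```python
def select_history(log, j):
    """Items accessed strictly before the label item v_j at 0-indexed
    position j: the selected personal item history {v_1, ..., v_{j-1}}."""
    return log[:j]
```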
Training and Evaluation. For evaluation, we randomly sample 10% of users and their logs as testing data, while the data of the remaining 90% of users are used as the training dataset. To reduce the bias toward diligent users with more label items, we only adopt the last label item of each user in the testing dataset for evaluation. Correspondingly, in each training epoch, we independently sample one label item per training user so that active users with more label items are not over-represented. Moreover, different label items of a user can still be examined over the training epochs.

Competitive Baselines. Although no existing work focuses on personalized item retrieval for online content-sharing platforms without descriptive information, methods for other retrieval tasks can be adapted by replacing their encoders of descriptive information with embeddings as a workaround. In our experiments, the query embedding model (QEM) [3], the hierarchical embedding model (HEM) [3], the attention embedding model (AEM) [2], the zero-attention model (ZAM) [2], and attentive convolutional and recurrent neural networks (ACNN and ARNN) [16] are considered as the comparative baseline methods.
Evaluation Metrics. We adopt the success rate at top-k (SR@k) [28] to evaluate model performance. More precisely, SR@k denotes the percentage of label items that can be found in the top-k ranked items.
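SR@k itself is straightforward to compute; a NumPy sketch:

```python
import numpy as np

def success_rate_at_k(scores, label_ids, k):
    """SR@k: the fraction of test cases whose label item is in the top-k.

    scores: [num_cases, |V|] relevance scores; label_ids: [num_cases] item ids.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]                   # top-k item ids per case
    hits = (topk == np.asarray(label_ids)[:, None]).any(axis=1)  # label in top-k?
    return hits.mean()
```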
Implementation Details. The model is implemented in TensorFlow [1] and optimized by Adam [24] with an initial learning rate of 1e-5. The embedding dimension d and the number of hidden units dh are set to 128 and 256 after tuning. For all baselines, we also tune all hyper-parameters for fair comparison and evaluation.
Overall Evaluation. Figure 5 shows the performance of the different methods. Among the baselines, QEM performs the worst because it only considers query information, while HEM achieves better performance by exploiting user history. AEM and ZAM are the strongest baselines because their attention mechanisms appropriately identify important items and providers in the history. ACNN and ARNN perform worse because they over-emphasize sequential information, which is not as essential as the relations between history and the query for personalized search. Our proposed EDAM significantly outperforms all baselines. This is because the external item key embeddings more appropriately model the query-history relations while locality preservation properly learns sequential knowledge.
Length of User History. We then analyze performance with different lengths of user history. Figure 6 shows the SR@1 scores over different numbers of items in user history. For all methods using user history, the improvements over QEM grow with more items in user history. When the number of historical items is limited, all baselines exploiting user history perform worse than QEM. In contrast, our proposed EDAM consistently outperforms all baselines over different history lengths, showing that EDAM is capable of deriving essential information from user history across different situations.

Table 2: SR@1 scores of EDAM with and without the auxiliary ranker (AR) and locality preservation (LP), overall and by length of user history.

| Method | Overall | [0, 50] | [51, 100] | [101, 200] | [201, 400] |
| --- | --- | --- | --- | --- | --- |
| EDAM | 0.4097 | 0.3297 | 0.3718 | 0.3822 | 0.4196 |
| -AR | 0.3973 | 0.3031 | 0.3522 | 0.3696 | 0.4089 |
| -LP | 0.4039 | 0.3143 | 0.3591 | 0.3729 | 0.4155 |
Ablation Study. Here we conduct an ablation study to demonstrate the effectiveness of the different components of EDAM. Table 2 reports the SR@1 scores of EDAM with and without the auxiliary ranker (AR) and locality preservation (LP). The results show that both AR and LP are consistently beneficial across different history lengths, with AR playing the more important role in EDAM. In particular, AR leads to greater improvements for shorter user histories. This further demonstrates the ability of EDAM to model personal information with only limited data, as shown in Figure 6.
Content Provider Key Embeddings. In addition to historical items, we also attempt to apply external key memory to modeling content providers for personalization. Table 3 shows the SR@1 scores of ZEM and of EDAM with the two different kinds of key embeddings. Although item key embeddings lead to significant improvements, external key memory does not work for modeling content providers. This may be because the label items of a content provider can be relevant to many different queries, so the learned provider key embeddings become noisy. Hence, EDAM adopts only the item key embeddings.

Table 3: SR@1 scores of ZEM and of EDAM with item and provider key embeddings, overall and by length of user history.

| Method | Overall | [0, 50] | [51, 100] | [101, 200] | [201, 400] |
| --- | --- | --- | --- | --- | --- |
| ZEM | 0.3957 | 0.3155 | 0.3570 | 0.3709 | 0.4056 |
| EDAM (Item) | 0.4097 | 0.3297 | 0.3718 | 0.3822 | 0.4196 |
| EDAM (Provider) | 0.3808 | 0.3106 | 0.3513 | 0.3608 | 0.3892 |
In this paper, we propose EDAM to address personalized item retrieval for online content-sharing platforms without any descriptive information, based on a query-aware attention mechanism with external key memory and locality preservation. Experimental results and analysis on a large-scale dataset from a real-world commercial online content-sharing platform demonstrate the effectiveness and robustness of EDAM. The insights can be summarized as follows: (1) user history is helpful for personalized item retrieval; (2) learning external key item embeddings for estimating attention weights is beneficial, especially for users with shorter item histories; (3) sequential information in user history must be treated carefully: EDAM, which captures it lightly through locality preservation, outperforms sequence-model baselines such as ARNN.
⁎Work done while interning at Google.