RGMeta: Enhancing Cold-Start Recommendations with a Residual Graph Meta-Embedding Model
Figures
- Figure 1. A typical deep learning recommendation model.
- Figure 2. Example of ID embeddings for new and old items.
- Figure 3. The framework of RGMeta.
- Figure 4. Performance in the warm-up phase on the DNN prediction model: (a) MovieLens-1M; (b) Taobao Ad.
- Figure 5. Effect of the equilibrium coefficient (the main prediction model is DNN).
- Figure 6. Effect of the number of neighborhood items (the main prediction model is DNN).
Abstract
1. Introduction
- A meta-learning-based method, RGMeta, is proposed to solve the item cold-start problem by generating initial ID embeddings for new items. RGMeta serves as a meta-embedding model for generating initial ID embeddings and can be applied as a separate module to recommendation models that use ID embeddings.
- RGMeta further strengthens the connections between items by jointly considering the attribute features and the target-user attributes of both new and old items. The new item's initial ID embedding is enhanced by a residual operation that yields refined attribute and target-user embedding representations for the new item (a minimal sketch of this idea follows this list).
- Experiments were conducted on public datasets. The results show that, compared with the main meta-learning methods on the MovieLens-1M dataset, RGMeta improves the AUC by averages of 1.5%, 1.1%, and 0.26%, respectively, thereby improving prediction performance on cold-start problems.
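As a minimal illustration of the residual refinement described above, the following PyTorch sketch aggregates the embeddings of neighboring old items and adds the result back to the new item's own embedding through a skip connection. The class name, layer sizes, and aggregation weights are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ResidualRefine(nn.Module):
    """Sketch of the residual refinement idea: a new item's embedding is
    refined by aggregating neighboring old items' embeddings and adding
    the projected result back via a residual (skip) connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, item_emb: torch.Tensor, neighbor_embs: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
        # item_emb: (dim,), neighbor_embs: (n, dim), weights: (n,)
        agg = (weights.unsqueeze(-1) * neighbor_embs).sum(dim=0)  # weighted aggregation
        return item_emb + torch.relu(self.proj(agg))              # residual connection
```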
2. Related Work
3. Residual Graph Meta-Embedding Model
3.1. Overview
3.2. Model Design
3.2.1. Refine Item Attribute Embedding
3.2.2. Refine the Target User’s Attribute Embedding
3.2.3. Generate Initial ID Embedding
3.3. Model Training
Algorithm 1: Train RGMeta by SGD.

```
Input: f_θ: the pre-trained base model
Input: N: the set of old item IDs
Input: α: hyperparameter, the coefficient for the meta-loss
Input: η, λ: step sizes
 1: Randomly initialize the parameters of RGMeta
 2: while not done do
 3:   Randomly sample m items from N
 4:   for i in range(0, m) do
 5:     Use RGMeta to generate the initial ID embedding r_i
 6:     Sample mini-batches D1 and D2, each with K samples
 7:     Evaluate the loss l1 on D1 using r_i
 8:     Compute the adapted embedding: r'_i = r_i − η·∇_{r_i} l1
 9:     Evaluate the loss l2 on D2 using r'_i
10:     Compute the final loss: loss = α·l1 + (1 − α)·l2
11:     Update the parameters of RGMeta by SGD with step size λ
```
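A minimal PyTorch sketch of this training loop is given below. The callables `rgmeta`, `base_model`, and `sample_batches`, and the data plumbing around them, are assumptions for illustration; only the two-batch meta-loss structure follows Algorithm 1.

```python
import random
import torch

def train_rgmeta(rgmeta, base_model, old_items, sample_batches,
                 alpha=0.5, eta=1e-3, lr=1e-3, steps=1000, m=32):
    """Sketch of Algorithm 1 (names and data handling are assumptions).

    rgmeta(item)        -> initial ID embedding r_i for an old item
    base_model(x, emb)  -> predicted probabilities using `emb` as the item
                           ID embedding (the base model stays frozen)
    sample_batches(i)   -> two mini-batches (D1, D2) of K samples for item i
    """
    opt = torch.optim.SGD(rgmeta.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    for _ in range(steps):
        items = random.sample(old_items, m)
        total = 0.0
        for i in items:
            r = rgmeta(i)                               # initial ID embedding
            (x1, y1), (x2, y2) = sample_batches(i)
            l1 = bce(base_model(x1, r), y1)             # cold-start loss on D1
            grad = torch.autograd.grad(l1, r, create_graph=True)[0]
            r_adapt = r - eta * grad                    # one adaptation step
            l2 = bce(base_model(x2, r_adapt), y2)       # warm-up loss on D2
            total = total + alpha * l1 + (1 - alpha) * l2
        opt.zero_grad()
        (total / m).backward()                          # update only RGMeta's parameters
        opt.step()
```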
4. Experiments
4.1. Datasets
4.2. Backbone and Baseline
4.2.1. Backbone
- DNN: This is a deep neural network comprising an embedding layer, multiple fully connected (FC) layers, and an output layer [5]. Unlike traditional shallow neural networks, DNNs have multiple hidden layers, each of which learns a different level of abstract features, allowing them to handle more complex tasks (a minimal sketch of this backbone follows this list).
- DeepFM: This consists of a factorization machine (FM) and a deep neural network (DNN) [7]. The FM part models second-order interactions between features and can effectively capture sparse feature interactions. The deep part is similar to a traditional deep neural network and learns higher-order feature representations.
- Wide&Deep: This model consists of logistic regression (LR) and DNN, which can model low-order and high-order feature interactions [5]. The wide part is used to learn generalized cross-terms between features, and it can capture the linear relationship between features well. The deep part is used to learn higher-order representations of features, capturing nonlinear relationships between features.
- Deep&Cross: This model is divided into a deep part and a cross part [8]. The deep part is similar to a traditional deep neural network and learns higher-order feature representations. The cross part mines interaction information between features through explicit cross-feature computations. The model combines deep learning with generalized linear models to address the difficulties traditional models face with high-dimensional sparse features.
- PNN: Unlike traditional feature-crossing models, PNN introduces a product layer into the DNN [6]. It models second-order interactions between features via product vectors and learns higher-order feature representations through the deep layers, thereby better capturing relationships between features and further improving prediction accuracy.
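As a reference point for the backbones above, here is a minimal PyTorch sketch of the DNN backbone: one embedding table per categorical field, a stack of FC layers, and a sigmoid output. The field sizes and hidden dimensions are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class DNNBackbone(nn.Module):
    """Sketch of a DNN CTR backbone: embedding layer + FC stack + sigmoid."""

    def __init__(self, field_sizes, emb_dim=16, hidden=(256, 128, 64)):
        super().__init__()
        # One embedding table per categorical field
        self.embeddings = nn.ModuleList(nn.Embedding(n, emb_dim) for n in field_sizes)
        layers, in_dim = [], emb_dim * len(field_sizes)
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers += [nn.Linear(in_dim, 1)]
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, n_fields) integer field indices
        embs = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.sigmoid(self.mlp(torch.cat(embs, dim=1))).squeeze(-1)
```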
4.2.2. Baseline
- NgbEmb [18]: For each selected neighbor item, a pre-trained ID embedding is already available. These embeddings, derived from historical data and model training on old items, effectively capture item features. NgbEmb utilizes the embedding information from these adjacent old items to generate the initial ID embedding for the new item, typically through weighted averaging, clustering, or other synthesis methods that transfer the embedded features from old items to the new item. NgbEmb serves as a baseline for evaluating the effectiveness of generating new-item embeddings based solely on old items' information.
- MetaEmb [17]: Before generating ID embeddings for new items, the attribute features of the new item must first be identified. These can include various types of information such as the item's category, description, and price. These attribute features provide rich context for new items, enabling the embeddings to better reflect their uniqueness. Using the new item's attribute features, MetaEmb generates initial embeddings through a learned generator; this process transforms the different attribute features into a unified embedding vector, providing a more comprehensive representation of the new item. MetaEmb serves as a baseline that considers only the new item itself.
- GME-A [18]: The GME-A model relies not only on the independent features of new items but also incorporates the attribute information of old items, so that more data are considered when generating ID embeddings. The core of this method lies in combining the features of new and old items to provide a more representative and robust initial embedding for new items, so that subsequent recommendation or classification tasks can build on richer information. GME-A is therefore regarded as a baseline that focuses exclusively on item attribute features: it generates preliminary embedding representations by deeply mining the attribute relationships between items, without relying on user behavior data.
- CoMeta [24]: The CoMeta model consists of two submodules, B-EG and S-EG, which exploit collaborative information to enhance the generated meta-embeddings. Specifically, for a new item, B-EG computes a base embedding as the weighted sum of the ID embeddings of similar old items, while S-EG generates a shift embedding that incorporates the item's attribute features and the average ID embedding of the users who have interacted with it. The final meta-embedding is the sum of the base embedding and the shift embedding (see the sketch after this list).
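The base-plus-shift composition described for CoMeta can be sketched as follows. All tensor shapes and the `shift_net` module (which would map the concatenated 2d-dimensional input back to d dimensions) are assumptions for illustration, not the published implementation.

```python
import torch

def base_plus_shift(neighbor_embs, sims, attr_emb, user_embs, shift_net):
    """Sketch of a CoMeta-style meta-embedding.

    neighbor_embs: (n, d) ID embeddings of similar old items
    sims:          (n,)   similarity scores for those items
    attr_emb:      (d,)   attribute embedding of the new item
    user_embs:     (u, d) ID embeddings of users who interacted with it
    shift_net:     a module mapping (2d,) -> (d,)
    """
    w = torch.softmax(sims, dim=0)
    base = (w.unsqueeze(-1) * neighbor_embs).sum(dim=0)               # B-EG: weighted sum
    shift = shift_net(torch.cat([attr_emb, user_embs.mean(dim=0)]))   # S-EG: attrs + avg user
    return base + shift                                               # final meta-embedding
```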
4.3. Evaluation Metrics
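The section body is omitted in this excerpt, but the result tables below report AUC, RelaImpr, and Loss. Assuming the standard RelaImpr definition from the CTR literature (not stated here), it can be computed as follows; the base AUC in the example is inferred from the reported numbers.

```python
def rela_impr(auc_model: float, auc_base: float) -> float:
    """Relative AUC improvement over the base model; 0.5 is the AUC of a
    random predictor (standard CTR-literature definition, assumed here)."""
    return (auc_model - 0.5) / (auc_base - 0.5) - 1

# Example: a base DNN AUC of ~0.7107 on MovieLens-1M reproduces the
# 1.19% reported for NgbEmb in the results table below.
print(f"{rela_impr(0.7132, 0.7107):.2%}")  # -> 1.19%
```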
4.4. Experimental Results
4.5. Ablation Studies
4.5.1. Effect of Equilibrium Coefficient
4.5.2. Effect of the Number of Neighborhood Items
4.5.3. Effect of Model Components
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- He, L.; Xia, L.; Zeng, W.; Ma, Z.; Zhao, Y.; Yin, D. Off-policy learning for multiple loggers. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1184–1193. [Google Scholar]
- Ouyang, W.; Zhang, X.; Ren, S.; Li, L.; Liu, Z.; Du, Y. Click-through rate prediction with the user memory network. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data, Anchorage, AK, USA, 4–8 August 2019; pp. 1–4. [Google Scholar]
- Xia, Y.; Cao, Y.; Hu, S.; Liu, T.; Lu, L. Deep intention-aware network for click-through rate prediction. In Proceedings of the WWW ’23 Companion: Companion Proceedings of the ACM Web Conference, Austin, TX, USA, 30 April–4 May 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 533–537. [Google Scholar]
- Kamal, M.; Bablu, T.A. Machine learning models for predicting click-through rates on social media: Factors and performance analysis. Int. J. Appl. Mach. Learn. Comput. Intell. 2022, 12, 1–4. [Google Scholar]
- Cheng, H.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 7–10. [Google Scholar]
- Qu, Y.; Fang, B.; Zhang, W.; Tang, R.; Niu, M.; Guo, H.; Yu, Y.; He, X. Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans. Inf. Syst. (TOIS) 2018, 37, 1–35. [Google Scholar]
- Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247. [Google Scholar]
- Wang, R.; Fu, B.; Fu, G.; Wang, M. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, Halifax, NS, Canada, 13–17 August 2017; pp. 1–7. [Google Scholar]
- Valanarasu, J.M.J.; Patel, V.M. Unext: Mlp-based rapid medical image segmentation network. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 23–33. [Google Scholar]
- Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068. [Google Scholar]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013. [Google Scholar]
- Xia, L.; Huang, C.; Huang, C.; Lin, K.; Yu, T.; Kao, B. Automated self-supervised learning for recommendation. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 992–1002. [Google Scholar]
- He, W.; Sun, G.; Lu, J.; Fang, X. Candidate-aware graph contrastive learning for recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1670–1679. [Google Scholar]
- Wei, C.; Liang, J.; Liu, D.; Dai, Z.; Li, M.; Wang, F. Meta graph learning for long-tail recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 2512–2522. [Google Scholar]
- Wei, Y.; Wang, X.; Li, Q.; Nie, L.; Li, Y.; Li, X.; Chua, T. Contrastive learning for cold-start recommendation. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 5382–5390. [Google Scholar]
- Volkovs, M.; Yu, G.; Poutanen, T. Dropoutnet: Addressing cold start in recommender systems. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Pan, F.; Li, S.; Ao, X.; Tang, P.; He, Q. Warm up cold-start advertisements: Improving ctr predictions via learning to learn id embeddings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 695–704. [Google Scholar]
- Ouyang, W.; Zhang, X.; Ren, S.; Li, L.; Zhang, K.; Luo, J.; Liu, Z.; Du, Y. Learning graph meta embeddings for cold-start ads in click-through rate prediction. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Montreal, QC, USA, 11–15 July 2021; pp. 1157–1166. [Google Scholar]
- Song, W.; Shi, C.; Xiao, Z.; Duan, Z.; Xu, Y.; Zhang, M.; Tang, J. Autoint: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1161–1170. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Dubey, A.K.; Jain, V. Comparative study of convolution neural network’s relu and leaky-relu activation functions. In Applications of Computing, Automation and Wireless Systems in Electrical Engineering: Proceedings of MARC; Springer: Berlin/Heidelberg, Germany, 2018; pp. 873–880. [Google Scholar]
- Zhao, X.; Ren, Y.; Du, Y.; Zhang, S.; Wang, N. Improving item cold-start recommendation via model-agnostic conditional variational autoencoder. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 2595–2600. [Google Scholar]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 15–20 July 2017; pp. 1126–1135. [Google Scholar]
- Hu, H.; Rong, D.; Chen, J.; He, Q.; Liu, Z. CoMeta: Enhancing meta embeddings with collaborative information in cold-start problem of recommendation. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Guangzhou, China, 16–18 August 2023; pp. 213–225. [Google Scholar]
- Qin, J.; Zhang, W.; Wu, X.; Jin, J.; Fang, Y.; Yu, Y. User behavior retrieval for click-through rate prediction. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 2347–2356. [Google Scholar]
- Ho, Y.; Wookey, S. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access 2019, 8, 4806–4813. [Google Scholar] [CrossRef]
Table: Statistics of the two datasets.

| Dataset | #Fields | #Old Item IDs | #Samples to Train the Main Prediction Model | #Samples to Train the Cold-Start ID Embedding Model | #New Item IDs | #Samples for Warm-Up Training | #Samples for Testing |
|---|---|---|---|---|---|---|---|
| MovieLens-1M | 8 | 1058 | 765,669 | 43,320 | 1127 | 67,620 | 123,787 |
| Taobao Ad | 23 | 62,209 | 3,592,047 | 1,784,000 | 531,593 | 810,000 | 109,712 |
Table: AUC, RelaImpr, and Loss of each meta-embedding method across the five backbones on MovieLens-1M and Taobao Ad.

| Backbone | Method | MovieLens-1M AUC | MovieLens-1M RelaImpr | MovieLens-1M Loss | Taobao Ad AUC | Taobao Ad RelaImpr | Taobao Ad Loss |
|---|---|---|---|---|---|---|---|
| DNN | NgbEmb | 0.7132 | 1.19% | 0.6436 | 0.6081 | 0.93% | 0.2053 |
| DNN | MetaEmb | 0.7138 | 1.47% | 0.6436 | 0.6103 | 2.99% | 0.2017 |
| DNN | GME-A | 0.7235 | 6.07% | 0.6321 | 0.6198 | 11.86% | 0.1967 |
| DNN | CoMeta | 0.7217 | 5.22% | 0.6330 | 0.6144 | 6.82% | 0.1984 |
| DNN | RGMeta | 0.7253 | 6.93% | 0.6176 | 0.6256 | 17.27% | 0.1953 |
| DeepFM | NgbEmb | 0.7133 | 1.23% | 0.6435 | 0.6161 | 0.87% | 0.2066 |
| DeepFM | MetaEmb | 0.7136 | 1.38% | 0.6433 | 0.6185 | 2.95% | 0.2013 |
| DeepFM | GME-A | 0.7233 | 5.98% | 0.6326 | 0.6232 | 7.04% | 0.1967 |
| DeepFM | CoMeta | 0.7220 | 5.36% | 0.6320 | 0.6219 | 5.91% | 0.2016 |
| DeepFM | RGMeta | 0.7242 | 6.41% | 0.6108 | 0.6280 | 11.21% | 0.1959 |
| Wide&Deep | NgbEmb | 0.7166 | 2.80% | 0.6521 | 0.6166 | 0.78% | 0.4042 |
| Wide&Deep | MetaEmb | 0.7132 | 1.19% | 0.6457 | 0.6173 | 1.38% | 0.2292 |
| Wide&Deep | GME-A | 0.7207 | 4.75% | 0.6382 | 0.6236 | 6.83% | 0.2012 |
| Wide&Deep | CoMeta | 0.7187 | 3.80% | 0.6399 | 0.6215 | 5.01% | 0.1985 |
| Wide&Deep | RGMeta | 0.7219 | 5.32% | 0.6235 | 0.6266 | 9.42% | 0.1962 |
| Deep&Cross | NgbEmb | 0.7102 | 0.96% | 0.6431 | 0.6081 | 0.75% | 0.2309 |
| Deep&Cross | MetaEmb | 0.7146 | 3.07% | 0.6476 | 0.6081 | 0.75% | 0.2005 |
| Deep&Cross | GME-A | 0.7171 | 4.27% | 0.6534 | 0.6122 | 4.57% | 0.1976 |
| Deep&Cross | CoMeta | 0.7146 | 3.07% | 0.6313 | 0.6120 | 4.38% | 0.1956 |
| Deep&Cross | RGMeta | 0.7212 | 6.24% | 0.6258 | 0.6217 | 13.42% | 0.1952 |
| PNN | NgbEmb | 0.7061 | 1.32% | 0.6404 | 0.6025 | 0.69% | 0.2112 |
| PNN | MetaEmb | 0.7131 | 4.77% | 0.6475 | 0.6051 | 3.24% | 0.2088 |
| PNN | GME-A | 0.7154 | 5.90% | 0.6248 | 0.6080 | 6.09% | 0.2042 |
| PNN | CoMeta | 0.7152 | 5.80% | 0.6455 | 0.6072 | 5.30% | 0.2051 |
| PNN | RGMeta | 0.7166 | 6.49% | 0.6222 | 0.6214 | 19.25% | 0.2027 |
Table: Ablation results (AUC and Loss) of RGMeta variants across the five backbones.

| Backbone | Emb. Model | MovieLens-1M AUC | MovieLens-1M Loss | Taobao Ad AUC | Taobao Ad Loss |
|---|---|---|---|---|---|
| DNN | GME-A | 0.7235 | 0.6321 | 0.6198 | 0.1967 |
| DNN | RGMeta/UF | 0.7245 | 0.6320 | 0.6215 | 0.1916 |
| DNN | RGMeta/Res | 0.7250 | 0.6227 | 0.6224 | 0.1985 |
| DNN | RGMeta | 0.7253 | 0.6176 | 0.6256 | 0.1953 |
| DeepFM | GME-A | 0.7233 | 0.6326 | 0.6232 | 0.1967 |
| DeepFM | RGMeta/UF | 0.7234 | 0.6242 | 0.6250 | 0.1971 |
| DeepFM | RGMeta/Res | 0.7237 | 0.6252 | 0.6257 | 0.1968 |
| DeepFM | RGMeta | 0.7242 | 0.6108 | 0.6280 | 0.1959 |
| Wide&Deep | GME-A | 0.7207 | 0.6382 | 0.6236 | 0.2012 |
| Wide&Deep | RGMeta/UF | 0.7217 | 0.6246 | 0.6248 | 0.1982 |
| Wide&Deep | RGMeta/Res | 0.7214 | 0.6279 | 0.6252 | 0.1980 |
| Wide&Deep | RGMeta | 0.7219 | 0.6235 | 0.6266 | 0.1962 |
| Deep&Cross | GME-A | 0.7171 | 0.6534 | 0.6122 | 0.1976 |
| Deep&Cross | RGMeta/UF | 0.7188 | 0.6347 | 0.6136 | 0.2035 |
| Deep&Cross | RGMeta/Res | 0.7198 | 0.6408 | 0.6139 | 0.2003 |
| Deep&Cross | RGMeta | 0.7212 | 0.6258 | 0.6217 | 0.1952 |
| PNN | GME-A | 0.7154 | 0.6248 | 0.6080 | 0.2042 |
| PNN | RGMeta/UF | 0.7157 | 0.6247 | 0.6111 | 0.2060 |
| PNN | RGMeta/Res | 0.7164 | 0.6294 | 0.6106 | 0.2054 |
| PNN | RGMeta | 0.7166 | 0.6222 | 0.6214 | 0.2027 |