Privileged Features Distillation at Taobao Recommendations

Published: 20 August 2020
DOI: 10.1145/3394486.3403309

Abstract

Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee consistency between off-line training and on-line serving, we usually utilize the same features that are available in both environments. This consistency requirement, however, neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detail page are informative. However, CVR must be predicted for on-line ranking before the click happens, so such post-event features cannot be obtained during serving.
We define features that are discriminative but only available during training as privileged features. Inspired by distillation techniques that bridge the gap between training and inference, we propose privileged features distillation (PFD). We train two models: a student model that is the same as the original one, and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student, which helps to improve its prediction accuracy. During serving, only the student part is extracted, and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations: click-through rate (CTR) prediction at coarse-grained ranking and CVR prediction at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR, and the post-event features for CVR, we achieve significant improvements over strong baselines. In on-line A/B tests, the click metric improves by +5.0% in the CTR task and the conversion metric improves by +2.3% in the CVR task. Besides, by addressing several issues of training PFD, we obtain training speed comparable to the baselines without any distillation.
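For readers unfamiliar with the setup, the following is a minimal PyTorch-style sketch of how such a teacher-student scheme could be wired: the teacher consumes regular plus privileged features, the student consumes only the serving-time features, and a distillation term pushes the student's predictions toward the teacher's. The names (MLP, pfd_step, lambda_distill) are illustrative assumptions for this sketch, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """A small feed-forward scorer, used here for both student and teacher."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one logit per example

def pfd_step(student, teacher, x_regular, x_privileged, y,
             opt_student, opt_teacher, lambda_distill=0.5):
    """One joint training step of privileged features distillation (sketch).

    The teacher sees regular + privileged features; the student sees only
    the features that will also be available at serving time.
    """
    # Teacher forward pass on regular plus privileged features.
    t_logit = teacher(torch.cat([x_regular, x_privileged], dim=-1))
    # Student forward pass on regular (serving-time) features only.
    s_logit = student(x_regular)

    # Hard-label losses for both models (binary labels, e.g. click / purchase).
    teacher_loss = F.binary_cross_entropy_with_logits(t_logit, y)
    student_hard = F.binary_cross_entropy_with_logits(s_logit, y)
    # Distillation loss: the student matches the teacher's predicted probabilities.
    # Detaching the teacher's output keeps this term from feeding gradients
    # back into the teacher.
    soft_target = torch.sigmoid(t_logit).detach()
    student_soft = F.binary_cross_entropy_with_logits(s_logit, soft_target)

    loss = teacher_loss + student_hard + lambda_distill * student_soft
    opt_student.zero_grad()
    opt_teacher.zero_grad()
    loss.backward()
    opt_student.step()
    opt_teacher.step()
    return loss.item()

At serving time only the student network would be exported, so no privileged features are needed on-line. Training the two networks in a single step, as above, is one natural way to keep the overall training cost close to that of a non-distilled baseline, which is the setting the abstract alludes to.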

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ctr
  2. cvr
  3. distillation
  4. e-commerce recommendations
  5. privileged features

Qualifiers

  • Research-article

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
