DOI: 10.1145/3397271.3401170

MaHRL: Multi-goals Abstraction Based Deep Hierarchical Reinforcement Learning for Recommendations

Published: 25 July 2020

Abstract

Given the huge commercial value of recommender systems, there has been growing interest in improving their performance in recent years. The majority of existing methods achieve substantial improvements on the click metric but perform poorly on the conversion metric, possibly because its feedback signal is extremely sparse. To tackle this challenge, we design a novel deep hierarchical reinforcement learning based recommendation framework to model consumers' hierarchical purchase interest. Specifically, the high-level agent captures long-term, sparse conversion interest and automatically sets abstract goals for the low-level agent, while the low-level agent follows these goals and captures short-term click interest by interacting with the real-time environment. To address the inherent difficulties of hierarchical reinforcement learning, we propose a novel multi-goals abstraction based deep hierarchical reinforcement learning algorithm (MaHRL). Our algorithm makes three contributions: 1) the high-level agent generates multiple goals to guide the low-level agent in different sub-periods, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state-encoder structure and parameters, which increases the update frequency of the high-level agent and thus accelerates convergence; 3) an appropriate reward assignment mechanism is designed to allocate rewards across goals so as to coordinate them in a consistent direction. We evaluate our algorithm on a real-world e-commerce dataset and validate its effectiveness.
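To make the two-level structure described above concrete, the following minimal Python sketch illustrates the control flow of such a hierarchy: a high-level agent emits an abstract goal once per sub-period, and a low-level agent scores items conditioned on the current state and that goal. This is an illustration only, not the authors' implementation; the class names, dimensions, random parameters, and the distance-based intrinsic reward are all assumptions.

# Minimal sketch of a two-level hierarchical RL recommendation loop.
# NOTE: illustrative only -- class names, dimensions, and the intrinsic
# reward below are assumptions, not the MaHRL implementation.
import numpy as np

STATE_DIM, GOAL_DIM, N_ITEMS, SUB_PERIOD = 16, 8, 50, 5
rng = np.random.default_rng(0)

class HighLevelAgent:
    """Sets an abstract goal per sub-period (long-term conversion interest)."""
    def __init__(self):
        # A single state encoder whose parameters would be shared across goals.
        self.encoder = rng.normal(scale=0.1, size=(STATE_DIM, GOAL_DIM))

    def set_goal(self, state):
        return np.tanh(state @ self.encoder)   # abstract goal vector

class LowLevelAgent:
    """Picks items conditioned on state and goal (short-term click interest)."""
    def __init__(self):
        self.item_embeddings = rng.normal(size=(N_ITEMS, STATE_DIM + GOAL_DIM))

    def act(self, state, goal):
        scores = self.item_embeddings @ np.concatenate([state, goal])
        return int(np.argmax(scores))          # index of the recommended item

def intrinsic_reward(state, next_state, goal):
    # One common goal-conditioned choice (assumed here): reward the low-level
    # agent for moving the user state in the direction of the goal.
    return -float(np.linalg.norm((next_state - state)[:GOAL_DIM] - goal))

high, low = HighLevelAgent(), LowLevelAgent()
state = rng.normal(size=STATE_DIM)
for t in range(3 * SUB_PERIOD):
    if t % SUB_PERIOD == 0:                    # high-level agent refreshes the goal
        goal = high.set_goal(state)
    item = low.act(state, goal)                # low-level action: item to recommend
    next_state = state + 0.1 * rng.normal(size=STATE_DIM)   # simulated user response
    print(t, item, round(intrinsic_reward(state, next_state, goal), 3))
    state = next_state

In MaHRL the encoder and both policies are learned networks trained with the paper's reward assignment mechanism; the sketch only fixes random parameters to show how goals, actions, and intrinsic rewards interact.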






Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. conversion
  2. deep hierarchical reinforcement learning
  3. multi-goals
  4. recommender systems

Qualifiers

  • Research-article


Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%




Article Metrics

  • Downloads (last 12 months): 81
  • Downloads (last 6 weeks): 5
Reflects downloads up to 24 Sep 2024

Cited By

  • (2024) Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems 2(3), 1-25. DOI: 10.1145/3651169. Online publication date: 4-Mar-2024.
  • (2023) A Knowledge-Enhanced Hierarchical Reinforcement Learning-Based Dialogue System for Automatic Disease Diagnosis. Electronics 12(24), 4896. DOI: 10.3390/electronics12244896. Online publication date: 5-Dec-2023.
  • (2023) Goal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback. Proceedings of the 17th ACM Conference on Recommender Systems, 362-373. DOI: 10.1145/3604915.3608775. Online publication date: 14-Sep-2023.
  • (2023) A Systematic Study on Reproducibility of Reinforcement Learning in Recommendation Systems. ACM Transactions on Recommender Systems 1(3), 1-23. DOI: 10.1145/3596519. Online publication date: 14-Jul-2023.
  • (2023) Graph Enhanced Hierarchical Reinforcement Learning for Goal-oriented Learning Path Recommendation. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1318-1327. DOI: 10.1145/3583780.3614897. Online publication date: 21-Oct-2023.
  • (2023) Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks. ACM Transactions on the Web 17(3), 1-27. DOI: 10.1145/3580510. Online publication date: 22-May-2023.
  • (2023) PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2874-2884. DOI: 10.1145/3580305.3599473. Online publication date: 6-Aug-2023.
  • (2023) Learning From Atypical Behavior: Temporary Interest Aware Recommendation Based on Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35(10), 9824-9835. DOI: 10.1109/TKDE.2022.3144292. Online publication date: 1-Oct-2023.
  • (2023) A Deep Reinforcement Learning Recommender System With Multiple Policies for Recommendations. IEEE Transactions on Industrial Informatics 19(2), 2049-2061. DOI: 10.1109/TII.2022.3209290. Online publication date: Feb-2023.
  • (2023) Deep reinforcement learning in recommender systems: A survey and new perspectives. Knowledge-Based Systems 264, 110335. DOI: 10.1016/j.knosys.2023.110335. Online publication date: Mar-2023.
