DOI: 10.1145/3397271.3401170

MaHRL: Multi-goals Abstraction Based Deep Hierarchical Reinforcement Learning for Recommendations

Published: 25 July 2020

Abstract

Given the huge commercial value of recommender systems, there has been growing interest in improving their performance in recent years. The majority of existing methods achieve substantial improvements on the click metric but perform poorly on the conversion metric, possibly because its feedback signal is extremely sparse. To tackle this challenge, we design a novel deep hierarchical reinforcement learning based recommendation framework to model consumers' hierarchical purchase interest. Specifically, the high-level agent captures long-term, sparse conversion interest and automatically sets abstract goals for the low-level agent, while the low-level agent follows these goals and captures short-term click interest by interacting with the real-time environment. To address the inherent difficulties of hierarchical reinforcement learning, we propose a novel multi-goals abstraction based deep hierarchical reinforcement learning algorithm (MaHRL). Our algorithm makes three contributions: 1) the high-level agent generates multiple goals to guide the low-level agent in different sub-periods, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state-encoder structure and parameters, which increases the update frequency of the high-level agent and thus accelerates convergence; 3) an appropriate reward assignment mechanism is designed to allocate rewards across goals so as to coordinate them in a consistent direction. We evaluate our algorithm on a real-world e-commerce dataset and validate its effectiveness.
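To make the two-level structure described above concrete, the following minimal Python sketch illustrates the control flow of such a hierarchy: a high-level agent emits an abstract goal once per sub-period, and a low-level agent scores items conditioned on the current state and that goal. This is an illustration only, not the authors' implementation; the class names, dimensions, random parameters, and the distance-based intrinsic reward are all assumptions.

# Minimal sketch of a two-level hierarchical RL recommendation loop.
# NOTE: illustrative only -- class names, dimensions, and the intrinsic
# reward below are assumptions, not the MaHRL implementation.
import numpy as np

STATE_DIM, GOAL_DIM, N_ITEMS, SUB_PERIOD = 16, 8, 50, 5
rng = np.random.default_rng(0)

class HighLevelAgent:
    """Sets an abstract goal per sub-period (long-term conversion interest)."""
    def __init__(self):
        # A single state encoder whose parameters would be shared across goals.
        self.encoder = rng.normal(scale=0.1, size=(STATE_DIM, GOAL_DIM))

    def set_goal(self, state):
        return np.tanh(state @ self.encoder)   # abstract goal vector

class LowLevelAgent:
    """Picks items conditioned on state and goal (short-term click interest)."""
    def __init__(self):
        self.item_embeddings = rng.normal(size=(N_ITEMS, STATE_DIM + GOAL_DIM))

    def act(self, state, goal):
        scores = self.item_embeddings @ np.concatenate([state, goal])
        return int(np.argmax(scores))          # index of the recommended item

def intrinsic_reward(state, next_state, goal):
    # One common goal-conditioned choice (assumed here): reward the low-level
    # agent for moving the user state in the direction of the goal.
    return -float(np.linalg.norm((next_state - state)[:GOAL_DIM] - goal))

high, low = HighLevelAgent(), LowLevelAgent()
state = rng.normal(size=STATE_DIM)
for t in range(3 * SUB_PERIOD):
    if t % SUB_PERIOD == 0:                    # high-level agent refreshes the goal
        goal = high.set_goal(state)
    item = low.act(state, goal)                # low-level action: item to recommend
    next_state = state + 0.1 * rng.normal(size=STATE_DIM)   # simulated user response
    print(t, item, round(intrinsic_reward(state, next_state, goal), 3))
    state = next_state

In MaHRL the encoder and both policies are learned networks trained with the paper's reward assignment mechanism; the sketch only fixes random parameters to show how goals, actions, and intrinsic rewards interact.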






Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. conversion
  2. deep hierarchical reinforcement learning
  3. multi-goals
  4. recommender systems

Qualifiers

  • Research-article


Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%




Article Metrics

  • Downloads (last 12 months): 81
  • Downloads (last 6 weeks): 5
Reflects downloads up to 24 Sep 2024

Cited By

  • (2024) Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems 2(3), 1-25. DOI: 10.1145/3651169. Online publication date: 4-Mar-2024.
  • (2023) A Knowledge-Enhanced Hierarchical Reinforcement Learning-Based Dialogue System for Automatic Disease Diagnosis. Electronics 12(24), 4896. DOI: 10.3390/electronics12244896. Online publication date: 5-Dec-2023.
  • (2023) Goal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback. Proceedings of the 17th ACM Conference on Recommender Systems, 362-373. DOI: 10.1145/3604915.3608775. Online publication date: 14-Sep-2023.
  • (2023) A Systematic Study on Reproducibility of Reinforcement Learning in Recommendation Systems. ACM Transactions on Recommender Systems 1(3), 1-23. DOI: 10.1145/3596519. Online publication date: 14-Jul-2023.
  • (2023) Graph Enhanced Hierarchical Reinforcement Learning for Goal-oriented Learning Path Recommendation. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1318-1327. DOI: 10.1145/3583780.3614897. Online publication date: 21-Oct-2023.
  • (2023) Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks. ACM Transactions on the Web 17(3), 1-27. DOI: 10.1145/3580510. Online publication date: 22-May-2023.
  • (2023) PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2874-2884. DOI: 10.1145/3580305.3599473. Online publication date: 6-Aug-2023.
  • (2023) Learning From Atypical Behavior: Temporary Interest Aware Recommendation Based on Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35(10), 9824-9835. DOI: 10.1109/TKDE.2022.3144292. Online publication date: 1-Oct-2023.
  • (2023) A Deep Reinforcement Learning Recommender System With Multiple Policies for Recommendations. IEEE Transactions on Industrial Informatics 19(2), 2049-2061. DOI: 10.1109/TII.2022.3209290. Online publication date: Feb-2023.
  • (2023) Deep reinforcement learning in recommender systems: A survey and new perspectives. Knowledge-Based Systems 264, 110335. DOI: 10.1016/j.knosys.2023.110335. Online publication date: Mar-2023.
