
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

Published: 19 July 2018

Abstract

Recommender systems play a crucial role in mitigating information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat recommendation as a static process and follow a fixed strategy. In this paper, we propose a novel recommender system that continuously improves its strategy during its interactions with users. We model the sequential interactions between users and the recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategy by recommending items in a trial-and-error manner and receiving reinforcement signals from users' feedback on these items. Users' feedback can be positive or negative, and both types have great potential to improve recommendations. However, negative feedback is far more abundant than positive feedback, so incorporating both simultaneously is challenging: the positive signal can easily be buried by the negative one. In this paper, we develop a novel approach to incorporate both types of feedback into the proposed deep recommender system (DEERS) framework. Experimental results on real-world e-commerce data demonstrate the effectiveness of the proposed framework, and further experiments examine the importance of both positive and negative feedback in recommendations.
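The abstract describes recommendation as an MDP solved with a deep Q-network whose state tracks both positively and negatively received items. Below is a minimal, hypothetical Python sketch of that idea; the names (PairwiseQNetwork, td_update), the separate GRU encoders for positive and negative histories, and all dimensions and hyperparameters are illustrative assumptions for exposition, not the authors' DEERS implementation.

# Hypothetical sketch of a deep Q-network over an MDP whose state keeps
# separate traces of items the user responded to positively and negatively.
# Shapes, names, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

EMB_DIM = 32    # dimensionality of item embeddings (assumed)
HIST_LEN = 10   # number of recent positive/negative items in the state (assumed)


class PairwiseQNetwork(nn.Module):
    """Q(s, a) where the state s = (positive-feedback items, negative-feedback items)."""

    def __init__(self, emb_dim=EMB_DIM, hidden=64):
        super().__init__()
        # Separate encoders so the scarcer positive signal is not buried
        # by the much more abundant negative one.
        self.pos_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.neg_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(2 * hidden + emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pos_items, neg_items, candidate_item):
        # pos_items, neg_items: (batch, HIST_LEN, EMB_DIM); candidate_item: (batch, EMB_DIM)
        _, h_pos = self.pos_encoder(pos_items)
        _, h_neg = self.neg_encoder(neg_items)
        state = torch.cat([h_pos[-1], h_neg[-1], candidate_item], dim=-1)
        return self.q_head(state).squeeze(-1)  # (batch,) estimated Q-values


def td_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One temporal-difference step on a batch of (s, a, r, s', a') transitions."""
    pos, neg, action, reward, next_pos, next_neg, next_action = batch
    q_sa = q_net(pos, neg, action)
    with torch.no_grad():
        # Target: reward observed from user feedback plus the discounted
        # value of the follow-up recommendation.
        target = reward + gamma * target_net(next_pos, next_neg, next_action)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In practice the next action in the target would be chosen greedily over a pool of candidate items rather than supplied in the batch; the sketch keeps it fixed for brevity.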




Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep reinforcement learning
  2. pairwise deep Q-network
  3. recommender system

Qualifiers

  • Research-article

Conference

KDD '18

Acceptance Rates

KDD '18 paper acceptance rate: 107 of 983 submissions (11%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (last 12 months): 659
  • Downloads (last 6 weeks): 69
Reflects downloads up to 23 Sep 2024

Cited By

  • (2024) Cost-aware Offline Safe Meta Reinforcement Learning with Robust In-Distribution Online Task Adaptation. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 743-751. DOI: 10.5555/3635637.3662927. Online publication date: 6-May-2024.
  • (2024) Foresight Distribution Adjustment for Off-policy Reinforcement Learning. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 317-325. DOI: 10.5555/3635637.3662880. Online publication date: 6-May-2024.
  • (2024) Non-Stationary Transformer Architecture: A Versatile Framework for Recommendation Systems. Electronics 13(11), 2075. DOI: 10.3390/electronics13112075. Online publication date: 27-May-2024.
  • (2024) Reinforcement Learning-Based Dynamic Order Recommendation for On-Demand Food Delivery. Tsinghua Science and Technology 29(2), 356-367. DOI: 10.26599/TST.2023.9010041. Online publication date: Apr-2024.
  • (2024) A social image recommendation system based on deep reinforcement learning. PLOS ONE 19(4), e0300059. DOI: 10.1371/journal.pone.0300059. Online publication date: 4-Apr-2024.
  • (2024) M3Rec: A Context-Aware Offline Meta-Level Model-Based Reinforcement Learning Approach for Cold-Start Recommendation. ACM Transactions on Information Systems 42(6), 1-27. DOI: 10.1145/3659947. Online publication date: 19-Aug-2024.
  • (2024) Adapting Job Recommendations to User Preference Drift with Behavioral-Semantic Fusion Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1004-1015. DOI: 10.1145/3637528.3671759. Online publication date: 25-Aug-2024.
  • (2024) Modeling User Retention through Generative Flow Networks. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5497-5508. DOI: 10.1145/3637528.3671531. Online publication date: 25-Aug-2024.
  • (2024) Future Impact Decomposition in Request-level Recommendations. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5905-5916. DOI: 10.1145/3637528.3671506. Online publication date: 25-Aug-2024.
  • (2024) EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 977-987. DOI: 10.1145/3626772.3657868. Online publication date: 10-Jul-2024.
