Research article
DOI: 10.1145/3289600.3290958

When People Change their Mind: Off-Policy Evaluation in Non-stationary Recommendation Environments

Published: 30 January 2019

Abstract

We consider the novel problem of evaluating a recommendation policy offline in environments where the reward signal is non-stationary. Non-stationarity appears in many Information Retrieval (IR) applications such as recommendation and advertising, but its effect on off-policy evaluation has not been studied before; we are the first to address this issue. First, we analyze standard off-policy estimators in non-stationary environments and show, both theoretically and experimentally, that their bias grows with time. Then, we propose new off-policy estimators based on moving averages and show that their bias is independent of time and can be bounded. Furthermore, we provide a method to trade off bias and variance in a principled way, yielding an off-policy estimator that works well in both non-stationary and stationary environments. We experiment on publicly available recommendation datasets and show that our newly proposed moving average estimators accurately capture changes in non-stationary environments, while standard off-policy estimators fail to do so.
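To make the setup in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' code) contrasting a standard inverse propensity scoring (IPS) estimator with a sliding-window variant, one simple instance of the moving-average idea rather than the paper's exact estimators. It assumes a simplified bandit log in which each interaction records the observed reward and the logged action's probability under both the logging and the target policy; the function names, the simulated reward drift, and the window size are all illustrative.

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs):
    """Standard IPS estimate of the target policy's value over the full log."""
    w = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(w * np.asarray(rewards)))

def windowed_ips_estimate(rewards, target_probs, logging_probs, window=500):
    """Sliding-window IPS: average only the most recent `window` interactions.
    Fewer samples mean higher variance, but the estimate can track reward drift."""
    r = np.asarray(rewards)[-window:]
    w = np.asarray(target_probs)[-window:] / np.asarray(logging_probs)[-window:]
    return float(np.mean(w * r))

# Toy log (illustrative only): the click probability of the recommended item
# decays over time, so the full-history estimate reflects a mix of old and new
# behaviour, while the windowed estimate stays closer to the current reward.
rng = np.random.default_rng(0)
T = 5000
logging_probs = np.full(T, 0.5)   # logging policy picked the logged action w.p. 0.5
target_probs = np.full(T, 0.8)    # target policy would pick it w.p. 0.8
rewards = rng.binomial(1, np.linspace(0.7, 0.2, T))  # non-stationary click signal

print(ips_estimate(rewards, target_probs, logging_probs))
print(windowed_ips_estimate(rewards, target_probs, logging_probs, window=500))
```

Under reward drift, the full-history IPS estimate converges to a time-averaged value and drifts away from the current policy value, which is the growing bias described in the abstract; shrinking the window reduces this bias at the cost of higher variance, mirroring the bias-variance trade-off the paper addresses.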


      Published In

      WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
      January 2019
      874 pages
ISBN: 9781450359405
DOI: 10.1145/3289600
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 January 2019


      Author Tags

      1. non-stationary rewards
      2. off-policy evaluation

      Qualifiers

      • Research-article

      Conference

      WSDM '19

      Acceptance Rates

WSDM '19 paper acceptance rate: 84 of 511 submissions, 16%
Overall acceptance rate: 498 of 2,863 submissions, 17%
