Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3460231.3474250acmconferencesArticle/Chapter ViewAbstractPublication PagesrecsysConference Proceedingsconference-collections
research-article

Burst-induced Multi-Armed Bandit for Learning Recommendation

Published: 13 September 2021 Publication History

Abstract

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, but each bursty period comes with its own reward distribution. We test our algorithm and compare it to several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive results when compared to state-of-the-art methods.

Supplementary Material

MP4 File (RecSys55_2021.mp4)
We present here the RecSys 2021 paper: Burst-Induced Multi-armed Bandit for Learning Recommendation

References

[1]
Deepak Agarwal. 2013. Computational advertising: the linkedin way. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 1585–1586.
[2]
Deepak Agarwal, Bo Long, Jonathan Traupman, Doris Xin, and Liang Zhang. 2014. Laser: A scalable response prediction platform for online advertising. In Proceedings of the 7th ACM international conference on Web search and data mining. 173–182.
[3]
Shipra Agrawal and Navin Goyal. 2013. Further optimal regret bounds for thompson sampling. In Artificial intelligence and statistics. PMLR, 99–107.
[4]
Robin Allesiardo and Raphaël Féraud. 2015. Exp3 with drift detection for the switching bandit problem. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 1–7.
[5]
Rodrigo Augusto da Silva Alves, Renato Martins Assuncao, and Pedro Olmo Stancioli Vaz de Melo. 2016. Burstiness scale: A parsimonious model for characterizing random series of events. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1405–1414.
[6]
Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2 (2002), 235–256.
[7]
Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM journal on computing 32, 1 (2002), 48–77.
[8]
Peng Bao. 2016. Modeling and predicting popularity dynamics via an influence-based self-excited Hawkes process. In International Conference on Information and Knowledge Management, Proceedings.
[9]
Albert-László Barabási. 2005. The origin of bursts and heavy tails in human dynamics. Nature 435, 7039 (may 2005), 207–211.
[10]
Omar Besbes, Yonatan Gur, and Assaf Zeevi. 2014. Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in neural information processing systems 27 (2014), 199–207.
[11]
Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Cross-domain recommender systems. In Recommender systems handbook. Springer, 919–959.
[12]
Yang Cao, Zheng Wen, Branislav Kveton, and Yao Xie. 2019. Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 418–427.
[13]
Emanuele Cavenaghi, Gabriele Sottocornola, Fabio Stella, and Markus Zanker. 2021. Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm. Entropy 23, 3 (2021), 380.
[14]
Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. Advances in neural information processing systems 24 (2011), 2249–2257.
[15]
Daryl J Daley and David Vere-Jones. 2007. An introduction to the theory of point processes: volume II: general theory and structure. Springer Science & Business Media.
[16]
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. 2002. PAC bounds for multi-armed bandit and Markov decision processes. In International Conference on Computational Learning Theory. Springer, 255–270.
[17]
Crícia Z Felício, Klérisson VR Paixão, Celia AZ Barcelos, and Philippe Preux. 2017. A multi-armed bandit model selection for cold-start user recommendation. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 32–40.
[18]
Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina, and Christos Faloutsos. 2015. RSC: Mining and Modeling Temporal Activity in Social Media. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 269–278.
[19]
Edouard Fouché, Junpei Komiyama, and Klemens Böhm. 2019. Scaling multi-armed bandit algorithms. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1449–1459.
[20]
Aurélien Garivier and Eric Moulines. 2011. On upper-confidence bound policies for switching bandit problems. In International Conference on Algorithmic Learning Theory. Springer, 174–188.
[21]
Gourab Ghatak. 2020. A Change-Detection Based ThompsonSampling Framework for Non-Stationary Bandits. IEEE Trans. Comput. (2020).
[22]
John Gittins. 1974. A dynamic allocation index for the sequential design of experiments. Progress in statistics(1974), 241–266.
[23]
J. P. Gleeson, D. Cellai, J.-P. Onnela, M. A. Porter, and F. Reed-Tsochas. 2014. A simple generative model of collective online behavior. Proceedings of the National Academy of Sciences 111, 29 (jul 2014), 10411–10415.
[24]
Jyotirmoy Gope and Sanjay Kumar Jain. 2017. A survey on solving cold start problem in recommender systems. In 2017 International Conference on Computing, Communication and Automation (ICCCA). IEEE, 133–138.
[25]
Thore Graepel, Joaquin Quinonero Candela, Thomas Borchert, and Ralf Herbrich. 2010. Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine. In ICML.
[26]
Cédric Hartland, Sylvain Gelly, Nicolas Baskiotis, Olivier Teytaud, and Michele Sebag. 2006. Multi-armed bandit, dynamic environments and meta-bandits. (2006).
[27]
Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, and Zhirong Wang. 2020. Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation. In Fourteenth ACM Conference on Recommender Systems. 63–72.
[28]
Jaya Kawale, Hung H Bui, Branislav Kveton, Long Tran-Thanh, and Sanjay Chawla. 2015. Efficient thompson sampling for online matrix-factorization recommendation. In Advances in neural information processing systems. 1297–1305.
[29]
Jon Kleinberg. 2002. Bursty and hierarchical structure in streams. In Proceedings of the eighth ACM SIGKDD(KDD ’02). ACM, New York, NY, USA, 91–101.
[30]
Dhruv Kumar Mahajan, Rajeev Rastogi, Charu Tiwari, and Adway Mitra. 2012. Logucb: an explore-exploit algorithm for comments recommendation. In Proceedings of the 21st ACM international conference on Information and knowledge management. 6–15.
[31]
R. Dean Malmgren, Jake M. Hofman, Luis A.N. Amaral, and Duncan J. Watts. 2009. Characterizing individual communication patterns. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. ACM Press, New York, New York, USA, 607.
[32]
Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, and Christos Faloutsos. 2012. Rise and fall patterns of information diffusion. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’12. ACM Press, New York, USA.
[33]
Swapnil Mishra. 2019. Bridging models for popularity prediction on social media. In WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining.
[34]
Julio Cesar Louzada Pinto, Tijani Chahed, and Eitan Altman. 2015. Trend detection in social networks using Hawkes processes. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 - ASONAM ’15. ACM Press, New York, NY, USA.
[35]
Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, and Pascal Van Hentenryck. 2017. Expecting to be hip: Hawkes intensity processes for social media popularity. In Proceedings of the 26th International Conference on World Wide Web. 735–744.
[36]
Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 5 (1952), 527–535.
[37]
Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2017. A tutorial on thompson sampling. arXiv preprint arXiv:1707.02038(2017).
[38]
Javier Sanz-Cruzado, Pablo Castells, and Esther López. 2019. A simple multi-armed nearest-neighbor bandit for interactive recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems. 358–362.
[39]
Suvash Sedhain, Aditya Menon, Scott Sanner, Lexing Xie, and Darius Braziunas. 2017. Low-rank linear cold-start recommendation from social data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31.
[40]
Aleksandrs Slivkins. 2019. Introduction to multi-armed bandits. arXiv preprint arXiv:1904.07272(2019).
[41]
Mingxuan Sun, Fuxin Li, Joonseok Lee, Ke Zhou, Guy Lebanon, and Hongyuan Zha. 2013. Learning multiple-question decision trees for cold-start recommendation. In Proceedings of the sixth ACM international conference on Web search and data mining. 445–454.
[42]
Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.
[43]
William R Thompson. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 3/4 (1933), 285–294.
[44]
Pedro Olmo S Vaz de Melo, Christos Faloutsos, Renato Assunção, and Antonio Loureiro. 2013. The self-feeding process: a unifying model for communication dynamics in the web. In Proceedings of the 22nd international conference on World Wide Web. 1319–1330.
[45]
Bo Wu, Wen-Huang Cheng, Yongdong Zhang, and Tao Mei. 2016. Time Matters. In Proceedings of the 24th ACM international conference on Multimedia. ACM, New York, NY, USA, 1336–1344.
[46]
Shuang-hong Yang and Hongyuan Zha. 2013. Mixture of Mutually Exciting Processes for Viral Diffusion. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), Sanjoy Dasgupta and David Mcallester (Eds.), Vol. 28. JMLR Workshop and Conference Proceedings, 1–9. http://jmlr.csail.mit.edu/proceedings/papers/v28/yang13a.pdf
[47]
Honglin Yu, Lexing Xie, and Scott Sanner. 2015. The Lifecyle of a Youtube Video: Phases, Content and Popularity. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10537
[48]
Jia Yuan Yu and Shie Mannor. 2009. Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th annual international conference on machine learning. 1177–1184.
[49]
Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, and Tao Li. 2016. Online context-aware recommendation with time varying multi-armed bandit. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2025–2034.
[50]
Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, and Jure Leskovec. 2015. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 1513–1522.

Cited By

View all
  • (2024)Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in RecommendersACM Transactions on Spatial Algorithms and Systems10.1145/365664110:2(1-23)Online publication date: 21-May-2024
  • (2023)Enhancing Conversational Recommendation Systems with Representation FusionACM Transactions on the Web10.1145/357703417:1(1-34)Online publication date: 21-Feb-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021
883 pages
ISBN:9781450384582
DOI:10.1145/3460231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Audience dynamics
  2. Bursty methods
  3. Multi-Armed bandit
  4. Time series

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • BMBF
  • DFG

Conference

RecSys '21: Fifteenth ACM Conference on Recommender Systems
September 27 - October 1, 2021
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)28
  • Downloads (Last 6 weeks)8
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in RecommendersACM Transactions on Spatial Algorithms and Systems10.1145/365664110:2(1-23)Online publication date: 21-May-2024
  • (2023)Enhancing Conversational Recommendation Systems with Representation FusionACM Transactions on the Web10.1145/357703417:1(1-34)Online publication date: 21-Feb-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media