research-article

Burst-induced Multi-Armed Bandit for Learning Recommendation

Authors:

Antoine Ledent,

Marius KloftAuthors Info & Claims

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems

Pages 292 - 301

https://doi.org/10.1145/3460231.3474250

Published: 13 September 2021 Publication History

Abstract

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the audience. To achieve this, we assume that the recommendation procedure can be split into two different states: the loyal and the curious state. We identify the current state by modelling the events as a mixture of two Poisson processes, one for each of the possible states. We further assume that the loyal audience is associated with a single stationary reward distribution, but each bursty period comes with its own reward distribution. We test our algorithm and compare it to several baselines in two strands of experiments: synthetic data simulations and real-world datasets. The results demonstrate that BMAB achieves competitive results when compared to state-of-the-art methods.

Supplementary Material

MP4 File (RecSys55_2021.mp4)

We present here the RecSys 2021 paper: Burst-Induced Multi-armed Bandit for Learning Recommendation

Download
20.70 MB

References

[1]

Deepak Agarwal. 2013. Computational advertising: the linkedin way. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 1585–1586.

Digital Library

[2]

Deepak Agarwal, Bo Long, Jonathan Traupman, Doris Xin, and Liang Zhang. 2014. Laser: A scalable response prediction platform for online advertising. In Proceedings of the 7th ACM international conference on Web search and data mining. 173–182.

Digital Library

[3]

Shipra Agrawal and Navin Goyal. 2013. Further optimal regret bounds for thompson sampling. In Artificial intelligence and statistics. PMLR, 99–107.

[4]

Robin Allesiardo and Raphaël Féraud. 2015. Exp3 with drift detection for the switching bandit problem. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 1–7.

[5]

Rodrigo Augusto da Silva Alves, Renato Martins Assuncao, and Pedro Olmo Stancioli Vaz de Melo. 2016. Burstiness scale: A parsimonious model for characterizing random series of events. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1405–1414.

Digital Library

[6]

Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2 (2002), 235–256.

Digital Library

[7]

Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM journal on computing 32, 1 (2002), 48–77.

[8]

Peng Bao. 2016. Modeling and predicting popularity dynamics via an influence-based self-excited Hawkes process. In International Conference on Information and Knowledge Management, Proceedings.

Digital Library

[9]

Albert-László Barabási. 2005. The origin of bursts and heavy tails in human dynamics. Nature 435, 7039 (may 2005), 207–211.

[10]

Omar Besbes, Yonatan Gur, and Assaf Zeevi. 2014. Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in neural information processing systems 27 (2014), 199–207.

[11]

Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Cross-domain recommender systems. In Recommender systems handbook. Springer, 919–959.

Digital Library

[12]

Yang Cao, Zheng Wen, Branislav Kveton, and Yao Xie. 2019. Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 418–427.

[13]

Emanuele Cavenaghi, Gabriele Sottocornola, Fabio Stella, and Markus Zanker. 2021. Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm. Entropy 23, 3 (2021), 380.

[14]

Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. Advances in neural information processing systems 24 (2011), 2249–2257.

[15]

Daryl J Daley and David Vere-Jones. 2007. An introduction to the theory of point processes: volume II: general theory and structure. Springer Science & Business Media.

[16]

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. 2002. PAC bounds for multi-armed bandit and Markov decision processes. In International Conference on Computational Learning Theory. Springer, 255–270.

[17]

Crícia Z Felício, Klérisson VR Paixão, Celia AZ Barcelos, and Philippe Preux. 2017. A multi-armed bandit model selection for cold-start user recommendation. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. 32–40.

Digital Library

[18]

Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina, and Christos Faloutsos. 2015. RSC: Mining and Modeling Temporal Activity in Social Media. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 269–278.

Digital Library

[19]

Edouard Fouché, Junpei Komiyama, and Klemens Böhm. 2019. Scaling multi-armed bandit algorithms. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1449–1459.

Digital Library

[20]

Aurélien Garivier and Eric Moulines. 2011. On upper-confidence bound policies for switching bandit problems. In International Conference on Algorithmic Learning Theory. Springer, 174–188.

[21]

Gourab Ghatak. 2020. A Change-Detection Based ThompsonSampling Framework for Non-Stationary Bandits. IEEE Trans. Comput. (2020).

[22]

John Gittins. 1974. A dynamic allocation index for the sequential design of experiments. Progress in statistics(1974), 241–266.

[23]

J. P. Gleeson, D. Cellai, J.-P. Onnela, M. A. Porter, and F. Reed-Tsochas. 2014. A simple generative model of collective online behavior. Proceedings of the National Academy of Sciences 111, 29 (jul 2014), 10411–10415.

[24]

Jyotirmoy Gope and Sanjay Kumar Jain. 2017. A survey on solving cold start problem in recommender systems. In 2017 International Conference on Computing, Communication and Automation (ICCCA). IEEE, 133–138.

[25]

Thore Graepel, Joaquin Quinonero Candela, Thomas Borchert, and Ralf Herbrich. 2010. Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine. In ICML.

[26]

Cédric Hartland, Sylvain Gelly, Nicolas Baskiotis, Olivier Teytaud, and Michele Sebag. 2006. Multi-armed bandit, dynamic environments and meta-bandits. (2006).

[27]

Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, and Zhirong Wang. 2020. Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation. In Fourteenth ACM Conference on Recommender Systems. 63–72.

[28]

Jaya Kawale, Hung H Bui, Branislav Kveton, Long Tran-Thanh, and Sanjay Chawla. 2015. Efficient thompson sampling for online matrix-factorization recommendation. In Advances in neural information processing systems. 1297–1305.

[29]

Jon Kleinberg. 2002. Bursty and hierarchical structure in streams. In Proceedings of the eighth ACM SIGKDD(KDD ’02). ACM, New York, NY, USA, 91–101.

Digital Library

[30]

Dhruv Kumar Mahajan, Rajeev Rastogi, Charu Tiwari, and Adway Mitra. 2012. Logucb: an explore-exploit algorithm for comments recommendation. In Proceedings of the 21st ACM international conference on Information and knowledge management. 6–15.

Digital Library

[31]

R. Dean Malmgren, Jake M. Hofman, Luis A.N. Amaral, and Duncan J. Watts. 2009. Characterizing individual communication patterns. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. ACM Press, New York, New York, USA, 607.

Digital Library

[32]

Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, and Christos Faloutsos. 2012. Rise and fall patterns of information diffusion. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’12. ACM Press, New York, USA.

Digital Library

[33]

Swapnil Mishra. 2019. Bridging models for popularity prediction on social media. In WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining.

Digital Library

[34]

Julio Cesar Louzada Pinto, Tijani Chahed, and Eitan Altman. 2015. Trend detection in social networks using Hawkes processes. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 - ASONAM ’15. ACM Press, New York, NY, USA.

Digital Library

[35]

Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, and Pascal Van Hentenryck. 2017. Expecting to be hip: Hawkes intensity processes for social media popularity. In Proceedings of the 26th International Conference on World Wide Web. 735–744.

Digital Library

[36]

Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 5 (1952), 527–535.

[37]

Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2017. A tutorial on thompson sampling. arXiv preprint arXiv:1707.02038(2017).

[38]

Javier Sanz-Cruzado, Pablo Castells, and Esther López. 2019. A simple multi-armed nearest-neighbor bandit for interactive recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems. 358–362.

Digital Library

[39]

Suvash Sedhain, Aditya Menon, Scott Sanner, Lexing Xie, and Darius Braziunas. 2017. Low-rank linear cold-start recommendation from social data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31.

[40]

Aleksandrs Slivkins. 2019. Introduction to multi-armed bandits. arXiv preprint arXiv:1904.07272(2019).

[41]

Mingxuan Sun, Fuxin Li, Joonseok Lee, Ke Zhou, Guy Lebanon, and Hongyuan Zha. 2013. Learning multiple-question decision trees for cold-start recommendation. In Proceedings of the sixth ACM international conference on Web search and data mining. 445–454.

Digital Library

[42]

Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.

Digital Library

[43]

William R Thompson. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 3/4 (1933), 285–294.

[44]

Pedro Olmo S Vaz de Melo, Christos Faloutsos, Renato Assunção, and Antonio Loureiro. 2013. The self-feeding process: a unifying model for communication dynamics in the web. In Proceedings of the 22nd international conference on World Wide Web. 1319–1330.

Digital Library

[45]

Bo Wu, Wen-Huang Cheng, Yongdong Zhang, and Tao Mei. 2016. Time Matters. In Proceedings of the 24th ACM international conference on Multimedia. ACM, New York, NY, USA, 1336–1344.

[46]

Shuang-hong Yang and Hongyuan Zha. 2013. Mixture of Mutually Exciting Processes for Viral Diffusion. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), Sanjoy Dasgupta and David Mcallester (Eds.), Vol. 28. JMLR Workshop and Conference Proceedings, 1–9. http://jmlr.csail.mit.edu/proceedings/papers/v28/yang13a.pdf

[47]

Honglin Yu, Lexing Xie, and Scott Sanner. 2015. The Lifecyle of a Youtube Video: Phases, Content and Popularity. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10537

[48]

Jia Yuan Yu and Shie Mannor. 2009. Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th annual international conference on machine learning. 1177–1184.

Digital Library

[49]

Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, and Tao Li. 2016. Online context-aware recommendation with time varying multi-armed bandit. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2025–2034.

Digital Library

[50]

Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, and Jure Leskovec. 2015. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 1513–1522.

Digital Library

Cited By

Alves R(2024)Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in RecommendersACM Transactions on Spatial Algorithms and Systems10.1145/365664110:2(1-23)Online publication date: 21-May-2024
https://dl.acm.org/doi/10.1145/3656641
Wang YChen XFang JMeng ZLiang S(2023)Enhancing Conversational Recommendation Systems with Representation FusionACM Transactions on the Web10.1145/357703417:1(1-34)Online publication date: 21-Feb-2023
https://dl.acm.org/doi/10.1145/3577034

Recommendations

Multi-armed bandit problem with known trend

We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different ...
Multi-armed Bandit with Additional Observations
SIGMETRICS '18

We study multi-armed bandit (MAB) problems with additional observations, where in each round, the decision maker selects an arm to play and can also observe rewards of additional arms (within a given budget) by paying certain costs. We propose ...
Multi-armed Bandit with Additional Observations
SIGMETRICS '18: Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems

We study multi-armed bandit (MAB) problems with additional observations, where in each round, the decision maker selects an arm to play and can also observe rewards of additional arms (within a given budget) by paying certain costs. We propose ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems

September 2021

883 pages

ISBN:9781450384582

DOI:10.1145/3460231

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

BMBF
DFG

Conference

RecSys '21

Sponsor:

RecSys '21: Fifteenth ACM Conference on Recommender Systems

September 27 - October 1, 2021

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
624
Total Downloads

Downloads (Last 12 months)28
Downloads (Last 6 weeks)8

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Alves R(2024)Regionalization-Based Collaborative Filtering: Harnessing Geographical Information in RecommendersACM Transactions on Spatial Algorithms and Systems10.1145/365664110:2(1-23)Online publication date: 21-May-2024
https://dl.acm.org/doi/10.1145/3656641
Wang YChen XFang JMeng ZLiang S(2023)Enhancing Conversational Recommendation Systems with Representation FusionACM Transactions on the Web10.1145/357703417:1(1-34)Online publication date: 21-Feb-2023
https://dl.acm.org/doi/10.1145/3577034

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents