DOI: 10.5555/3666122.3666468 · Research article · NIPS Conference Proceedings

Non-stationary bandits with auto-regressive temporal dependency

Published: 30 May 2024

Abstract

Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we demonstrate both the efficacy of our algorithm and the broader applicability of our techniques to more complex, rapidly evolving time series.
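The abstract's two mechanisms can be illustrated with a toy simulation. The sketch below is not the paper's algorithm; it is a minimal, assumption-laden illustration of the setting: each arm's reward follows an AR(1) process (so the last observation is informative about the next), the policy exploits the arm with the highest recent observation, and a periodic restart re-samples every arm to discard out-of-date information. All parameter names and values (`phi`, `restart_every`, the reward ranges) are hypothetical.

```python
import random

def ar1_reward(prev, mu, phi=0.6, sigma=0.1):
    # AR(1) dynamics: the next reward depends linearly on the previous one,
    # reverting toward the arm's hidden mean mu.
    return phi * prev + (1 - phi) * mu + random.gauss(0.0, sigma)

def run_bandit(horizon=1000, n_arms=3, restart_every=200, seed=0):
    random.seed(seed)
    mus = [random.uniform(0.2, 0.8) for _ in range(n_arms)]  # hidden means
    state = list(mus)          # current AR(1) state of each arm
    last_obs = [0.0] * n_arms  # most recent observation per arm
    total = 0.0
    for t in range(horizon):
        if t % restart_every < n_arms:
            # restarting mechanism: periodically re-sample every arm once,
            # discarding reliance on stale observations
            arm = t % restart_every
        else:
            # exploit temporal dependency: under AR(1) rewards, the arm with
            # the highest last observation is likely best in the short run
            arm = max(range(n_arms), key=lambda a: last_obs[a])
        # all arms' processes evolve; only the pulled arm is observed
        for a in range(n_arms):
            state[a] = ar1_reward(state[a], mus[a])
        last_obs[arm] = state[arm]
        total += state[arm]
    return total / horizon

avg = run_bandit()
```

Averaged over the horizon, the earned reward sits near the best arm's mean; the restart period trades off tracking speed against the cost of forced exploration.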



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023, 80772 pages.

Publisher: Curran Associates Inc., Red Hook, NY, United States.
