Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Published: 22 February 2021

Abstract

We consider combinatorial semi-bandits over a set of arms $X \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$ after $T$ rounds, where $m = \max_{x \in X} 1^\top x$. However, ESCB has computational complexity $O(|X|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$ and asymptotic computational complexity $O(\delta_T^{-1}\,\mathrm{poly}(d))$, where $\delta_T$ is a function that vanishes arbitrarily slowly. Our approach is to carefully design AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over $X$ can be solved up to a given approximation ratio, AESCB is implementable in polynomial time $O(\delta_T^{-1}\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
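To make the setting concrete, the following is a minimal sketch (Python with NumPy) of an ESCB-style semi-bandit loop on a toy, explicitly enumerated arm set. The arm set, reward model, and confidence function used here are illustrative assumptions, not the paper's implementation; the point is that the exhaustive index maximization marked below costs $O(|X|)$ per round, which is exactly the step AESCB replaces with approximate budgeted linear maximization.

```python
from itertools import combinations
import numpy as np

# Toy ESCB-style semi-bandit loop on an explicitly enumerated arm set
# X ⊂ {0,1}^d (here: all subsets of size m). The index form and the
# confidence function f_t are illustrative, not the paper's code.

rng = np.random.default_rng(0)
d, m = 5, 2
X = np.array([[1 if i in c else 0 for i in range(d)]
              for c in combinations(range(d), m)])

theta_true = rng.uniform(0.2, 0.8, size=d)        # unknown per-item means
counts = np.ones(d)                               # per-item sample counts
sums = rng.binomial(1, theta_true).astype(float)  # one initial sample each

T = 2000
for t in range(1, T + 1):
    theta_hat = sums / counts
    f_t = np.log(t) + 4 * m * np.log(max(np.log(t), 1.0))  # confidence level
    # Optimistic ESCB-style index: empirical mean plus a combinatorial bonus
    # that couples the confidence widths of the items selected by each arm.
    bonus = np.sqrt(f_t / 2.0 * (X / counts).sum(axis=1))
    x = X[np.argmax(X @ theta_hat + bonus)]        # exact maximization: O(|X|)
    obs = rng.binomial(1, theta_true) * x          # semi-bandit feedback
    sums += obs
    counts += x
```

In AESCB, the exact argmax above would instead be computed approximately, by repeatedly maximizing a linear function over $X$ under a linear budget constraint, which is what brings the per-round cost down to polynomial in $d$.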

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 1
March 2021, 252 pages
EISSN: 2476-1249
DOI: 10.1145/3452093

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2021
Published in POMACS Volume 5, Issue 1

Author Tags

  1. bandits
  2. combinatorial bandits
  3. combinatorial optimization

Qualifiers

  • Research-article
