Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Published: 22 February 2021

Abstract

We consider combinatorial semi-bandits over a set of arms $X \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$ after $T$ rounds, where $m = \max_{x \in X} 1^\top x$. However, ESCB has computational complexity $O(|X|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$ and asymptotic computational complexity $O(\delta_T^{-1}\,\mathrm{poly}(d))$, where $\delta_T$ is a function that vanishes arbitrarily slowly. Our approach is to carefully design AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over $X$ can be solved up to a given approximation ratio, AESCB is implementable in polynomial time $O(\delta_T^{-1}\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
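To make the setting concrete, the following is a minimal sketch (Python with NumPy) of an ESCB-style semi-bandit loop on a toy, explicitly enumerated arm set. The arm set, reward model, and confidence function used here are illustrative assumptions, not the paper's implementation; the point is that the exhaustive index maximization marked below costs $O(|X|)$ per round, which is exactly the step AESCB replaces with approximate budgeted linear maximization.

```python
from itertools import combinations
import numpy as np

# Toy ESCB-style semi-bandit loop on an explicitly enumerated arm set
# X ⊂ {0,1}^d (here: all subsets of size m). The index form and the
# confidence function f_t are illustrative, not the paper's code.

rng = np.random.default_rng(0)
d, m = 5, 2
X = np.array([[1 if i in c else 0 for i in range(d)]
              for c in combinations(range(d), m)])

theta_true = rng.uniform(0.2, 0.8, size=d)        # unknown per-item means
counts = np.ones(d)                               # per-item sample counts
sums = rng.binomial(1, theta_true).astype(float)  # one initial sample each

T = 2000
for t in range(1, T + 1):
    theta_hat = sums / counts
    f_t = np.log(t) + 4 * m * np.log(max(np.log(t), 1.0))  # confidence level
    # Optimistic ESCB-style index: empirical mean plus a combinatorial bonus
    # that couples the confidence widths of the items selected by each arm.
    bonus = np.sqrt(f_t / 2.0 * (X / counts).sum(axis=1))
    x = X[np.argmax(X @ theta_hat + bonus)]        # exact maximization: O(|X|)
    obs = rng.binomial(1, theta_true) * x          # semi-bandit feedback
    sums += obs
    counts += x
```

In AESCB, the exact argmax above would instead be computed approximately, by repeatedly maximizing a linear function over $X$ under a linear budget constraint, which is what brings the per-round cost down to polynomial in $d$.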

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 5, Issue 1
March 2021, 252 pages
EISSN: 2476-1249
DOI: 10.1145/3452093

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2021
Published in POMACS Volume 5, Issue 1

Author Tags

  1. bandits
  2. combinatorial bandits
  3. combinatorial optimization

Qualifiers

  • Research-article
