An Efficient Algorithm for Learning with Semi-bandit Feedback

Gergely Neu & Gábor Bartók

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8139)

Included in the following conference series:

International Conference on Algorithmic Learning Theory

Abstract

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Unlike previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described by $d$-dimensional binary vectors with at most $m$ non-zero entries, we show that the expected regret of our algorithm after $T$ rounds is $O(m\sqrt{dT\log d})$. As a side result, we also improve the best known regret bound for FPL in the full-information setting to $O(m^{3/2}\sqrt{T\log d})$, gaining a factor of $\sqrt{d/m}$ over previous bounds for this algorithm.
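The Python sketch below illustrates how FPL and Geometric Resampling might fit together. It is a minimal illustration under stated assumptions, not the authors' reference implementation. The decision set is assumed to be the "m-sets" (binary vectors in {0,1}^d with exactly m ones), perturbations are assumed i.i.d. Exp(1), and the function names `m_set_oracle`, `perturbed_leader`, `geometric_resampling`, `fpl_gr` and the parameters `eta` (learning rate) and `M` (truncation level) are hypothetical. The idea it shows: rather than computing the probability q_i that coordinate i is selected (often intractable), GR redraws fresh perturbations and calls the same offline oracle until coordinate i is picked again. The number of draws K_i is geometric with mean 1/q_i (slightly less after truncation at M), so K_i times the observed loss acts as an importance-weighted loss estimate that needs only oracle calls.

```python
import random


def m_set_oracle(scores, m):
    """Hypothetical offline oracle for the m-set decision set:
    return the binary vector selecting the m coordinates with the smallest score."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i])[:m]
    v = [0] * len(scores)
    for i in chosen:
        v[i] = 1
    return v


def perturbed_leader(L_hat, eta, m, rng):
    """FPL step (assumed form): minimize v^T (eta * L_hat - Z) with Z_i ~ Exp(1) i.i.d."""
    scores = [eta * L_hat[i] - rng.expovariate(1.0) for i in range(len(L_hat))]
    return m_set_oracle(scores, m)


def geometric_resampling(V, L_hat, eta, m, M, rng):
    """For each coordinate i with V[i] == 1, count how many fresh FPL draws are
    needed until i is selected again, capped at M. Other coordinates get K[i] = 0."""
    d = len(V)
    K = [0] * d
    waiting = {i for i in range(d) if V[i] == 1}
    for k in range(1, M + 1):
        if not waiting:
            break
        V_prime = perturbed_leader(L_hat, eta, m, rng)
        for i in list(waiting):
            if V_prime[i] == 1:
                K[i] = k
                waiting.discard(i)
    for i in waiting:  # truncation: coordinates not hit within M draws get K[i] = M
        K[i] = M
    return K


def fpl_gr(losses, m, eta, M, seed=0):
    """Run the FPL+GR sketch on a sequence of loss vectors with entries in [0, 1].
    Only losses of selected coordinates are read (semi-bandit feedback).
    Returns the learner's cumulative loss."""
    rng = random.Random(seed)
    d = len(losses[0])
    L_hat = [0.0] * d  # cumulative estimated losses
    total = 0.0
    for loss in losses:
        V = perturbed_leader(L_hat, eta, m, rng)
        observed = {i: loss[i] for i in range(d) if V[i] == 1}  # semi-bandit feedback
        total += sum(observed.values())
        K = geometric_resampling(V, L_hat, eta, m, M, rng)
        for i, l in observed.items():
            L_hat[i] += K[i] * l  # estimate K_i * V_i * loss_i
    return total
```

As a usage example, on a toy sequence such as `fpl_gr([[0.0, 0.0, 1.0, 1.0, 1.0]] * 500, m=2, eta=0.1, M=100)`, where two of the five coordinates always have zero loss, the cumulative loss should stay well below the 1.2 per round that a uniformly random m-set would incur in expectation. The values of `eta` and `M` here are illustrative choices, not the tuned values from the analysis.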

Author information

Authors and Affiliations

Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Hungary
Gergely Neu
Department of Computer Science, ETH Zürich, Switzerland
Gábor Bartók

Editor information

Editors and Affiliations

National University of Singapore, Republic of Singapore
Sanjay Jain & Frank Stephan
Inria Lille - Nord Europe, Villeneuve d’Ascq, France
Rémi Munos
Hokkaido University, Sapporo, Japan
Thomas Zeugmann

About this paper

Cite this paper

Neu, G., Bartók, G. (2013). An Efficient Algorithm for Learning with Semi-bandit Feedback. In: Jain, S., Munos, R., Stephan, F., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2013. Lecture Notes in Computer Science, vol 8139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40935-6_17

DOI: https://doi.org/10.1007/978-3-642-40935-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40934-9
Online ISBN: 978-3-642-40935-6
eBook Packages: Computer Science, Computer Science (R0)