Abstract
We study online regret minimization algorithms in an experts setting. In this setting, the algorithm chooses a distribution over experts at each time step and receives a gain that is a weighted average of the experts’ instantaneous gains. We consider a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the average of all experts, the regret to any given fixed mixture of experts, or the regret to the worst expert. This study leads both to new understanding of the limitations of existing no-regret algorithms, and to new algorithms with novel performance guarantees. More specifically, we show that any algorithm that achieves only \(O(\sqrt{T})\) cumulative regret to the best expert on a sequence of T trials must, in the worst case, suffer regret \(\varOmega(\sqrt{T})\) to the average, and that for a wide class of update rules that includes many existing no-regret algorithms (such as Exponential Weights and Follow the Perturbed Leader), the product of the regret to the best and the regret to the average is, in the worst case, Ω(T). We then describe and analyze two alternate new algorithms that both achieve cumulative regret only \(O(\sqrt{T}\log T)\) to the best expert and have only constant regret to any given fixed distribution over experts (that is, with no dependence on either T or the number of experts N). The key to the first algorithm is the gradual increase in the “aggressiveness” of updates in response to observed divergences in expert performances. The second algorithm is a simple twist on standard exponential-update algorithms.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Auer, P., Cesa-Bianchi, N., & Gentile, C. (2002). Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64, 48–75.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
Cesa-Bianchi, N., Mansour, Y., & Stoltz, G. (2007). Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3), 321–352.
Cover, T. (1991). Universal portfolios. Mathematical Finance, 1(1), 1–19.
Even-Dar, E., Kearns, M., Mansour, Y., & Wortman, J. (2007). Regret to the best versus regret to the average. In Twentieth annual conference on learning theory (pp. 233–247).
Freund, Y. (2003). Predicting a binary sequence almost as well as the optimal biased coin. Information and Computation, 182(2), 73–94.
Helmbold, D., Schapire, R., Singer, Y., & Warmuth, M. (1998). On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4), 325–347.
Kalai, A., & Vempala, S. (2005). Efficient algorithms for on-line optimization. Journal of Computer and System Sciences, 71(3), 291–307.
Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261.
Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2), 153–173.
Zhang, T. (2007). Personal communication.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editors: Claudio Gentile, Nader H. Bshouty.
A preliminary version of this work appeared in the Proceedings of the Twentieth Annual Conference on Learning Theory, 2007 (Even-Dar et al. 2007).
This work was done while E. Even-Dar was a post doctoral researcher at the University of Pennsylvania.
Y. Mansour was supported in part by grant no. 1079/04 from the Israel Science Foundation, a grant from BSF, an IBM faculty award, and the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This paper reflects only the authors’ views.
Rights and permissions
About this article
Cite this article
Even-Dar, E., Kearns, M., Mansour, Y. et al. Regret to the best vs. regret to the average. Mach Learn 72, 21–37 (2008). https://doi.org/10.1007/s10994-008-5060-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-008-5060-z