DOI: 10.5555/2999134.2999150
Article

Putting Bayes to sleep

Published: 03 December 2012

Abstract

We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the "specialist" framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound.
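The update described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm: it performs a standard exponential-weights (Bayesian) loss update and then, instead of mixing in a bit of the initial prior as in Fixed Share, mixes in a bit of the uniform average of all past posteriors. The function name, the uniform mixing scheme over the past, and the parameter values are assumptions for illustration only.

```python
import math

def mpp_update(weights, losses, past_posteriors, eta=1.0, alpha=0.05):
    """One round of a Mixing-Past-Posteriors-style update (illustrative sketch).

    weights:         current distribution over models (list of floats, sums to 1)
    losses:          per-model losses suffered this round
    past_posteriors: list of posterior distributions from earlier rounds
    Returns (new_weights, updated list of past posteriors).
    """
    # Bayesian / exponential-weights loss update.
    posterior = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    z = sum(posterior)
    posterior = [p / z for p in posterior]

    # Mix in a bit of *all* past posteriors (uniform mixing over the past)
    # rather than a bit of the initial prior; this is what lets a model
    # that did well long ago be recovered quickly.
    past = past_posteriors + [posterior]
    n = len(weights)
    mixture = [sum(p[i] for p in past) / len(past) for i in range(n)]
    new_weights = [(1 - alpha) * p + alpha * m
                   for p, m in zip(posterior, mixture)]
    return new_weights, past
```

Because the mixture is a convex combination of probability distributions, the new weights remain a distribution; the self-referential flavor noted in the abstract shows up in `past`, which grows by one posterior per round.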


Cited By

  • (2020) Online multitask learning with long-term memory. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 17779-17791. DOI: 10.5555/3495724.3497216. Online publication date: 6 December 2020.
  • (2018) Online Learning for Non-Stationary A/B Tests. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 317-326. DOI: 10.1145/3269206.3271718. Online publication date: 17 October 2018.

Published In

NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1
December 2012, 3328 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States

