Abstract
The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic that selects points so as to minimize the entropy of the posterior. This paper extends the IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we apply the IVM with a block-diagonal covariance matrix for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge of known invariances. All of these extensions are tested on artificial and real data.
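To make the selection heuristic concrete, the sketch below illustrates greedy, entropy-based point selection for Gaussian process regression with a fixed Gaussian noise variance. It is a minimal illustration of the idea under those assumptions, not the IVM itself (which also performs assumed density filtering site updates to handle non-Gaussian likelihoods such as classification noise models); the kernel parameters and the names rbf_kernel and greedy_entropy_selection are our own illustrative choices.

    import numpy as np

    def rbf_kernel(X, lengthscale=1.0, variance=1.0):
        # Squared-exponential (RBF) prior covariance between all rows of X.
        sq_norms = np.sum(X ** 2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

    def greedy_entropy_selection(K, noise_var=0.1, num_points=10):
        # Greedily build an active set by repeatedly including the point whose
        # (noisy) observation would most reduce the entropy of the Gaussian
        # posterior. With a fixed Gaussian noise variance this is equivalent to
        # picking the point of largest current posterior variance.
        A = K.copy()          # posterior covariance over all candidate points
        selected = []
        for _ in range(num_points):
            var = np.clip(np.diag(A), 0.0, None)
            # Entropy reduction from including point i: 0.5 * log(1 + var_i / noise_var)
            delta_H = 0.5 * np.log1p(var / noise_var)
            delta_H[selected] = -np.inf   # never re-select an included point
            j = int(np.argmax(delta_H))
            selected.append(j)
            # Rank-one conditioning on a noisy observation at point j:
            # A <- A - a_j a_j^T / (A_jj + noise_var), where a_j is column j of A.
            a_j = A[:, j:j + 1]
            A = A - (a_j @ a_j.T) / (A[j, j] + noise_var)
        return selected

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))          # synthetic candidate inputs
        K = rbf_kernel(X)
        active_set = greedy_entropy_selection(K, noise_var=0.1, num_points=20)
        print("active set:", active_set)

Because each inclusion conditions the posterior covariance on the newly selected point, later selections are pushed away from regions that are already well determined, which is what yields the sparse, informative active set.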
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Lawrence, N.D., Platt, J.C., Jordan, M.I. (2005). Extensions of the Informative Vector Machine. In: Winkler, J., Niranjan, M., Lawrence, N. (eds.) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol. 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_4
DOI: https://doi.org/10.1007/11559887_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29073-5
Online ISBN: 978-3-540-31728-9