DOI: 10.5555/3044805.3044976
Article

A unifying view of representer theorems

Published: 21 June 2014

Abstract

It is known that the solution of regularization and interpolation problems with Hilbertian penalties can be expressed as a linear combination of the data. This very useful property, called the representer theorem, has been widely studied and applied to machine learning problems. Analogous optimality conditions have appeared in other contexts, notably in matrix regularization. In this paper we propose a unified view, which generalizes the concept of representer theorems and extends necessary and sufficient conditions for such theorems to hold. Our main result shows a close connection between representer theorems and certain classes of regularization penalties, which we call orthomonotone functions. This result not only subsumes previous representer theorems as special cases but also yields a new class of optimality conditions, which goes beyond the classical linear combination of the data. Moreover, orthomonotonicity provides a useful criterion for testing whether a representer theorem holds for a specific regularization problem.
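
To make the property concrete: in the classical setting of regularized least squares with a squared-norm (Hilbertian) penalty, the minimizer always lies in the span of the training inputs. The following is a minimal numerical sketch in NumPy, written for illustration and not taken from the paper; it checks that the primal ridge-regression solution coincides with a linear combination X^T c of the data points.

```python
import numpy as np

# Minimal numerical sketch of the classical representer theorem:
# the minimizer of  ||Xw - y||^2 + lam * ||w||^2  can always be written
# as a linear combination of the data points, w = X^T c.
rng = np.random.default_rng(0)
n, d = 20, 50                       # fewer samples than features
X = rng.standard_normal((n, d))     # rows are the data points x_1, ..., x_n
y = rng.standard_normal(n)
lam = 0.1

# Primal closed-form solution: (X^T X + lam I) w = X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual ("kernelized") form: c = (X X^T + lam I)^{-1} y, then w = X^T c,
# i.e. w is a linear combination of the data with coefficients c.
c = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ c

print(np.allclose(w_primal, w_dual))  # True: the minimizer lies in span{x_i}
```

The squared norm used here is the classical penalty for which a representer theorem is known to hold; the orthomonotonicity condition introduced in the paper gives a necessary and sufficient criterion that covers more general penalties and optimality conditions.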


Cited By

  • (2017) Adapting kernel representations online using submodular maximization. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 10.5555/3305890.3305995, pp. 3037-3046. Online publication date: 6-Aug-2017.

Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
June 2014
2786 pages

Publisher

JMLR.org

