Abstract
We discuss regularization methods in machine learning, with an emphasis on the interplay between statistical and computational aspects. We begin by recalling a connection between inverse problems and machine learning. We then discuss how this connection allows one to translate modeling principles into computational procedures, suggesting strategies to deal with large-scale learning problems.
Notes
- 1.
More precisely, if two continuous functions are equal almost everywhere, they are guaranteed to coincide only on \(\mathcal{X}_\rho \).
- 2.
Or rather the second moment matrix.
References
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950). http://dx.doi.org/10.2307/1990404
Dieuleveut, A., Bach, F.: Nonparametric stochastic approximation with large step-sizes. Ann. Stat. 44(4), 1363–1399 (2016). https://doi.org/10.1214/15-AOS1391
Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics, vol. 169. Springer, New York (1997)
Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434 (2006)
Bertero, M., De Mol, C., Pike, E.R.: Linear inverse problems with discrete data. I. General formulation and singular system analysis. Inverse Probl. 1(4), 301 (1985)
Blanchard, G., Bousquet, O., Zwald, L.: Statistical properties of kernel principal component analysis. Mach. Learn. 66(2–3), 259–294 (2007)
Blanchard, G., Krämer, N.: Optimal learning rates for kernel conjugate gradient regression. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada, pp. 226–234. Curran Associates, Inc. (2010). https://proceedings.neurips.cc/paper/2010/hash/b2f627fff19fda463cb386442eac2b3d-Abstract.html
Blanchard, G., Mücke, N.: Optimal rates for regularization of statistical inverse learning problems. Found. Comput. Math. 18(4), 971–1013 (2018)
Bottou, L., Bousquet, O.: The Tradeoffs of Large Scale Learning. In: NIPS (2007)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Bühlmann, P., Yu, B.: Boosting with the \(L_2\) loss: regression and classification. J. Am. Stat. Assoc. 98(462), 324–339 (2003)
Caponnetto, A., De Vito, E.: Optimal rates for the regularized least-squares algorithm. Found. Comput. Math. 7(3), 331–368 (2007)
Caponnetto, A., Yao, Y.: Cross-validation based adaptation for regularization operators in learning theory. Anal. Appl. 8, 161–183 (2010)
Coifman, R.R., Lafon, S.: Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Appl. Comput. Harmon. Anal. 21(1), 31–52 (2006)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39(1), 1–49 (2001)
Cucker, F., Zhou, D.X.: Learning theory: an approximation theory viewpoint, vol. 24. Cambridge University Press (2007)
De Vito, E., Caponnetto, A., Rosasco, L.: Model selection for regularized least-squares algorithm in learning theory. Found. Comput. Math. 5(1) (2005)
De Vito, E., Rosasco, L., Caponnetto, A.: Discretization error analysis for Tikhonov regularization. Anal. Appl. (Singap.) 4(1), 81–99 (2006)
De Vito, E., Rosasco, L., Caponnetto, A., Giovannini, U.D., Odone, F.: Learning from examples as an inverse problem. J. Mach. Learn. Res. 6(May), 883–904 (2005)
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition, vol. 31. Springer Science & Business Media (2013)
Dudley, R.: Real Analysis and Probability, Cambridge Studies in Advanced Mathematics, vol. 74. Cambridge University Press, Cambridge (2002). https://doi.org/10.1017/CBO9780511755347
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, vol. 375. Springer (1996)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser Basel (2013)
Györfi, L., Kohler, M., Krzyzak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer Science & Business Media (2006)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)
Hsu, D., Kakade, S.M., Zhang, T.: Random design analysis of ridge regression. J. Mach. Learn. Res.-Proc. Track 23, 9.1–9.24 (2012)
Kress, R.: Linear Integral Equations, vol. 82. Springer Science & Business Media (1989)
Landweber, L.: An iteration formula for Fredholm integral equations of the first kind. Am. J. Math. 73, 615–624 (1951)
Lang, S.: Real and functional analysis, Graduate Texts in Mathematics, vol. 142, 3rd edn. Springer, New York (1993). https://doi.org/10.1007/978-1-4612-0897-6
Lin, J., Rosasco, L.: Optimal rates for multi-pass stochastic gradient methods. J. Mach. Learn. Res. 18, 97:1–97:47 (2017). http://jmlr.org/papers/v18/17-176.html
Lin, J., Rudi, A., Rosasco, L., Cevher, V.: Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces. Appl. Comput. Harmon. Anal. 48(3), 868–890 (2020)
Minsker, S.: On some extensions of Bernstein's inequality for self-adjoint operators. Stat. Probab. Lett. 127, 111–119 (2017)
Mücke, N., Neu, G., Rosasco, L.: Beating SGD saturation with tail-averaging and minibatching. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 12568–12577 (2019). https://proceedings.neurips.cc/paper/2019/hash/4d0b954f0bef437c29dfa73fafdf3fa5-Abstract.html
Pagliana, N., Rosasco, L.: Implicit regularization of accelerated methods in Hilbert spaces. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 14454–14464 (2019). https://proceedings.neurips.cc/paper/2019/hash/c61aed648da48aa3893fb3eaadd88a7f-Abstract.html
Pinelis, I.: Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 22(4), 1679–1706 (1994)
Pinelis, I.: Correction: Optimum bounds for the distributions of martingales in Banach spaces. [Ann. Probab. 22(4), 1679–1706 (1994); MR1331198 (96b:60010)]. Ann. Probab. 27(4), 2119 (1999)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990)
Rahimi, A., Recht, B.: Random Features for Large-Scale Kernel Machines. In: NIPS, pp. 1177–1184. Curran Associates, Inc. (2007)
Rosasco, L., Villa, S.: Learning with incremental iterative regularization. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pp. 1630–1638 (2015). https://proceedings.neurips.cc/paper/2015/hash/1587965fb4d4b5afe8428a4a024feb0d-Abstract.html
Rudi, A., Camoriano, R., Rosasco, L.: Less is more: Nyström computational regularization. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 1648–1656. Curran Associates, Inc. (2015). http://papers.nips.cc/paper/5936-less-is-more-nystrom-computational-regularization.pdf
Rudi, A., Camoriano, R., Rosasco, L.: Generalization properties of learning with random features. arXiv preprint arXiv:1602.04474 (2016)
Rudi, A., Canas, G.D., Rosasco, L.: On the sample complexity of subspace learning. Adv. Neural. Inf. Process. Syst. 26, 2067–2075 (2013)
Smale, S., Zhou, D.X.: Shannon sampling and function reconstruction from point values. Bull. Am. Math. Soc. 41(3), 279–305 (2004)
Smale, S., Zhou, D.X.: Learning theory estimates via integral operators and their approximations. Constr. Approx. 26(2), 153–172 (2007)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008). https://books.google.de/books?id=HUnqnrpYt4IC
Steinwart, I., Hush, D.R., Scovel, C., et al.: Optimal rates for regularized least squares regression. In: COLT, pp. 79–93 (2009)
Tropp, J.A.: User-friendly tools for random matrices: an introduction (2012)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990)
Yen, E.H., Lin, T.W., Lin, S.D., Ravikumar, P.K., Dhillon, I.S.: Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space. In: Advances in Neural Information Processing Systems 27 (2014). http://papers.nips.cc/paper/5479-sparse-random-feature-algorithm-as-coordinate-descent-in-hilbert-space.pdf
Yurinsky, V.: Sums and Gaussian Vectors, vol. 1617. Springer, Berlin (1995)
Acknowledgements
This chapter evolved from a set of notes for the Summer School "Applied Harmonic Analysis and Machine Learning" in Genova, the Summer School "Structured Regularization for High-Dimensional Data Analysis" at IHP Paris, and the Spring School "Structural Inference" in Malente, Germany. The results we present are an overview of work carried out with, and by, a number of collaborators and colleagues. Among others, we would like to thank Gilles Blanchard, Andrea Caponnetto, Luigi Carratino, Raffaello Camoriano, Junhong Lin, Nicole Mücke, Nicolò Pagliana, Steve Smale, Alessandro Verri, Silvia Villa and Ding-Xuan Zhou.
This material is based on work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. We gratefully acknowledge the support of NVIDIA Corporation for the donation of the Titan Xp GPUs and the Tesla K40 GPU used for this research. L. R. acknowledges the financial support of the European Research Council (grant SLING 819789), the AFOSR projects FA9550-18-1-7009, FA9550-17-1-0390 and BAA-AFRL-AFOSR-2016-0007 (European Office of Aerospace Research and Development), and the EU H2020-MSCA-RISE project NoMADS-DLV-777826. Part of this work was funded by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). A. R. acknowledges the financial support of the European Research Council (grant REAL 947908). E.D.V. is a member of the Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).
Appendix
In this section, we collect the notation and some basic mathematical facts, adapted to our setting.
\(\mathcal {H},\mathcal{G}\) | Hilbert spaces
\(\bigl \Vert {f}\bigr \Vert _{\mathcal {H}}\) | norm of \(f\in \mathcal {H}\)
\(\left\langle {f},{g}\right\rangle _{\mathcal {H}}\) | scalar product between \(f,g\in \mathcal {H}\)
\(\mathcal K^\perp =\{f\in \mathcal {H}:\left\langle {f},{f'}\right\rangle _\mathcal {H}=0\,\forall f'\in \mathcal K\}\) | orthogonal complement of a closed subspace \(\mathcal K\subset \mathcal {H}\)
\(P_\mathcal K\) | orthogonal projection onto a closed subspace \(\mathcal K\subset \mathcal {H}\)
\(\mathop \mathrm{ker}{A}=\{f\in \mathcal {H}:Af=0\}\) | kernel of \(A:\mathcal {H}\rightarrow \mathcal{G}\)
\({\text {ran}}{A}=\{Af\in \mathcal{G}:f\in \mathcal {H}\}\) | range of \(A:\mathcal {H}\rightarrow \mathcal{G}\)
\(A^*:\mathcal{G}\rightarrow \mathcal {H}\) | adjoint of \(A:\mathcal {H}\rightarrow \mathcal{G}\)
\(|{A}|=\sqrt{A^*A}\) | absolute value of \(A:\mathcal {H}\rightarrow \mathcal{G}\)
\(\bigl \Vert {A}\bigr \Vert _\infty = \sup _{f\in \mathcal {H},\,f\ne 0} \frac{\bigl \Vert {Af}\bigr \Vert _{\mathcal{G}}}{\Vert {f}\Vert _\mathcal {H}}\) | operator norm of \(A:\mathcal {H}\rightarrow \mathcal{G}\)
\(w\otimes v = \left\langle {\cdot },{v}\right\rangle _\mathcal {H} w:\mathcal {H}\rightarrow \mathcal{G}\) | rank-one operator, with \(v\in \mathcal {H}\), \(w\in \mathcal{G}\)
\((\Omega ,\mathcal F,\mathbb P)\) | probability space
\(\mathbb E[\xi ]\) | expectation of the random variable \(\xi :\Omega \rightarrow \mathcal {H}\)
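All of the objects above reduce to matrices and vectors when \(\mathcal {H}\) and \(\mathcal{G}\) are finite dimensional, so the notation can be sanity-checked numerically. The following Python snippet is a minimal sketch of our own (not part of the chapter), taking \(\mathcal {H}={\mathbb {R}}^3\) and \(\mathcal{G}={\mathbb {R}}^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(3)   # v in H = R^3
w = rng.standard_normal(2)   # w in G = R^2

# Rank-one operator w (x) v : H -> G,  f |-> <f, v>_H w.
W = np.outer(w, v)           # its matrix
f = rng.standard_normal(3)
assert np.allclose(W @ f, np.dot(f, v) * w)

# The adjoint A* : G -> H is the transpose in this real setting,
# characterized by <A f, g>_G = <f, A* g>_H.
g = rng.standard_normal(2)
assert np.isclose(np.dot(W @ f, g), np.dot(f, W.T @ g))

# Operator norm ||A||_inf = largest singular value.
A = rng.standard_normal((2, 3))
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```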
We recall the following results; short numerical sketches of several of them, in finite dimensions, are given after the list.
- Trace class operators: an operator \(A:\mathcal {H}\rightarrow \mathcal {H}\) is trace class if for some (and hence any) orthonormal basis \((v_\ell )_{\ell \in \Lambda }\) of \(\mathcal {H}\) the series \(\sum _{\ell \in \Lambda } \left\langle {|{A}|v_\ell },{v_\ell }\right\rangle _\mathcal {H}\) converges, and
$$ {\text {Tr}}\, A = \sum _{\ell \in \Lambda } \left\langle {A v_\ell },{v_\ell }\right\rangle _\mathcal {H} $$
is called the trace of A. An operator \(A:\mathcal {H}\rightarrow \mathcal{G}\) is trace class if \(|{A}|\) is trace class, and the space \(\mathcal S_1(\mathcal {H},\mathcal{G})\) of trace class operators from \(\mathcal {H}\) to \(\mathcal{G}\) is a Banach space with respect to the norm
$$ \bigl \Vert {A}\bigr \Vert _{ \mathcal S_1(\mathcal {H},\mathcal{G})} = {\text {Tr}}(|{A}|). $$
- Hilbert–Schmidt operators: \(A:\mathcal {H}\rightarrow \mathcal{G}\) is a Hilbert–Schmidt operator if \(A^*A\) is a trace class operator. The space \(\mathcal S_2(\mathcal {H},\mathcal{G})\) of Hilbert–Schmidt operators from \(\mathcal {H}\) to \(\mathcal{G}\) is a Hilbert space with respect to the scalar product
$$ \left\langle {A},{B}\right\rangle _{ \mathcal S_2(\mathcal {H},\mathcal{G})} ={\text {Tr}}(B^*A), $$
and \(\mathcal S_1(\mathcal {H},\mathcal{G})\subset \mathcal S_2(\mathcal {H},\mathcal{G})\).
- Hilbert–Schmidt theorem: if \(A:\mathcal {H}\rightarrow \mathcal {H}\) is a Hilbert–Schmidt self-adjoint operator, there exist an orthonormal basis \((v_\ell )_{\ell \in \Lambda }\) of \(\mathcal {H}\) and a sequence \((\sigma _\ell )_{\ell \in \Lambda }\) of real numbers such that
$$\begin{aligned} A v_\ell =\sigma _\ell v_\ell . \end{aligned}$$
- Functional calculus: if \(A:\mathcal {H}\rightarrow \mathcal {H}\) is a Hilbert–Schmidt self-adjoint operator and \(\varphi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) is a bounded function, by the functional calculus the operator \(\varphi (A):\mathcal {H}\rightarrow \mathcal {H}\) is defined as
$$ \varphi (A) f =\sum _{\ell \in \Lambda } \varphi (\sigma _\ell ) \left\langle {f},{v_\ell }\right\rangle _\mathcal {H} v_\ell , $$
where the series converges in \(\mathcal {H}\).
- For any operator \(A:\mathcal {H}\rightarrow \mathcal{G}\),
$$\begin{aligned} \Vert A\Vert ^2_\infty = \Vert AA^*\Vert _\infty . \end{aligned}$$ (68)
- Cordes inequality: for any pair of positive operators \(A, B:\mathcal {H}\rightarrow \mathcal {H}\) and \(s \in (0,1]\),
$$\begin{aligned} \Vert A^s B^s\Vert _\infty \le \Vert A B\Vert ^s_\infty . \end{aligned}$$ (69)
- Take two operators \(B,C:\mathcal {H}\rightarrow \mathcal{G}\) such that
$$\begin{aligned} B B^* \le CC^*; \end{aligned}$$ (70)
then
$$\begin{aligned} \bigl \Vert {AB}\bigr \Vert _\infty \le \bigl \Vert {AC}\bigr \Vert _\infty \end{aligned}$$ (71)
for any operator \(A:\mathcal{G}\rightarrow \mathcal{G}'\). Indeed, by (70),
$$ ABB^*A^* \le ACC^*A^* \qquad \Longrightarrow \qquad \bigl \Vert {ABB^*A^*}\bigr \Vert _\infty \le \bigl \Vert {ACC^*A^*}\bigr \Vert _\infty . $$
Then,
$$\begin{aligned} \bigl \Vert {AB}\bigr \Vert ^2_\infty&= \bigl \Vert {ABB^*A^*}\bigr \Vert _\infty \le \bigl \Vert {ACC^*A^*}\bigr \Vert _\infty = \bigl \Vert {AC}\bigr \Vert ^2_\infty . \end{aligned}$$
- If \(A:\mathcal {H}\rightarrow \mathcal{G}\) and \(\lambda >0\),
$$\begin{aligned} I - (A^*A + \lambda I)^{-1}A^*A = \lambda (A^*A + \lambda I)^{-1}; \end{aligned}$$ (72)
see [22].
- For any positive operator \(C:\mathcal {H}\rightarrow \mathcal {H}\), projection \(P:\mathcal {H}\rightarrow \mathcal {H}\), and \(\lambda >0\),
$$\begin{aligned} P(PCP + \lambda I)^{-1}P \le P(C + \lambda I)^{-1}P; \end{aligned}$$ (73)
see, for example, Theorem V.2.3-(iv) of [3] for the proof of this property of operator convex functions, and Cor. V.2.6 together with Exercises V.1.10-(ii) and V.2.11 in the same book for the proof of operator convexity of \((\cdot +\lambda )^{-1}\).
- For any pair \(A,B:\mathcal {H}\rightarrow \mathcal {H}\) of positive operators with bounded inverse such that
$$\begin{aligned} \bigl \Vert {B^{-\frac{1}{2}} (A-B) B^{-\frac{1}{2}}}\bigr \Vert _{\infty } < 1, \end{aligned}$$ (74)
it holds that
$$\begin{aligned} \bigl \Vert {A^{-\frac{1}{2}} B^{\frac{1}{2}} }\bigr \Vert _{\infty }&\le \sqrt{\frac{1}{1 - \bigl \Vert {B^{-\frac{1}{2}} (A-B) B^{-\frac{1}{2}}}\bigr \Vert _{\infty }}}. \end{aligned}$$ (75)
Indeed, observe that
$$\begin{aligned} (A^{-\frac{1}{2}} B^{\frac{1}{2}})^* A^{-\frac{1}{2}} B^{\frac{1}{2}}&= B^{\frac{1}{2}} A^{-1} B^{\frac{1}{2}}=B^{\frac{1}{2}} \left( A-B+B\right) ^{-1} B^{\frac{1}{2}} \\&= B^{\frac{1}{2}} \left( B^{\frac{1}{2}}\left( B^{-\frac{1}{2}}(A-B) B^{-\frac{1}{2}}+I\right) B^{\frac{1}{2}}\right) ^{-1} B^{\frac{1}{2}}\\&= \left( B^{-\frac{1}{2}}(A-B) B^{-\frac{1}{2}}+I\right) ^{-1}=\sum _{\ell =0}^{+\infty } (-1)^\ell \left( B^{-\frac{1}{2}}(A-B) B^{-\frac{1}{2}}\right) ^{\ell }, \end{aligned}$$
where the Neumann series converges by (74). Hence, the triangle inequality gives
$$\begin{aligned} \bigl \Vert {A^{-\frac{1}{2}} B^{\frac{1}{2}}}\bigr \Vert ^2_\infty&= \bigl \Vert {\left( B^{-\frac{1}{2}}(A-B) B^{-\frac{1}{2}}+I\right) ^{-1}}\bigr \Vert _{\infty } \le \frac{1}{1 - \bigl \Vert { B^{-\frac{1}{2}} (A-B) B^{-\frac{1}{2}}}\bigr \Vert _{\infty }}. \end{aligned}$$
- Hoeffding inequality in separable Hilbert spaces [35, 36, 51]: take a family
$$\begin{aligned} \xi _1,\ldots ,\xi _n:\Omega \rightarrow \mathcal {H}\end{aligned}$$
of independent zero-mean random variables such that \(\Vert {\xi _i}\Vert _\mathcal {H}\le \kappa \); then for all \(\epsilon >0\),
$$ \mathbb P\left[ \Vert {\frac{1}{n} \sum _{i=1}^n \xi _i}\Vert _\mathcal {H} >\epsilon \right] \le 2 \exp \left( -\frac{\epsilon ^2 n }{4 \kappa ^2}\right) , $$
i.e., for all \(\tau >0\), with probability at least \(1- 2 e^{-\tau }\),
$$\begin{aligned} \Vert {\frac{1}{n} \sum _{i=1}^n \xi _i}\Vert _\mathcal {H} \le \dfrac{2\kappa \sqrt{\tau }}{\sqrt{n}}. \end{aligned}$$ (76)
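As a concrete illustration of the first facts in the list (trace and trace norm, the Hilbert–Schmidt scalar product, the Hilbert–Schmidt theorem, and the functional calculus), here is a minimal numerical sketch of our own, under the assumption \(\mathcal {H}={\mathbb {R}}^4\), where every operator is automatically trace class and Hilbert–Schmidt and the eigendecomposition provides the basis \((v_\ell )\) and the eigenvalues \((\sigma _\ell )\).

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                      # self-adjoint operator on H = R^4

# Hilbert-Schmidt theorem: A v_l = sigma_l v_l with an orthonormal basis (v_l).
sigma, V = np.linalg.eigh(A)     # columns of V are the eigenvectors
assert np.allclose(A @ V, V * sigma)

# Trace norm ||A||_S1 = Tr|A| = sum_l |sigma_l|; trace Tr A = sum_l sigma_l.
assert np.isclose(np.abs(sigma).sum(), np.linalg.norm(A, 'nuc'))
assert np.isclose(sigma.sum(), np.trace(A))

# Hilbert-Schmidt scalar product <A, B>_S2 = Tr(B* A).
B = rng.standard_normal((4, 4))
assert np.isclose(np.trace(B.T @ A), (A * B).sum())

# Functional calculus: phi(A) f = sum_l phi(sigma_l) <f, v_l> v_l.
phi = np.tanh                    # any bounded function
f = rng.standard_normal(4)
phi_A_f = (V * phi(sigma)) @ (V.T @ f)
assert np.allclose(phi_A_f, V @ np.diag(phi(sigma)) @ V.T @ f)
```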
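The operator identities and inequalities (68), (69), (72), and (75) can likewise be spot-checked on random matrices. In the sketch below, mat_pow and random_pd are our own helper names; fractional powers are computed through the eigendecomposition, and small tolerances absorb floating-point noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
op_norm = lambda M: np.linalg.norm(M, 2)   # operator norm

def mat_pow(M, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w**p) @ U.T

def random_pd(n, rng):
    M = rng.standard_normal((n, n))
    return M @ M.T + 1e-3 * np.eye(n)      # strictly positive, hence invertible

# (68): ||A||_inf^2 = ||A A*||_inf.
A = rng.standard_normal((3, n))
assert np.isclose(op_norm(A) ** 2, op_norm(A @ A.T))

# (69) Cordes inequality: ||A^s B^s|| <= ||A B||^s for positive A, B, s in (0,1].
Ap, Bp = random_pd(n, rng), random_pd(n, rng)
for s in (0.25, 0.5, 0.9, 1.0):
    assert op_norm(mat_pow(Ap, s) @ mat_pow(Bp, s)) <= op_norm(Ap @ Bp) ** s + 1e-10

# (72): I - (A*A + lam I)^{-1} A*A = lam (A*A + lam I)^{-1}.
lam = 0.1
AtA = A.T @ A
R = np.linalg.inv(AtA + lam * np.eye(n))
assert np.allclose(np.eye(n) - R @ AtA, lam * R)

# (75): ||A^{-1/2} B^{1/2}||^2 <= 1/(1 - d), with d = ||B^{-1/2}(A-B)B^{-1/2}|| < 1;
# here B is a small perturbation of A so that (74) holds.
Apos = random_pd(n, rng)
Bpos = Apos + 0.01 * random_pd(n, rng)
d = op_norm(mat_pow(Bpos, -0.5) @ (Apos - Bpos) @ mat_pow(Bpos, -0.5))
assert d < 1
assert op_norm(mat_pow(Apos, -0.5) @ mat_pow(Bpos, 0.5)) ** 2 <= 1 / (1 - d) + 1e-10
```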
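Finally, a small Monte Carlo illustration of the Hoeffding bound (76): we draw bounded, zero-mean, independent vectors (uniform on a sphere of radius \(\kappa \) in \({\mathbb {R}}^{20}\), an arbitrary choice made only so that \(\Vert {\xi _i}\Vert _\mathcal {H}=\kappa \) exactly) and check how often the norm of the empirical mean stays below \(2\kappa \sqrt{\tau }/\sqrt{n}\). The bound is conservative, so the empirical coverage should comfortably exceed \(1-2e^{-\tau }\).

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, kappa, tau = 20, 500, 1.0, 3.0
trials, hits = 2000, 0
bound = 2 * kappa * np.sqrt(tau / n)       # right-hand side of (76)
for _ in range(trials):
    xi = rng.standard_normal((n, d))
    xi *= kappa / np.linalg.norm(xi, axis=1, keepdims=True)  # project to sphere
    hits += np.linalg.norm(xi.mean(axis=0)) <= bound
print(f"empirical coverage: {hits / trials:.3f}  "
      f"(guaranteed >= {1 - 2 * np.exp(-tau):.3f})")
```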
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
De Vito, E., Rosasco, L., Rudi, A. (2021). Regularization: From Inverse Problems to Large-Scale Machine Learning. In: De Mari, F., De Vito, E. (eds) Harmonic and Applied Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-86664-8_5
DOI: https://doi.org/10.1007/978-3-030-86664-8_5
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-86663-1
Online ISBN: 978-3-030-86664-8