Abstract
We provide a new upper bound for sampling numbers \((g_n)_{n\in \mathbb {N}}\) associated with the compact embedding of a separable reproducing kernel Hilbert space into the space of square integrable functions. There are universal constants \(C,c>0\) (which are specified in the paper) such that
$$\begin{aligned} g^2_n \le C\,\frac{\log n}{n}\sum _{k\ge \lfloor cn \rfloor }\sigma _k^2, \quad n\ge 2, \end{aligned}$$
where \((\sigma _k)_{k\in \mathbb {N}}\) is the sequence of singular numbers (approximation numbers) of the Hilbert–Schmidt embedding \(\mathrm {Id}:H(K) \rightarrow L_2(D,\varrho _D)\). The algorithm which realizes the bound is a least squares algorithm based on a specific set of sampling nodes. These are constructed out of a random draw in combination with a down-sampling procedure coming from the celebrated proof of Weaver’s conjecture, which was shown to be equivalent to the Kadison–Singer problem. Our result is non-constructive since we only show the existence of a linear sampling operator realizing the above bound. The general result can for instance be applied to the well-known situation of \(H^s_{\text {mix}}(\mathbb {T}^d)\) in \(L_2(\mathbb {T}^d)\) with \(s>1/2\). We obtain the asymptotic bound
$$\begin{aligned} g_n \lesssim n^{-s}\,(\log n)^{(d-1)s+1/2}, \end{aligned}$$
which improves on very recent results by shortening the gap between upper and lower bound to \(\sqrt{\log (n)}\). The result implies that in dimensions \(d>2\) no sparse grid sampling recovery method performs asymptotically optimally.
1 Introduction
In this paper, we study a well-known problem on the optimal recovery of multivariate functions from n function samples. The problem turned out to be rather difficult in several relevant situations. Since we want to recover the function f from n function samples \((f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\), the problem boils down to the question of how to choose the sampling nodes \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) and a corresponding recovery algorithm. The minimal worst-case error for an optimal choice is reflected by the n-th sampling number defined by
$$\begin{aligned} g_n {:}{=} \inf _{\mathbf {x}^1,\ldots ,\mathbf {x}^n \in D}\;\inf _{\varphi :\mathbb {C}^n\rightarrow L_2}\;\sup _{\Vert f\Vert _{H(K)}\le 1}\Vert f-\varphi (f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\Vert _{L_2(D,\varrho _D)}. \end{aligned}$$
(1)
The functions are modeled as elements from a separable reproducing kernel Hilbert space H(K) of functions on a set \(D \subset \mathbb {R}^d\) with finite trace kernel \(K(\cdot ,\cdot )\), i.e.,
$$\begin{aligned} {\text {tr}}(K) {:}{=} \int _D K(\mathbf {x},\mathbf {x})\,d\varrho _D(\mathbf {x}) < \infty . \end{aligned}$$
(2)
The recovery problem (in the above framework) was first addressed by Wasilkowski and Woźniakowski in [42]. The corresponding problem for certain particular cases (e.g., classes of functions with mixed smoothness properties, see [7, Sect. 5]) has been studied much earlier. Our main result is the existence of two universal constants \(C,c>0\) (specified in Remark 6.3) such that the relation
$$\begin{aligned} g^2_n \le C\,\frac{\log n}{n}\sum _{k \ge \lfloor cn \rfloor } \sigma _k^2, \quad n\ge 2, \end{aligned}$$
(3)
holds true between the sampling numbers \((g_n)_{n\in \mathbb {N}}\) and the square summable singular numbers \((\sigma _k)_{k\in \mathbb {N}}\) of the compact embedding
$$\begin{aligned} \mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D). \end{aligned}$$
We emphasize that, in general, the square-summability of the singular numbers \((\sigma _k)_{k\in \mathbb {N}}\) is not implied by the compactness of the embedding \(\mathrm {Id}_{K,\varrho _D}\). This is one reason why we need the additional assumption of a finite trace kernel (2) (or a Hilbert–Schmidt embedding). In addition, as observed by Hinrichs et al. [10], a non-existing trace may cause the sampling numbers to decay polynomially at a worse rate than the polynomially decaying singular numbers \((\sigma _k)_k\), or to lose polynomial decay altogether. Hence, an inequality like (3), which passes the polynomial decay of the singular numbers on to the sampling numbers, is in general impossible without the finite trace condition (2). In our main example, the recovery of multivariate functions with dominating mixed smoothness (see Sect. 7), this condition is equivalent to \(s>1/2\), where s denotes the mixed smoothness parameter. For further historical and technical comments (e.g., non-separable RKHS) we refer to Remark 6.2.
The algorithm which realizes the bound (3) in the sense of (1) is a (linear) least squares algorithm based on a specific set of sampling nodes. These are constructed out of a random draw in combination with a down-sampling procedure coming from the proof of Weaver’s conjecture [26], see Sect. 2. In its original form, the result in [26] is not applicable for our purpose, which is why we slightly generalize it, see Theorem 2.3. Note that the result in (3) is non-constructive: we do not have a deterministic construction for a suitable set of nodes. However, we have control of the failure probability, which can be made arbitrarily small. In addition, the subspace where the least squares algorithm takes place is explicitly given and determined by the first m singular vectors.
The problem discussed in the present paper is tightly related to the problem of the Marcinkiewicz discretization of \(L_2\)-norms for functions from finite-dimensional spaces (e.g., trigonometric polynomials). In fact, constructing well-conditioned matrices for the least squares approximation is an equivalent issue. Let us emphasize that V.N. Temlyakov (and coauthors) already used the Nitzan et al. construction [26] for the Marcinkiewicz discretization problem in the context of multivariate (hyperbolic cross) polynomials, see [34, 35] and the very recent paper [21].
Compared to the result by Krieg and M. Ullrich [15], the relation (3) is stronger. In fact, the difference is mostly in the \(\log \)-exponent, as the example below shows. The general relation (3) yields a significant improvement in the situation of mixed Sobolev embeddings in \(L_2\), see Sect. 7. Applied, for instance, to the situation of \(H^s_{\text {mix}}(\mathbb {T}^d)\) in \(L_2(\mathbb {T}^d)\) with \(s>1/2\) (this condition is equivalent to the finite trace condition (2)), the result in (3) yields
$$\begin{aligned} g_n \lesssim n^{-s}\,(\log n)^{(d-1)s+1/2}, \end{aligned}$$
(4)
whereas the result in [15] (see also [13, 24, 41]) implies
$$\begin{aligned} g_n \lesssim n^{-s}\,(\log n)^{(d-1)s+s}. \end{aligned}$$
The \(\log \)-gap grows with \(s>1/2\). Our new result achieves rates that are only worse by \(\sqrt{\log (n)}\) in comparison with the benchmark rates given by the singular numbers. Note that for \(d\ge 3\) and any \(s>1/2\) the bound (4) yields a better performance than any sparse grid technique is able to provide, see [2, 3, 6, 8, 31, 33] and [7, Sect. 5]. In addition, combining the above result with recent preasymptotic estimates for the \((\sigma _j)_j\), see [14, 17,18,19], we are able to obtain reasonable bounds for \(g_n\) also in the case of small n. See Sect. 7 for further comments and references in this direction.
Krieg and M. Ullrich [15] used a sophisticated random sampling strategy which allowed for establishing a new connection between sampling numbers and singular values. Let us emphasize that this can be considered a major progress in this field. The result in this paper partly relies on their random sampling strategy, which draws nodes according to a distribution built upon spectral properties of the embedding. The advantage of the pure random strategy in connection with \(\log (n)\)-oversampling is the fact that the failure probability decays polynomially in n, as recently shown by M. Ullrich [41] and, independently, by Moeller together with the third named author [24]. In other words, although this approach incorporates a probabilistic ingredient, the failure probability is controlled and the algorithm may be implemented. Note that there are obvious parallels to the field of compressed sensing, where the measurement matrix is also drawn at random and satisfies the RIP with high probability.
Notation. As usual, \(\mathbb {N}\) denotes the natural numbers, \(\mathbb {N}_0{:}{=}\mathbb {N}\cup \{0\}\), \(\mathbb {Z}\) the integers, \(\mathbb {R}\) the real numbers, \(\mathbb {R}_+\) the nonnegative real numbers and \(\mathbb {C}\) the complex numbers. For a natural number m, we set \([m] {:}{=} \{1, \ldots , m\}\). We will also use \(\dot{\cup }\) to emphasize that a union is disjoint. If not indicated otherwise, \(\log (\cdot )\) denotes the natural logarithm of its argument. \(\mathbb {C}^n\) denotes the complex n-space, whereas \(\mathbb {C}^{m\times n}\) denotes the set of all \(m\times n\)-matrices \(\mathbf {L}\) with complex entries. Vectors and matrices are usually typeset boldface, e.g., \(\mathbf {x},\mathbf {y}\in \mathbb {C}^n\). The matrix \(\mathbf {L}^{*}\) denotes the adjoint matrix. The spectral norm of a matrix \(\mathbf {L}\) is denoted by \(\Vert \mathbf {L}\Vert \) or \(\Vert \mathbf {L}\Vert _{2\rightarrow 2}\). For a complex (column) vector \(\mathbf {y}\in \mathbb {C}^n\) (or \(\ell _2\)), we will often use the tensor notation for the matrix
$$\begin{aligned} \mathbf {y}\otimes \mathbf {y}{:}{=} \mathbf {y}\cdot \mathbf {y}^{*} = \big (y_i\,\overline{y_j}\big )_{i,j=1}^n. \end{aligned}$$
For \(0<p\le \infty \) and \(\mathbf {x}\in \mathbb {C}^n\), we denote \(\Vert \mathbf {x}\Vert _p {:}{=} (\sum _{i=1}^n |x_i|^p)^{1/p}\) with the usual modification in the case \(p=\infty \) or \(\mathbf {x}\) being an infinite sequence. As usual, we denote by \(\mathbb {E}X\) the expectation of a random variable X on a probability space \((\varOmega , \mathcal {A}, \mathbb {P})\). Given a measurable subset \(D\subset \mathbb {R}^d\) and a measure \(\varrho \), we denote by \(L_2(D,\varrho )\) the space of all square integrable complex-valued functions (equivalence classes) on D, i.e., with \(\int _D |f(\mathbf {x})|^2\, d\varrho (\mathbf {x})<\infty \). We will often use \(\varOmega = D^n\) as probability space with the product measure \(\mathbb {P}= d\varrho ^n\) if \(\varrho \) is a probability measure itself.
2 Weaver’s Theorem
In this section, we prove a modified version of Weaver’s \(\text {KS}_2\)-theorem, also known as Weaver’s \(\text {KS}_2\)-conjecture, from [43], which was shown to be equivalent to the famous Kadison–Singer conjecture [12] dating back as far as 1959. For a long time, these statements were mere conjectures and many people even believed them to be false. Since the celebrated proof given by Marcus et al. [22] in 2015, however, they have turned into actual theorems and thus into rather strong tools for various applications, and it is in fact the Weaver \(\text {KS}_2\)-conjecture that is at the heart of our argument in this article. We need it in a slightly modified form, formulated in Theorem 2.3. The starting point for its proof is the following reformulation of the classical Weaver statement, which already appeared in [26]. We formulate it with slightly improved constants, see [25].
Theorem 2.1
[26] Let \(\varepsilon > 0\) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le \varepsilon \) for all \(i=1, \ldots , n\) and
$$\begin{aligned} \sum _{i=1}^{n} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 = \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
(5)
for all \(\mathbf {w} \in \mathbb {C}^m\). Then, there is a partition \(S_1 \dot{\cup } S_2 = [n]\) with
for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\). In particular, we have
$$\begin{aligned} \frac{1-(2+\sqrt{2})\sqrt{\varepsilon }}{2}\, \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in S_j} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \frac{1+(2+\sqrt{2})\sqrt{\varepsilon }}{2}\, \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\).
Note that the above statement is trivial for \(\varepsilon \ge (2+\sqrt{2})^{-2}\), since in this case the lower bound is \(\le 0\) and the upper bound is \(\ge 1\). Relaxing condition (5), one obtains an analogous statement for non-tight frames.
Corollary 2.2
[26] Let \(\varepsilon > 0\) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le \varepsilon \) for all \(i=1, \ldots , n\) and
$$\begin{aligned} \alpha \,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i=1}^{n} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta \,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\), where \(\beta \ge \alpha > 0\) are some fixed constants. Then, there is a partition \(S_1 \dot{\cup } S_2 = [n]\), such that
$$\begin{aligned} \frac{\alpha }{2}\Big (1-(2+\sqrt{2})\sqrt{\varepsilon /\alpha }\Big )\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in S_j} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \frac{\beta }{2}\Big (1+(2+\sqrt{2})\sqrt{\varepsilon /\alpha }\Big )\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\).
Again, the above statement is trivial for \(\varepsilon /\alpha \ge (2+\sqrt{2})^{-2}\). Now we are ready to formulate and prove the theorem which is convenient for our later purpose. The proof technique is analogous to the one used for Lemma 2 in [26]. After the preprint was finished, V.N. Temlyakov pointed out to us that the proof of Lemma 2.2 in the recent paper [21], although the lemma is stated in a weaker form, also contains a version of the theorem below with unspecified constants.
Theorem 2.3
Let \(k_1, k_2, k_3 > 0\) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le k_1 \frac{m}{n}\) for all \(i=1, \ldots , n\) and
$$\begin{aligned} k_2\,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i=1}^{n} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le k_3\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\). Then, there is a \(J \subseteq [n]\) of size \(\# J \le c_1 m\) with
$$\begin{aligned} c_2\,\frac{m}{n}\,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le c_3\,\frac{m}{n}\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\), where \(c_1, c_2, c_3\) only depend on \(k_1, k_2, k_3\). More precisely, we can choose
$$\begin{aligned} c_1 = 35.21\cdot 4(2+\sqrt{2})^2\,\frac{k_1}{k_2}, \qquad c_2 = (2+\sqrt{2})^2\, k_1, \qquad c_3 = 35.21\cdot 4(2+\sqrt{2})^2\,\frac{k_1 k_3}{k_2} \end{aligned}$$
in case \(\frac{n}{m}\ge 47 \frac{k_1}{k_2}\). In the regime \(1\le \frac{n}{m}< 47 \frac{k_1}{k_2}\), one may put \(c_1 = 47k_1/k_2\), \(c_2 = k_2\), \(c_3=47k_1 k_3/k_2\).
Proof
To ease the notation a bit, let us set \(\zeta {:}{=} 2+\sqrt{2}\). Put \(\delta {:}{=} k_1 \frac{m}{n}\), \(\alpha _0 {:}{=} k_2\), \(\beta _0 {:}{=} k_3\) and define recursively
$$\begin{aligned} \alpha _{\ell +1} {:}{=} \frac{\alpha _\ell }{2}\Big (1-\zeta \sqrt{\delta /\alpha _\ell }\Big ), \qquad \beta _{\ell +1} {:}{=} \frac{\beta _\ell }{2}\Big (1+\zeta \sqrt{\delta /\alpha _\ell }\Big ) \end{aligned}$$
for \(\ell \in \mathbb {N}_0\). Assume for the moment that \(\delta < (2\zeta )^{-2}k_2\). We want to show that there is a constant \(\gamma >0\), not depending on \(\delta \), and an \(L \in \mathbb {N}_0\), such that \(\alpha _\ell \ge (2\zeta )^2\delta \) for all \(\ell \le L\) as well as \(\zeta ^2 \delta \le \alpha _{L+1} < (2\zeta )^2\delta \) and \(\beta _{L+1}< \gamma \alpha _{L+1} < (2\zeta )^2\gamma \delta \).
Notice that
$$\begin{aligned} \alpha _{\ell +1} = \frac{\alpha _\ell }{2}\Big (1-\zeta \sqrt{\delta /\alpha _\ell }\Big ) = \frac{1}{2}\Big (\alpha _\ell - \zeta \sqrt{\delta \,\alpha _\ell }\Big ) \end{aligned}$$
is strictly increasing in \(\alpha _\ell > 0\). For \(\alpha _\ell \ge (2\zeta )^2\delta \), we thus have
$$\begin{aligned} \zeta \sqrt{\delta /\alpha _\ell } \le \frac{1}{2} \end{aligned}$$
and therefore
$$\begin{aligned} \alpha _{\ell +1} \ge \frac{\alpha _\ell }{4} \ge \zeta ^2\delta . \end{aligned}$$
Set \(L {:}{=} \max \left\{ \ell \in \mathbb {N}_0: \alpha _\ell \ge (2\zeta )^2\delta \right\} \) so that \(\alpha _\ell \ge (2\zeta )^2\delta \) for all \(\ell \le L\). Notice that, since \(\alpha _{\ell +1} < \alpha _\ell /2\) as long as \(\alpha _\ell \ge (2\zeta )^2\delta \), we have \(L < \infty \). Since \(\alpha _0 = k_2 > (2\zeta )^2\delta \), we also have \(L \ge 0\).
The definition of L directly yields \(\alpha _{L+1} < (2\zeta )^2\delta \), but by the above also
$$\begin{aligned} \alpha _{L+1} \ge \frac{\alpha _L}{4} \ge \zeta ^2\delta . \end{aligned}$$
It remains to find a \(\gamma > 0\) as described above. To do so, first observe by the definition of the \(\alpha _\ell \) and \(\beta _\ell \) that
$$\begin{aligned} \frac{\beta _{\ell +1}}{\alpha _{\ell +1}} = \frac{\beta _\ell }{\alpha _\ell }\cdot \frac{1+\zeta \sqrt{\delta /\alpha _\ell }}{1-\zeta \sqrt{\delta /\alpha _\ell }}. \end{aligned}$$
We have \(\alpha _L \ge (2\zeta )^2\delta \) so that \(\zeta \sqrt{\delta /\alpha _L} \le 2^{-1}\) and using
$$\begin{aligned} \alpha _{\ell +1} < \frac{\alpha _\ell }{2} \end{aligned}$$
inductively we get \(\zeta \sqrt{\delta /\alpha _{L-\ell }} \le 2^{-1-\ell /2}\) for \(\ell = 0, 1, \ldots , L\). Thus,
$$\begin{aligned} \frac{\beta _{L+1}}{\alpha _{L+1}} \le \frac{\beta _0}{\alpha _0}\prod _{\ell =0}^{\infty }\frac{1+2^{-1-\ell /2}}{1-2^{-1-\ell /2}} \le 35.21\,\frac{k_3}{k_2} {=}{:}\gamma , \end{aligned}$$
which yields the final claim.
With this at hand, consider the situation of the theorem. Clearly, we have \(n\ge m\) due to the lower frame bound \(k_2>0\). We now distinguish two cases. Firstly, if \(1\le \frac{n}{m} < 47\frac{k_1}{k_2}\) the assertion follows directly for \(J = [n]\) and the choice \(c_1 = \frac{n}{m}\), \(c_2 = k_2 \frac{n}{m}\), and \(c_3 = k_3 \frac{n}{m}\). Incorporating the bounds for n/m gives the choice in the statement of the theorem.
In the second case, when \(\frac{n}{m}\ge 47\frac{k_1}{k_2}\), let \(\alpha _\ell \), \(\beta _\ell \) be as above and note that \(\delta = k_1 \frac{m}{n} \le k_2/47 < (2\zeta )^{-2}k_2\). The vectors \(\mathbf {u}_i\) fulfill the assumptions of Corollary 2.2 for \(\alpha = \alpha _0 = k_2\) and \(\beta = \beta _0 = k_3\), so that there is a set \(J_1 \subseteq [n]\) with
$$\begin{aligned} \alpha _1\,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_1} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _1\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\). By choosing the smaller of the two partition classes \(S_1\) or \(S_2\), we may assume \(\# J_1 \le \frac{1}{2} n\). We can now apply Corollary 2.2 again, where we restrict ourselves to the indices in \(J_1\). We thus get a \(J_2 \subseteq J_1\) with
$$\begin{aligned} \alpha _2\,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_2} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _2\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\). Again, by choosing the smaller partition class, we may assume \(\# J_2 \le \frac{1}{2} \# J_1 \le \frac{1}{4} n\). After \(L+1\) applications of Corollary 2.2, we get
$$\begin{aligned} \alpha _{L+1}\,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_{L+1}} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _{L+1}\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\), where \(J_{L+1} \subseteq [n]\) with \(\#J_{L+1} \le 2^{-(L+1)} n\). By what was proven in the first part of this proof, we therefore get
$$\begin{aligned} \zeta ^2\delta \,\Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_{L+1}} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le (2\zeta )^2\gamma \,\delta \,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
for all \(\mathbf {w} \in \mathbb {C}^m\). We thus get the assertion for \(J = J_{L+1}\), \(c_2 = \zeta ^2 k_1\) and \(c_3 = (2\zeta )^2 \gamma k_1\le 35.21\cdot (2\zeta )^2 \frac{k_1k_3}{k_2}\). As for \(c_1\), look at the quantities
$$\begin{aligned} \phi _\ell {:}{=} \frac{\beta _\ell }{\# J_\ell } \end{aligned}$$
for \(\ell = 0, 1, \ldots , L+1\), where we set \(J_0 {:}{=} [n]\). Then, \(\phi _0 = \beta _0/n = k_3/n\). Since
$$\begin{aligned} \beta _{\ell +1} \ge \frac{\beta _\ell }{2} \qquad \text {and}\qquad \# J_{\ell +1} \le \frac{\# J_\ell }{2}, \end{aligned}$$
we see that the \(\phi _\ell \) are monotonically increasing. Thus,
$$\begin{aligned} \frac{k_3}{n} = \phi _0 \le \phi _{L+1} = \frac{\beta _{L+1}}{\# J_{L+1}} \le \frac{(2\zeta )^2\gamma \,\delta }{\# J_{L+1}}, \end{aligned}$$
so that
$$\begin{aligned} \# J_{L+1} \le \frac{(2\zeta )^2\gamma \,\delta \,n}{k_3} \le 35.21\cdot (2\zeta )^2\,\frac{k_1}{k_2}\,m, \end{aligned}$$
i.e., \(c_1 \le 35.21 \cdot (2\zeta )^2k_1/k_2\). \(\square \)
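To get a feeling for the quantities involved, the following small script iterates the \((\alpha _\ell ,\beta _\ell )\) recursion from the proof (as reconstructed above) for the parameters \(k_1=2\), \(k_2=1/2\), \(k_3=3/2\) and the sample size \(n=497\) for \(m=2\), which appear later in Remark 6.3; the script and all concrete values are purely illustrative and not part of the proof.

```python
import math

# Iterate the (alpha_l, beta_l) recursion from the proof of Theorem 2.3
# (as reconstructed above). Parameters k1, k2, k3 and (m, n) follow the
# illustrative choices of Remark 6.3; they are assumptions of this sketch.
zeta = 2 + math.sqrt(2)
k1, k2, k3 = 2.0, 0.5, 1.5
m, n = 2, 497
delta = k1 * m / n          # bound on the squared norms ||u_i||_2^2

alpha, beta, steps = k2, k3, 0
while alpha >= (2 * zeta) ** 2 * delta:
    s = zeta * math.sqrt(delta / alpha)
    alpha, beta = alpha / 2 * (1 - s), beta / 2 * (1 + s)
    steps += 1
# steps = L + 1; the final alpha lands in [zeta^2*delta, (2*zeta)^2*delta).
print(steps, zeta**2 * delta <= alpha < (2 * zeta) ** 2 * delta)
print(beta / alpha <= 35.21 * k3 / k2)   # ratio stays below gamma
```

For these parameters, one observes that a single halving step already suffices and that the frame-bound ratio stays far below the worst-case constant \(\gamma \).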
3 Reproducing Kernel Hilbert Spaces
We will work in the framework of reproducing kernel Hilbert spaces. The relevant theoretical background can be found in [1, Chapt. 1] and [4, Chapt. 4]. The papers [9] and [32] are also of particular relevance for the subject of this paper.
Let \(L_2(D,\varrho _D)\) be the space of complex-valued square-integrable functions with respect to \(\varrho _D\). Here \(D \subset \mathbb {R}^d\) is an arbitrary measurable subset and \(\varrho _D\) a measure on D. We further consider a reproducing kernel Hilbert space H(K) with a Hermitian positive definite kernel K on \(D \times D\). The crucial property of reproducing kernel Hilbert spaces is the fact that Dirac functionals are continuous, or, equivalently, the reproducing property
$$\begin{aligned} f(\mathbf {x}) = \langle f, K(\cdot ,\mathbf {x})\rangle _{H(K)} \end{aligned}$$
holds for all \(\mathbf {x}\in D\).
We will use the notation from [4, Chapt. 4]. In the framework of this paper, the finite trace of the kernel is given by
$$\begin{aligned} {\text {tr}}(K) {:}{=} \int _D K(\mathbf {x},\mathbf {x})\,d\varrho _D(\mathbf {x}) < \infty . \end{aligned}$$
(6)
The embedding operator
$$\begin{aligned} \mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D), \quad f \mapsto f, \end{aligned}$$
(7)
is Hilbert–Schmidt under the finite trace condition (6), see [9, 32, Lemma 2.3], which we always assume from now on. We additionally assume that H(K) is at least infinite-dimensional. Let us denote by \((\lambda _j)_{j\in \mathbb {N}}\) the (at most) countable system of strictly positive eigenvalues of \(W_{K,\varrho _D} = \mathrm {Id}_{K,\varrho _D}^{*} \circ \mathrm {Id}_{K,\varrho _D}\), arranged in non-increasing order, i.e.,
$$\begin{aligned} \lambda _1 \ge \lambda _2 \ge \lambda _3 \ge \cdots > 0. \end{aligned}$$
We will also need the right and left singular vectors \((e_k)_k \subset H(K)\) and \((\eta _k)_k \subset L_2(D,\varrho _D)\), which both represent orthonormal systems in the respective spaces, related by \(e_k = \sigma _k \eta _k\) with \(\lambda _k = \sigma _k^2\) for \(k\in \mathbb {N}\). We would like to emphasize that the embedding (7) is not necessarily injective. In other words, for certain kernels there might be a nontrivial null-space of the embedding in (7). Therefore, the system \((e_k)_k\) from above is not necessarily a basis in H(K). It would be a basis under additional restrictions, e.g., if the kernel \(K(\cdot ,\cdot )\) is continuous and bounded (i.e., a Mercer kernel). It is shown in [9, 32, Lemma 2.3] that if \({\text {tr}}(K)<\infty \) and H(K) is separable, the nonnegative function
$$\begin{aligned} \mathbf {x}\mapsto K(\mathbf {x},\mathbf {x}) - \sum _{k\in \mathbb {N}} |e_k(\mathbf {x})|^2 \end{aligned}$$
vanishes almost everywhere. Let us finally define the “spectral functions”
$$\begin{aligned} N(m) {:}{=} \sup _{\mathbf {x}\in D} \sum _{k=1}^{m-1} |\eta _k(\mathbf {x})|^2 \end{aligned}$$
(9)
and
$$\begin{aligned} T(m) {:}{=} \sup _{\mathbf {x}\in D} \sum _{k=m}^{\infty } |e_k(\mathbf {x})|^2, \end{aligned}$$
provided that they exist.
4 Weighted Least Squares
Let us begin with concentration inequalities for the spectral norm of sums of complex rank-1 matrices. Such matrices appear as \(\mathbf {L}^*\mathbf {L}\) when studying least squares solutions of over-determined linear systems
$$\begin{aligned} \mathbf {L}\cdot \mathbf {c}= \mathbf {f}, \end{aligned}$$
where \(\mathbf {L}\in \mathbb {C}^{n \times m}\) is a matrix with \(n>m\). It is well known that the above system may not have a solution. However, we can ask for the vector \(\mathbf {c}\) which minimizes the residual \(\Vert \mathbf {f}-\mathbf {L}\cdot \mathbf {c}\Vert _2\). Multiplying the system with \(\mathbf {L}^*\) gives
$$\begin{aligned} \mathbf {L}^{*}\mathbf {L}\cdot \mathbf {c}= \mathbf {L}^{*}\mathbf {f}, \end{aligned}$$
which is called the system of normal equations. If \(\mathbf {L}\) has full rank, then the unique solution of the least squares problem is given by
$$\begin{aligned} \mathbf {c}= (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\mathbf {f}. \end{aligned}$$
(11)
For function recovery problems, we will use the following matrix
$$\begin{aligned} \mathbf {L}_{n,m} {:}{=} \big (\eta _k(\mathbf {x}^i)\big )_{i=1,\ldots ,n;\;k=1,\ldots ,m-1} \end{aligned}$$
(12)
for \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n\) of distinct sampling nodes and a system \((\eta _k(\cdot ))_{k}\) of functions. Here \(\mathbf {y}^i {:}{=} (\eta _1(\mathbf {x}^i),\ldots ,\eta _{m-1}(\mathbf {x}^i)), i=1,\ldots ,n\).
Lemma 4.1
[13, Proposition 3.1] Let \(\mathbf {L}\in \mathbb {C}^{n\times m}\), \(m\le n\), be a matrix of full rank with singular values \(\tau _1,\ldots ,\tau _m >0\) arranged in non-increasing order.
- (i) Then, the matrix \((\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\) also has full rank, with singular values \(\tau _m^{-1},\ldots ,\tau _1^{-1}\) (arranged in non-increasing order).
- (ii) In particular, it holds that
$$\begin{aligned} (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*} = \mathbf {V}^{*} \tilde{{{\varvec{\varSigma }}}}\mathbf {U}\end{aligned}$$
whenever \(\mathbf {L}= \mathbf {U}^{*} {{\varvec{\varSigma }}} \mathbf {V}\), where \(\mathbf {U}\in \mathbb {C}^{n\times n}\) and \(\mathbf {V}\in \mathbb {C}^{m\times m}\) are unitary matrices and \({{\varvec{\varSigma }}} \in \mathbb {R}^{n\times m}\) is a rectangular matrix with only \((\tau _1,\ldots ,\tau _m)\) on the main diagonal. Here \(\tilde{{{\varvec{\varSigma }}}} \in \mathbb {R}^{m\times n}\) denotes the matrix with \((\tau _1^{-1},\ldots ,\tau _m^{-1})\) on the main diagonal.
- (iii) The operator norm \(\Vert (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\Vert _{2 \rightarrow 2}\) can be controlled as follows:
$$\begin{aligned} \tau _1^{-1} \le \Vert (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\Vert _{2 \rightarrow 2} \le \tau _m^{-1}. \end{aligned}$$
Being in the RKHS setting, we compute the coefficients \(c_k\), \(k=1,\ldots ,m-1\), of the approximant
$$\begin{aligned} S^m_{\mathbf {X}}f {:}{=} \sum _{k=1}^{m-1} c_k\,\eta _k \end{aligned}$$
using the least squares algorithm (11). We will also use the weighted version below, where \(\varrho _m(\cdot )\) is a density function which essentially first appeared in [15] and has been adapted in [24] to
$$\begin{aligned} \varrho _m(\mathbf {x}) {:}{=} \frac{1}{2}\left( \frac{1}{m-1}\sum _{k=1}^{m-1} |\eta _k(\mathbf {x})|^2 + \frac{\sum _{k=m}^{\infty } |e_k(\mathbf {x})|^2}{\sum _{k=m}^{\infty } \sigma _k^2}\right) . \end{aligned}$$
(14)
In the weighted setting, the matrix \(\widetilde{\mathbf {L}}_{n,m} {:}{=} \big (\eta _k(\mathbf {x}^i)/\sqrt{\varrho _m(\mathbf {x}^i)}\big )_{i,k}\) takes the place of \(\mathbf {L}_{n,m}\).
Note that the mapping \(f \mapsto \widetilde{S}^m_{\mathbf {X}}f\) is well defined and linear for a fixed set of sampling nodes \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n\)
if the matrix \(\widetilde{\mathbf {L}}_{n,m}\) has full (column) rank. The next section gives sufficient conditions under which this is the case.
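To make the method concrete, here is a minimal numerical sketch of the plain least squares recovery (11)–(12). The torus \(D = [0,1)\) with the Lebesgue measure and the trigonometric system \(\eta _k(x) = e^{2\pi \mathrm {i}kx}\) are illustrative assumptions; the weight is set to \(\varrho _m \equiv 1\) here (in the weighted version, the rows of the matrix and the samples are both rescaled by \(1/\sqrt{\varrho _m(\mathbf {x}^i)}\)).

```python
import numpy as np

# Minimal sketch of the least squares recovery from (11)-(12), assuming the
# trigonometric system on [0,1); all concrete choices are illustrative.
def least_squares_approximant(f, nodes, freqs):
    L = np.exp(2j * np.pi * np.outer(nodes, freqs))   # the matrix (12)
    # lstsq solves the normal equations L* L c = L* f in a stable way
    c, *_ = np.linalg.lstsq(L, f(nodes), rcond=None)
    return lambda x: np.exp(2j * np.pi * np.outer(x, freqs)) @ c

rng = np.random.default_rng(0)
f = lambda x: 1.0 / (1.25 - np.cos(2 * np.pi * x))    # smooth periodic test function
nodes = rng.random(200)                               # n = 200 i.i.d. uniform nodes
approx = least_squares_approximant(f, nodes, np.arange(-10, 11))
xs = np.linspace(0.0, 1.0, 1000)
print(np.max(np.abs(f(xs) - approx(xs))))             # small uniform error
```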
5 Concentration Results for Random Matrices
We start with a concentration inequality for the spectral norm of a matrix of type (12). It turns out that the complex matrix \(\mathbf {L}_{n,m}{:}{=}\mathbf {L}_{n,m}(\mathbf {X})\in \mathbb {C}^{n\times (m-1)}\) has full rank with high probability if \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) is drawn at random from \(D^n\) according to a measure \(\mathbb {P}= d\varrho ^n\), the functions \((\eta _k)_{k=1}^{m-1}\) are orthonormal w.r.t. the measure \(\varrho \) and m is not too large (compared to n). We will find below that the eigenvalues of
$$\begin{aligned} \mathbf {H}_m {:}{=} \frac{1}{n}\,\mathbf {L}_{n,m}^{*}\,\mathbf {L}_{n,m} \end{aligned}$$
are bounded away from zero with high probability if m is small enough compared to n. The following result is a consequence of [40, Thm. 1.1], see also [24, Thm. 2.3, Cor. 2.5].
Theorem 5.1
For \(n \ge m\), \(r > 1\), we immediately obtain that the matrix \(\mathbf {H}_m\) has only eigenvalues greater than 1/2 and smaller than 3/2 with probability at least \(1 - 2n^{1-r}\) if the nodes are sampled i.i.d. according to \(\varrho \) and
$$\begin{aligned} N(m) \le \frac{n}{10\,r\,\log n}, \end{aligned}$$
where N(m) is the quantity defined in (9). Equivalently, we have for all \(\mathbf {w}\in \mathbb {C}^{m-1}\)
$$\begin{aligned} \frac{1}{2}\,\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{n}\,\Vert \mathbf {L}_{n,m}\,\mathbf {w}\Vert _2^2 \end{aligned}$$
and
$$\begin{aligned} \frac{1}{n}\,\Vert \mathbf {L}_{n,m}\,\mathbf {w}\Vert _2^2 \le \frac{3}{2}\,\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
with probability at least \(1-2n^{1-r}\).
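For intuition, a quick Monte Carlo experiment makes this concentration visible; the trigonometric system (with \(N(m) = m-1\)) and all parameters are again illustrative assumptions.

```python
import numpy as np

# Empirical check of the spectral concentration of H_m = (1/n) L* L for
# i.i.d. uniform nodes and the illustrative trigonometric system.
rng = np.random.default_rng(1)
n, m, trials, fails = 2000, 40, 100, 0
freqs = np.arange(1, m)                          # eta_1, ..., eta_{m-1}
for _ in range(trials):
    x = rng.random(n)
    L = np.exp(2j * np.pi * np.outer(x, freqs))
    eigs = np.linalg.eigvalsh(L.conj().T @ L / n)
    fails += not (0.5 < eigs.min() <= eigs.max() < 1.5)
print(f"{fails} failures in {trials} draws")     # typically 0
```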
Let us now turn to infinite matrices. We need a result which can be applied to independent \(\ell _2\)-sequences of the form
$$\begin{aligned} \mathbf {y}^i {:}{=} \big (e_m(\mathbf {x}^i), e_{m+1}(\mathbf {x}^i), \ldots \big ), \quad i=1,\ldots ,n, \end{aligned}$$
where \((e_k)_k \subset H(K)\) is the system of right singular vectors of the embedding \(\mathrm {Id}:H(K) \rightarrow L_2\) defined above.
The following infinite-dimensional concentration result is proved in [24]. There are earlier versions in the finite-dimensional framework (matrices) proved by Tropp [40], Oliveira [28], Rauhut [30] and others. Mendelson and Pajor [23] and also Oliveira [28] comment on infinite-dimensional versions of their results. The key feature of the following proposition is the exact control of the constants and the decaying failure probability, see also Remark 3.10 in [24] for a more detailed comparison to earlier results.
Proposition 5.2
Let \(\mathbf {y}^i\), \(i=1,\ldots ,n\), be i.i.d. random sequences from \(\ell _2\). Let further \(n \ge 3\), \(r > 1\), \(M>0\) such that \(\Vert \mathbf {y}^i \Vert _2 \le M\) for all \(i=1,\ldots ,n\) almost surely and \(\mathbb {E}\,\mathbf {y}^i \otimes \mathbf {y}^i = {{\varvec{\varLambda }}}\) for all \(i=1,\ldots ,n\). Then,
where \(F {:}{=} \max \Big \{ \frac{8r \log n}{n} M^2 \kappa ^2 , \Vert {{\varvec{\varLambda }}} \Vert _{2 \rightarrow 2} \Big \}\) and \(\kappa = \frac{1+\sqrt{5}}{2}\).
This can be written in a more compact form.
Theorem 5.3
Let \(\mathbf {y}^i\), \(i=1,\ldots ,n\), be i.i.d. random sequences from \(\ell _2\). Let further \(n \ge 3\), \(M>0\) such that \(\Vert \mathbf {y}^i \Vert _2 \le M\) for all \(i=1,\ldots ,n\) almost surely and \(\mathbb {E}\,\mathbf {y}^i \otimes \mathbf {y}^i = {{\varvec{\varLambda }}}\) for \(i=1,\ldots ,n\) with \(\Vert \varvec{\varLambda }\Vert _{2\rightarrow 2} \le 1\). Then, for \(0<t<1\),
6 New Bounds for Sampling Numbers
We are interested in the question of optimal sampling recovery of functions from reproducing kernel Hilbert spaces in \(L_2(D,\varrho _D)\). The quantity we want to study is classically given by
$$\begin{aligned} g_n {:}{=} \inf _{\mathbf {x}^1,\ldots ,\mathbf {x}^n \in D}\;\inf _{\varphi :\mathbb {C}^n\rightarrow L_2}\;\sup _{\Vert f\Vert _{H(K)}\le 1}\Vert f-\varphi (f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\Vert _{L_2(D,\varrho _D)} \end{aligned}$$
and quantifies the recovery of functions out of n function values in the worst-case setting. The goal is to get reasonable bounds for this quantity in n, preferably in terms of the singular numbers of the embedding. Results on the decay properties of this quantity in the framework of RKHS have been given by several authors, see, e.g., [15, 20, 27] and the references therein (see also Remark 6.2). For a special case in the field of hyperbolic cross approximation, we refer to [7, Outstanding Open Problem 1.4]. Here we present a new upper bound in the general framework.
The main idea. In the following theorem, we apply Weaver’s theorem to a random frame. The idea is to construct a sampling operator \(\widetilde{S}^m_J\) using O(m) sampling nodes as follows. We draw nodes \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) i.i.d. at random according to some measure \(\mu _m\) specified concretely in the proof below, where n scales as \(m\log m\). At this stage, we have too many sampling nodes, but also a “good” frame in the sense of a well-conditioned matrix \(\mathbf {L}_{n,m}\) in (12). To this frame (the rows of \(\mathbf {L}_{n,m}\)), we apply our modified Weaver theorem. The result is a shrunk well-conditioned sub-frame corresponding to a subset \((\mathbf {x}^i)_{i\in J}\) of the initial set of sampling nodes. With this sub-frame (a sub-matrix of \(\mathbf {L}_{n,m}\)), we solve the over-determined system via the least squares Algorithm 1. This represents the sampling operator. When it comes to the error analysis, we again benefit from the fact that we deal with a subset of the original nodes \(\mathbf {X}\) which is not too small compared to n. The consequence is that we do not pay too much (a factor \(\sqrt{\log n}\)) compared to the sampling operator based on the original set \(\mathbf {X}\) of nodes. However, we only use O(m) sampling nodes, which makes the difference.
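The following schematic sketch mirrors these three steps in code. Since Theorem 2.3 only guarantees the existence of a good subset \(J\), a uniformly random subset of size \(8m\) serves as a stand-in for the down-sampling step; the trigonometric system and the uniform sampling density are illustrative assumptions as well.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 32
# Step 1: oversample. Smallest n with n/log(n) >= 40*m, i.e., n ~ m*log(m).
n = next(k for k in range(3, 10**7) if k / np.log(k) >= 40 * m)
x = rng.random(n)                        # stand-in for draws from rho_m*d(rho_D)
freqs = np.arange(-(m // 2), m - m // 2)
L = np.exp(2j * np.pi * np.outer(x, freqs))      # well-conditioned frame (whp)
# Step 2: down-sample to O(m) rows. The paper invokes Theorem 2.3 here, which
# is purely existential; a random subset is only a stand-in for the real step.
J = rng.choice(n, size=8 * m, replace=False)
# Step 3: least squares using the retained nodes only.
f = lambda t: 1.0 / (1.25 - np.cos(2 * np.pi * t))
c, *_ = np.linalg.lstsq(L[J], f(x[J]), rcond=None)
xs = np.linspace(0.0, 1.0, 1000)
err = np.max(np.abs(f(xs) - np.exp(2j * np.pi * np.outer(xs, freqs)) @ c))
print(len(J), err)                       # only 8*m = 256 samples are used
```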
Theorem 6.1
Let H(K) be a separable reproducing kernel Hilbert space on a set \(D \subset \mathbb {R}^d\) with a positive semidefinite kernel \(K:D\times D\rightarrow \mathbb {C}\) satisfying
$$\begin{aligned} \int _D K(\mathbf {x},\mathbf {x})\,d\varrho _D(\mathbf {x}) < \infty \end{aligned}$$
for some measure \(\varrho _D\) on D. Then, \(\mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D)\) is a Hilbert–Schmidt embedding and the corresponding sequence of singular numbers \((\sigma _k)_{k=1}^{\infty }\) is square-summable. For the sequence of sampling numbers \(g_n{:}{=}g_n(\mathrm {Id}_{K,\varrho _D})\), we have the general bound
$$\begin{aligned} g^2_n \le C\,\frac{\log n}{n}\sum _{k\ge \lfloor cn \rfloor }\sigma _k^2, \quad n\ge 2, \end{aligned}$$
(19)
with two universal constants \(C,c>0\), which are specified in Remark 6.3.
Proof
Let \(m \ge 2\). Similarly as in [13, 15] and [24], we use the density function (14) in order to consider the embedding \(\widetilde{\mathrm {Id}}:H(\widetilde{K}_m) \rightarrow L_2(D,\varrho _m(\cdot )d\varrho _D)\) instead of \(\mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D)\). In fact, we define the new kernel
$$\begin{aligned} \widetilde{K}_m(\mathbf {x},\mathbf {y}) {:}{=} \frac{K(\mathbf {x},\mathbf {y})}{\sqrt{\varrho _m(\mathbf {x})}\sqrt{\varrho _m(\mathbf {y})}}. \end{aligned}$$
This yields
and \(\widetilde{N}(m) \le 2(m-1)\), \(\widetilde{T}(m) \le 2\sum _{j=m}^{\infty }\sigma _j^2\). For the details of this, see the discussion in the proof of [13, Thm. 5.9] and [13, Thm. 5.5]. Note that the operators \(\widetilde{S}_{\mathbf {X}}^m\) and \(S_{\mathbf {X}}^m\) are defined by Algorithm 1 and (11). The number of samples will be chosen later as O(m). Choose now the smallest n such that
$$\begin{aligned} \frac{n}{\log n} \ge 40\,m. \end{aligned}$$
This implies \(\widetilde{N}(m) \le 2(m-1) \le n/(20 \log (n))\). Applying Theorem 5.1 with \(r=2\) gives that the rows of the matrix \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) represent a finite frame with frame bounds 1/2 and 3/2 with high probability (the failure probability \(2n^{-1}\) decays polynomially in n) when \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n\) is sampled w.r.t. the measure \(d\mu _m = \varrho _m(\cdot )d\varrho _D\). That means we have with high probability for any \(\mathbf {w}\in \mathbb {C}^{m-1}\)
$$\begin{aligned} \frac{1}{2}\,\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{n}\,\Vert \widetilde{\mathbf {L}}_{n,m}\,\mathbf {w}\Vert _2^2 \le \frac{3}{2}\,\Vert \mathbf {w}\Vert _2^2. \end{aligned}$$
(22)
Let us now denote by \(P_{m-1}:H(\widetilde{K}_m) \rightarrow H(\widetilde{K}_m)\) the projection operator onto \(\text{ span }\{e_1(\cdot )/\sqrt{\varrho _m(\cdot )},\ldots ,e_{m-1}(\cdot )/\sqrt{\varrho _m(\cdot )}\}\). Following the proof in [24, Thm. 5.1], we obtain almost surely
with \({{\varvec{\varLambda }}} = \mathrm {diag}(\sigma _m^2,\sigma _{m+1}^2,\ldots )\) and
being infinite matrices/operators with sequences \(\mathbf {y}^i = 1/\sqrt{\varrho _m(\mathbf {x}^i)}\left( e_m(\mathbf {x}^i),e_{m+1}(\mathbf {x}^i),\right. \left. \dots \right) \), \( i=1 \dots n\). By Proposition 5.2, we finally get from this with high probability
for some constant \(c_1>0\), where we used that
Due to the high probability of both events, (22) and (24), there exists an instance of nodes \(\{\mathbf {x}^1,\ldots ,\mathbf {x}^n\}\) such that the bound (24) is true and \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) represents a finite frame in \(\mathbb {C}^{m-1}\). Note that all the assumptions of Theorem 2.3 are fulfilled. Indeed, the squared Euclidean norms of the rows of \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) are bounded by \(2(m-1)/n\). To this finite frame, we may thus apply Theorem 2.3, which yields a sub-matrix \(\widetilde{\mathbf {L}}_{J,m}\) having \(\# J = O(m)\) rows which, properly normalized, still form a frame in \(\mathbb {C}^{m-1}\). It holds for all \(\mathbf {w}\in \mathbb {C}^{m-1}\)
$$\begin{aligned} \tilde{c}_2\,\frac{m-1}{n}\,\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{n}\,\Vert \widetilde{\mathbf {L}}_{J,m}\,\mathbf {w}\Vert _2^2 \le \tilde{c}_3\,\frac{m-1}{n}\,\Vert \mathbf {w}\Vert _2^2. \end{aligned}$$
(25)
With this matrix, we perform the least squares method (11) applied to the shrunk vector of function samples \((f(\mathbf {x}^i))_{i\in J}\) corresponding to the rows of \(\widetilde{\mathbf {L}}_{J,m}\). We denote the least squares operator by \(S^m_J\). Note that this operator uses only \(\# J = O(m)\) samples. Let \(\Vert g\Vert _{H(\widetilde{K}_m)}\le 1\) and again denote by \(P_{m-1}:H(\widetilde{K}_m) \rightarrow H(\widetilde{K}_m)\) the projection operator onto \(\text{ span }\{e_1(\cdot )/\sqrt{\varrho _m(\cdot )},\ldots ,e_{m-1}(\cdot )/\sqrt{\varrho _m(\cdot )}\}\). Then, it holds
Using Lemma 4.1 together with (25) gives
By the choice of n together with (24), we may estimate further
Consequently, by (21) we obtain
and finally, using that \(\# J = O(m)\),
where all the involved constants are universal. This implies the statement of the theorem. \(\square \)
Remark 6.2
(i) The additional assumption on the “separability” of H(K) ensures the equality sign in the inequality
$$\begin{aligned} \sum _{k=1}^{\infty } \sigma _k^2 \le \int _D K(\mathbf {x},\mathbf {x})\,d\varrho _D(\mathbf {x}). \end{aligned}$$
(28)
The identity is crucial in the proof of Theorem 6.1, see also [13, Thm. 5.5]. Without an equality in (28), the best known general upper bound for the sampling numbers \(g_n\) can be found in the recent paper [24]. In fact, combining the proof of Theorem 6.1 with the one in [24, Thm. 7.1], one can prove
The bound is worse compared to (19). However, one should rather compare to the bound in [42], since this proof also works without equality in (28). There the authors proved under the same assumptions
(ii) As far as we know, Wasilkowski and Woźniakowski [42] were the first to address the sampling recovery problem in 2001 in the context of reproducing kernel Hilbert spaces. They obtained results using exclusively the finite trace condition (2). Later, in 2009, Kuo et al. [20] made further progress by determining the range for the “power of standard information” when knowing the decay rate of the singular numbers. We refer to the monograph [27] for a detailed historical discussion of the development until 2012. The recent progress in the field has been initiated by Krieg and M. Ullrich [15] in 2019. The authors proved, under stronger assumptions than in Theorem 6.1, an existence result, namely
$$\begin{aligned} g_n \le C\sqrt{\frac{\log n}{n}\sum _{k\ge \lfloor cn/\log n \rfloor }\sigma _k^2}, \end{aligned}$$
which represents a slightly weaker bound. Further progress (explicit constants, consequences for numerical integration) has been made by Kämmerer et al. [13] and by M. Ullrich [41], as well as by Moeller and T. Ullrich [24] concerning the control of the failure probability.
(iii) To further demonstrate the usefulness of the tools developed here, let us mention that they have already been used by others. In fact, while this manuscript was under review, Krieg and M. Ullrich [16] used this technique to observe that for non-Hilbert function space embeddings the “gap” is also at most \(\sqrt{\log (n)}\) if the corresponding approximation numbers are p-summable with \(p<2\). Finally, we would like to mention a very recent result by Temlyakov [36], where the sampling numbers \(g_n\) of a function class \(\mathbf {F}\) in \(L_2\) are related to the Kolmogorov numbers in \(L_\infty \). This allows for treating the case of “small smoothness”, where square-summability does not hold [37, 38].
Remark 6.3
Note that, using Theorem 2.3, Theorem 5.1 and Proposition 5.2, we can get explicit values for the constants in the above theorem. Concretely, we can conclude that (19) holds for \(m\ge 13136\) with \(C=1.5 \cdot 10^6 \) and \(c=3.8 \cdot 10^{-5}\).
To verify this, let \(m \ge 2\) and \(n=n(m)\), as in the proof of Theorem 6.1, such that
$$\begin{aligned} \frac{n}{\log n} \ge 40\,m. \end{aligned}$$
For \(m=2\) we get \(n=497\) and since n increases as m increases, we generally have \(n \ge 497\). Put \(r=2\). We then also have (22) with probability at least \(1-2/n \ge 1-2/497\) and (24) with probability at least \(1-2^{3/4}/n \ge 1-2^{3/4}/497\). Since these already sum up to \( (1-2/497) + (1-2^{3/4}/497) > 1\), we can guarantee the existence of a node set \(\{\mathbf{x}^{1}, \ldots , \mathbf{x}^{n}\}\) as in the proof of Theorem 6.1.
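The value \(n(2)=497\) is quickly verified numerically, assuming the condition \(n/\log (n) \ge 40m\) as reconstructed in the proof of Theorem 6.1:

```python
import math

# Smallest n with n/log(n) >= 40*m (condition as reconstructed above).
def n_of_m(m):
    n = 3
    while n / math.log(n) < 40 * m:
        n += 1
    return n

print(n_of_m(2))   # prints 497, matching the text
```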
In the proof, we apply Theorem 2.3 with \(k_1 = 2, k_2 = 1/2, k_3 = 3/2\), so that we get the respective constants
$$\begin{aligned} \tilde{c}_1 = 6568, \qquad \tilde{c}_2 = 2(2+\sqrt{2})^2, \qquad \tilde{c}_3 = 9852. \end{aligned}$$
For this, note that for all \(m\ge 2\) and \(n=n(m)\) as above we have
$$\begin{aligned} \frac{n}{m} \ge 40 \log (n) \ge 40 \log (497) > 188 = 47\,\frac{k_1}{k_2}. \end{aligned}$$
To calculate \(c_1\), we apply Proposition 5.2 with \(M^2 = 2 \sum _{k=m}^\infty \sigma _k^2\), so that we get
Using \(\log (n)/n \le 1/(40m)\), we can then estimate further
so that \(c_1 = 3.09\dots \).
After applying Theorem 2.3, we get \(c_2 = \tilde{c}_2 = 2(2+\sqrt{2})^2\) and \(C_2 = \tilde{c}_3 = 9852\), as well as \(\# J \le 6568m\).
Since
Lemma 4.1 (iii) gives
so that we may choose \(c_3 = 1/c_2 = (2+\sqrt{2})^{-2}/2\).
To obtain \(c_4\), we use \(n/m \le \frac{n}{n-1} \cdot 40 \log (n-1) \le \frac{497\log (496)}{496\log (497)} \cdot 40 \log (n)\) to estimate
and get \(c_4 = 6.31\dots \).
As for \(c_5\), we start with
so that
Furthermore, we have \(\log (n-1)-\log (40)-\log \log (n-1) < \log (m)\) and
from which we conclude \(\log (n) \le \vartheta ^{-1} \log (m)\). We then have
with \(c_5 = 113.35\dots \).
Finally observe that we can choose \(c_6=\tilde{c}_1=6568\) due to \(\# J \le \tilde{c}_1m=6568m\).
Now take \(\tilde{m}\ge 2\tilde{c}_1\) and \(m=\lfloor \tilde{m}/\tilde{c}_1 \rfloor \ge 2\). Further note that \(\lfloor \tilde{m}/2\tilde{c}_1 \rfloor = \lfloor \lfloor \tilde{m}/\tilde{c}_1 \rfloor /2\rfloor \). Then
The asserted estimates now follow due to \(2\tilde{c}_1=13136\) and \(2 c_5 \tilde{c}_1 \le 1.5 \cdot 10^6\) and \(1/(4\tilde{c}_1) \ge 3.8 \cdot 10^{-5}\).
Remark 6.4
The interesting question remains whether there is a situation where the above bound on sampling numbers is sharp. Let us refer to the next subsection for a possible candidate. Clearly, there are situations where the bound in Theorem 6.1 does not reflect the correct behavior of sampling numbers. This is, for instance, the case for the univariate Sobolev embedding \(\mathrm {Id}:H^1([0,1]) \rightarrow L_2([0,1])\), where the sampling numbers show, at least asymptotically, the same behavior as the singular numbers.
7 An Outstanding Open Problem
Let us once again comment on an important open problem for the optimal sampling recovery of multivariate functions. We consider the minimal worst-case error (sampling numbers/widths) defined by
$$\begin{aligned} g_n(\mathbf {F},L_2) {:}{=} \inf _{\mathbf {x}^1,\ldots ,\mathbf {x}^n}\;\inf _{\varphi :\mathbb {C}^n\rightarrow L_2}\;\sup _{f \in \mathbf {F}}\Vert f-\varphi (f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\Vert _{L_2}. \end{aligned}$$
(29)
Let us comment on the class \(H^s_{\text {mix}}(\mathbb {T}^d)\). That is, we consider functions on the d-dimensional torus \(\mathbb {T}^d \simeq [0,1)^d\), where \(\mathbb {T}\) stands for [0, 1] with endpoints identified. Note that the unit cube \([0,1]^d\) is preferred here since it has Lebesgue measure 1 and is therefore a probability space. We could have also worked with \([0,2\pi ]^d\) and the Lebesgue measure (which can be made a probability measure by a d-dependent rescaling).
There are many different ways to define function spaces of dominating mixed smoothness, see [7, Chapt. 3]. We choose an approach which is closely related to [18, Sect. 2.1], see also (2.6) there. In fact, \(L_2(\mathbb {T}^d)\)-norms of mixed derivatives of the multivariate function f can be written in terms of Fourier coefficients \(\hat{f}_{\mathbf {k}}\) of f. For \(\alpha \in \mathbb {N}\) we define the space \(H^{\alpha }_{\text{ mix }}(\mathbb {T}^d)\) as the Hilbert space with the inner product
\(D^{(j_1,\ldots ,j_d)}=\partial _1^{j_1}\cdots \partial _d^{j_d}\) thereby denotes the weak derivative operator. Defining the weight
and the univariate kernel function
directly leads to
$$\begin{aligned} K^d_{\alpha }(\mathbf {x},\mathbf {y}) {:}{=} K_{\alpha }(x_1,y_1)\cdots K_{\alpha }(x_d,y_d), \end{aligned}$$
(32)
which is a reproducing kernel for \(H^\alpha _{\text{ mix }}(\mathbb {T}^d)\). In particular, for any \(f\in H^\alpha _{\text{ mix }}(\mathbb {T}^d)\) we have
$$\begin{aligned} f(\mathbf {x}) = \langle f, K^d_{\alpha }(\cdot ,\mathbf {x})\rangle _{H^{\alpha }_{\text{ mix }}}, \quad \mathbf {x}\in \mathbb {T}^d. \end{aligned}$$
The kernel defined in (32) associated with the inner product (30) can be extended to the case of fractional smoothness \(s>0\) replacing \(\alpha \) by s in (31)–(32) which in turn leads to the inner product
in terms of the Fourier coefficients \(\hat{f}_{\mathbf {k}}\), \(\hat{g}_{\mathbf {k}}\) and the corresponding norm. The (ordered) sequence \((\lambda _j)_{j=1}^{\infty }\) of eigenvalues of the corresponding mapping \(W_{s,d} = \mathrm {Id}_{s,d}^{*}\circ \mathrm {Id}_{s,d}\), where \(\mathrm {Id}:\mathrm {H}\big (K^d_{s}\big ) \rightarrow L_2(\mathbb {T}^d)\), is the non-increasing rearrangement of the numbers
It has been shown by various authors, see [7, Chapt. 4] and the references therein, that we have asymptotically (\(s>0\))
$$\begin{aligned} \sigma _n \asymp n^{-s}\,(\log n)^{(d-1)s}. \end{aligned}$$
(33)
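This asymptotic behavior is easy to observe numerically. The sketch below uses the tensor weights \(\prod _{j=1}^d (1+k_j^2)^{-s}\) as one representative of the equivalence class of mixed Sobolev norms; any fixed choice changes \(\sigma _n\) only by constants.

```python
import numpy as np

# Numerical illustration of (33): the non-increasing rearrangement of the
# tensor-product eigenvalues decays like (n^{-1}(log n)^{d-1})^{2s}. The
# weight (1 + k^2)^{-s} is an assumed representative, fixed up to constants.
d, s, K = 2, 1.0, 256
lam = np.ones((2 * K + 1,) * d)
for j in range(d):
    shape = [1] * d
    shape[j] = 2 * K + 1
    kj = np.arange(-K, K + 1, dtype=float).reshape(shape)
    lam = lam * (1.0 + kj**2) ** (-s)
sigma = np.sqrt(np.sort(lam.ravel())[::-1])   # singular numbers, non-increasing
ns = np.arange(10, 1000)
ratio = sigma[ns - 1] / (ns ** (-s) * np.log(ns) ** ((d - 1) * s))
print(ratio.min(), ratio.max())               # ratio stays between fixed constants
```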
The correct asymptotic behavior of (29) has been addressed by several authors in the information-based complexity (IBC) community, see, e.g., [27] and also [7, Outstanding Open Problem 1.4]. It is nowadays well known, see, e.g., [6, 31, 39] and [7, Sec. 5] for some historical remarks, that for \(s>1/2\) the bound
$$\begin{aligned} n^{-s}\,(\log n)^{(d-1)s} \lesssim g_n \lesssim n^{-s}\,(\log n)^{(d-1)(s+1/2)} \end{aligned}$$
holds asymptotically in \(n\in \mathbb {N}\). Note that there is a d-depending gap in the logarithm between upper and lower bound.
Recently, Krieg and M. Ullrich [15] improved this bound by using a probabilistic technique to show that for \(s>1/2\)
$$\begin{aligned} g_n \lesssim n^{-s}\,(\log n)^{(d-1)s+s}. \end{aligned}$$
(34)
Clearly, if \(s<(d-1)/2\), then the gap in (34) is reduced to \((\log n)^s\), which is still growing in s. In particular, there is no improvement if \(d=2\). However, this result can be considered a major progress for the research on the complexity of this problem. It disproved Conjecture 5.6.2 in [7] for \(p=2\) and \(1/2< s <(d-1)/2\). Indeed, the celebrated sparse grid points are now beaten by random points in a certain range for s. This again reflects the “power of random information”, see [11].
Still, it is worth mentioning that sparse grids represent the best known deterministic construction as far as the asymptotic order is concerned. Indeed, the guarantees are deterministic and only slightly worse compared to random nodes in the asymptotic regime. Regarding preasymptotics, however, the random constructions provide substantial advantages. The situation parallels recent efforts in compressed sensing: there, the optimal RIP matrices are given as realizations of random matrices, while known deterministic constructions are far from optimal.
In the present paper, we prove that the sparse grids are beaten for the full range of \(s>1/2\) whenever \(d\ge 3\). In case \(d=2\), our approach and the sparse grids show the same performance. Clearly, inserting (33) into the bound in Theorem 6.1 gives
$$\begin{aligned} g_n \lesssim n^{-s}\,(\log n)^{(d-1)s+1/2}, \end{aligned}$$
(35)
which shortens the gap between upper and lower bound to \(\sqrt{\log n}\). The best known lower bound is the one from (33). It is not clear whether the bound in (35) is sharp, nor whether it can be improved. So this framework might serve as a candidate for Remark 6.4. Therefore, the outstanding open question remains (see, e.g., [7, Chapt. 5] and the references therein) whether there is an intrinsic additional difficulty when restricting to algorithms based on function samples rather than Fourier coefficients. From a practical point of view, sampling algorithms are highly relevant since we are usually given discrete samples of functions. The question remains: are the asymptotic characteristics \(\sigma _n\) and \(g_n\) of the same order, or do they rather behave like \(\sigma _n = o(g_n)\)? This represents a fundamental open problem in hyperbolic cross approximation, see [7, Outstanding Open Problem 1.4].
References
A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston, MA, 2004. With a preface by Persi Diaconis.
H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.
G. Byrenheid. Sparse representation of multivariate functions based on discrete point evaluations. Dissertation, Institut für Numerische Simulation, Universität Bonn, 2018.
A. Christmann and I. Steinwart. Support Vector Machines. Springer, Berlin, 2008.
A. Cohen and G. Migliorati. Optimal weighted least-squares methods. SMAI J. Comput. Math., 3:181–203, 2017.
D. Dũng. B-spline quasi-interpolant representations and sampling recovery of functions with mixed smoothness. J. Complexity, 27(6):541–567, 2011.
D. Dũng, V. N. Temlyakov, and T. Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics. CRM Barcelona. Birkhäuser/Springer, 2019.
D. Dũng and T. Ullrich. Lower bounds for the integration error for multivariate functions with mixed smoothness and optimal Fibonacci cubature for functions on the square. Math. Nachr., 288(7):743–762, 2015.
M. Hein and O. Bousquet. Kernels, associated structures and generalizations. Technical Report 127, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2004.
A. Hinrichs, E. Novak, and J. Vybíral. Linear information versus function evaluations for \(L_2\)-approximation. J. Approx. Theory, 153:97–107, 2008.
A. Hinrichs, E. Novak, D. Krieg, J. Prochno, and M. Ullrich. On the power of random information. In Multivariate Algorithms and Information-Based Complexity. De Gruyter, Berlin/Munich/Boston, 2020.
R. Kadison and I. Singer. Extensions of pure states. American Journal of Mathematics, 81(2):383–400, 1959.
L. Kämmerer, T. Ullrich, and T. Volkmer. Worst-case recovery guarantees for least squares approximation using random samples. arXiv:1911.10111, 2019.
D. Krieg. Tensor power sequences and the approximation of tensor product operators. J. Complexity, 44:30–51, 2018.
D. Krieg and M. Ullrich. Function values are enough for \({L}_2\)-approximation. Found. Comput. Math., to appear. arXiv:1905.02516v5.
D. Krieg and M. Ullrich. Function values are enough for \(L_2\)-approximation: Part (II). arXiv:2011.01779, 2020.
T. Kühn. New preasymptotic estimates for the approximation of periodic Sobolev functions. In 2018 MATRIX annals, volume 3 of MATRIX Book Ser. Springer, Cham, to appear, https://www.matrix-inst.org.au/2018-matrix-annals/.
T. Kühn, W. Sickel, and T. Ullrich. Approximation of mixed order Sobolev functions on the \(d\)-torus: asymptotics, preasymptotics, and \(d\)-dependence. Constr. Approx., 42(3):353–398, 2015.
T. Kühn, W. Sickel, and T. Ullrich. How anisotropic mixed smoothness affects the decay of singular numbers of Sobolev embeddings. arXiv:2001.09022, 2020.
F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski. On the power of standard information for multivariate approximation in the worst case setting. J. Approx. Theory, 158(1):97–125, 2009.
I. Limonova and V. Temlyakov. On sampling discretization in \({L}_2\). arXiv:2009.10789v1, 2020.
A. W. Marcus, D. A. Spielman, and N. Srivastava. Interlacing families II: Mixed characteristic polynomials and the Kadison–Singer problem. Ann. of Math. (2), 182(1):327–350, 2015.
S. Mendelson and A. Pajor. On singular values of matrices with independent rows. Bernoulli, 12:761–773, 2006.
M. Moeller and T. Ullrich. \({L}_2\)-norm sampling discretization and recovery of functions from RKHS with finite trace. arXiv:2009.11940, 2020.
N. Nagel. On the Kadison-Singer problem and Weaver’s conjecture with implications for Fourier systems over unbounded sets. Bachelor’s thesis, Faculty of Mathematics, TU Chemnitz, 2020.
S. Nitzan, A. Olevskii, and A. Ulanovskii. Exponential frames on unbounded sets. Proc. Amer. Math. Soc., 144(1):109–118, 2016.
E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume III: Standard information for operators, volume 18 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2012.
R. I. Oliveira. Sums of random Hermitian matrices and an inequality by Rudelson. Electr. Comm. Probab., 15:203–212, 2010.
C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
H. Rauhut. Compressive sensing and structured random matrices. In M. Fornasier, editor, Theoretical Foundations and Numerical Methods for Sparse Recovery, volume 9 of Radon Series on Computational and Applied Mathematics. de Gruyter, Berlin, 2010.
W. Sickel and T. Ullrich. The Smolyak algorithm, sampling on sparse grids and function spaces of dominating mixed smoothness. East J. Approx., 13(4):387–425, 2007.
I. Steinwart and C. Scovel. Mercer's theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx., 35:363–417, 2012.
V. N. Temlyakov. Approximation of periodic functions. Computational Mathematics and Analysis Series. Nova Science Publishers Inc., Commack, NY, 1993.
V. N. Temlyakov. The Marcinkiewicz-type discretization theorems for the hyperbolic cross polynomials. Jaen J. Approx., 9(1):37–63, 2017.
V. N. Temlyakov. The Marcinkiewicz-type discretization theorems. Constr. Approx., 48(2):337–369, 2018.
V. N. Temlyakov. On optimal recovery in \(L_2\). arXiv:2010.03103, 2020.
V. N. Temlyakov and T. Ullrich. Bounds on Kolmogorov widths of classes with small mixed smoothness. arXiv:2012.09925v1, 2020.
V. N. Temlyakov and T. Ullrich. Approximation of functions with small mixed smoothness in the uniform norm. arXiv:2012.11983v1, 2020.
H. Triebel. Bases in function spaces, sampling, discrepancy, numerical integration, volume 11 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2010.
J. Tropp. User-friendly tail bounds for sums of random matrices. Found. Comp. Math., 12(4):389–434, 2011.
M. Ullrich. On the worst-case error of least squares algorithms for \({L}_2\)-approximation with high probability. Journal of Complexity, 60, 2020.
G. W. Wasilkowski and H. Woźniakowski. On the power of standard information for weighted approximation. Found. Comput. Math., 1:417–434, 2001.
N. Weaver. The Kadison-Singer problem in discrepancy theory. Discrete Mathematics, 278(1–3):227–239, 2004.
Acknowledgements
The authors would like to thank V.N. Temlyakov for giving a series of talks at the Chemnitz Summer School on Applied Analysis where he brought the paper [26] to their attention. Theorem 2.3 is a generalization of the main result in [26]. After this preprint was finished, V.N. Temlyakov pointed out to the authors that the proof of Lemma 2.2 in the recent paper [21] also yields a version of Theorem 2.3 with different constants. The authors would further like to thank Mario Ullrich for a useful comment regarding the case distinction for computing the explicit constants in Theorem 2.3. Last but not least they thank David Krieg for useful remarks on Sect. 4. T.U. would like to acknowledge support by the DFG Ul-403/2-1.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Communicated by Frances Kuo.
Keywords
- Sampling recovery
- Least squares approximation
- Random sampling
- Weaver’s conjecture
- Finite frames
- Kadison–Singer problem