
A Uniform Error Bound for Stochastic Kriging: Properties and Implications on Simulation Experimental Design

Published: 25 November 2024

Abstract

In this work, we propose a method to construct a uniform error bound for the stochastic Kriging (SK) predictor. In investigating the asymptotic properties of the proposed uniform error bound, we examine the convergence rate of SK’s predictive variance under the supremum norm in both fixed and random design settings. Our analyses reveal that the large-sample properties of SK prediction depend on the design-point sampling scheme and the budget allocation scheme adopted. Appropriately controlling the order of noise variances through budget allocation is crucial for achieving a desirable convergence rate of SK’s approximation error, as quantified by the uniform error bound, and for maintaining SK’s numerical stability. Moreover, we investigate the impact of noise variance estimation on the uniform error bound’s performance theoretically and numerically. Numerical evaluations under various experimental settings demonstrate the superiority of the proposed uniform bound over the Bonferroni correction-based simultaneous confidence interval.

1 Introduction

In recent decades, research on metamodeling techniques for stochastic simulation experiments has received considerable attention from the simulation research community [2, 15, 34]. Several metamodeling methodologies, such as stochastic Kriging (SK) [2, 10, 11], have been proposed for approximating the mean response surface implied by a stochastic simulation.
Built on Gaussian process regression (GPR), SK is suitable for simulation metamodeling under the impact of heteroscedasticity (i.e., the simulation output variance varies across the input space) [12, 46]. Similar to Kriging, a popular GPR-based methodology for design and analysis of deterministic computer experiments [37], a key element of SK prediction is the use of conditional inference based on GP. At each prediction point in the input space, the conditional distribution of a GP is normal with the predictive mean and the predictive variance in closed form. A pointwise confidence interval of the mean response value at a given prediction point with a prescribed coverage probability can be constructed based on this conditional distribution. In many applications, however, it is often desirable to have a joint confidence region of the mean response values at an arbitrary set of prediction points with a prescribed simultaneous coverage probability. For example, in materials science and engineering, researchers calibrate chemical sensors in a drifting environment via GPR-based inference of the functional relationship between the sensor response, the analyte concentration, and the environmental factors, with rigorous uncertainty quantification aimed at obtaining a calibration model of the desired quality [20]. To draw comprehensive and reliable inferences regarding the effectiveness of a treatment on individual patients, investigators in clinical trial studies aim to infer individualized treatment effects based on observational data via multi-task GPR modeling; simultaneous confidence intervals are sought to provide measures of confidence in the estimates, a crucial aspect for realizing the full potential of precision medicine [1]. To the best of our knowledge, the methodology for constructing uniform error bounds for SK is relatively underdeveloped. Existing approaches to construct a joint confidence region for SK typically rely on bootstrapping [20, 26] or correction methods such as Bonferroni or Šidák [1, 14], and they only apply to a finite number of prediction points. It would therefore be desirable to develop a method capable of providing high-probability bounds on the maximum approximation error achieved by the SK predictor across the input space. Such a bound is especially valuable for quantifying predictive uncertainty, as it provides an overall measure of confidence in the model’s predictions over the entire input space.
A closely related topic in nonparametric data analysis is simultaneous confidence bands (SCBs). SCBs play a crucial role in global inference for unknown functions, such as detecting the overall shape of an unknown function or testing whether the curve adheres to specific functional forms that describe the underlying input-output relationship [5, 6, 38]. SCBs were first constructed for nonparametric density estimation [4] and further extended to nonparametric regression in both the random design [22] and the fixed design settings [18]. Recent advancements in nonparametric methodology for constructing SCBs have significantly expanded in terms of techniques and objectives. For example, asymptotically correct SCBs were proposed for estimating either the mean or the variance function or both of a nonparametric regression model [6]. Various SCBs for single-index link functions [21], cumulative distribution functions [43], and functional data [8] also emerged. Adaptive SCBs that automatically adjust to the smoothness of the underlying function were also investigated [7]. Last but not least, various approaches were explored to construct bootstrap-based SCBs for enhancing the small sample size performance [24] or for obtaining the minimum band width [38].
Uniform error bounds for GPR are not new; they have been studied as an essential component of GP-upper confidence bound (GP-UCB) approaches. These methods are used to address stochastic multi-armed bandit problems with a continuous arm set, assuming that the underlying response function of interest is either a sample from a GP or has a bounded reproducing kernel Hilbert space norm (see, e.g., References [13, 40]). Recently, there has been a growing interest in establishing uniform error bounds for various GPR models. These bounds are crucial in enhancing our understanding of the convergence rate and uncertainty quantification associated with the respective models. For example, Wang et al. (2020) derived uniform error bounds of the simple and universal Kriging predictor built on noiseless observations [47]. Lederer and Umlauft (2019) proposed a uniform error bound for GPR under the impact of homoscedastic noise, where the true underlying function f is assumed to be a realization of a GP with kernel K satisfying some regularity conditions [29]. However, the exploration of a uniform error bound for GPR under heteroscedasticity has been limited.
In this work, we extend the approach of Reference [29] and propose a method to construct a uniform error bound for the SK predictor. In investigating the asymptotic properties of the proposed uniform error bound, we examine the convergence rate of SK’s predictive variance under the supremum norm in both fixed and random design settings. Previous work provided theoretical analyses of the predictive variance for GPR modeling with observations subject to zero noise [41] and homoscedastic noise [28, 44]. Our investigation fills the gap by conducting a counterpart study of SK’s predictive variance, which can be of independent interest. Our analyses reveal that the large-sample properties of SK prediction depend on the design-point set determined by the design-point sampling scheme and the budget allocation scheme adopted. Roughly speaking, adopting a quasi-uniform design facilitates faster predictive variance convergence rates. Given a sufficiently large simulation budget, the smoother the underlying function of interest and the lower the input-space dimensionality, the fewer distinct design points are required but the more replications are needed at each, and vice versa. Moreover, this work reveals that appropriately controlling the order of noise variances through budget allocation is crucial for achieving a desirable convergence rate of SK’s approximation error, as quantified by the uniform error bound, and for maintaining SK’s numerical stability. Last but not least, we evaluate the impact of noise variance estimation on the uniform error bound’s performance theoretically and numerically.
The rest of the article is organized as follows: Section 2 provides a review of SK and a summary of key notation used in this work. Section 3 presents the main results on the proposed uniform error bound for the SK predictor. Section 4 analyzes the convergence rate of SK’s predictive variance under the supremum norm in the fixed design setting and studies the corresponding convergence order of the uniform error bound. The discussion of the random design setting is deferred to Appendix C. Section 5 reveals some implications on simulation experimental designs for SK and theoretically investigates the impact of noise variance estimation on the uniform error bound. Section 6 conducts numerical experiments to demonstrate the proposed uniform bound’s performance. Section 7 concludes this work with a discussion of our major findings and avenues for future research.

2 Background

This section offers a concise review of SK and establishes some notation that will be useful in the rest of the work.

2.1 Review of Stochastic Kriging

In SK, the system output obtained at a point \({\bf x}\) in the d-dimensional input space \(\mathcal {X}\subseteq \mathbb {R}^d\) on the jth simulation replication \(\mathcal {Y}_{j}({\bf x})\) is modeled as
\begin{equation} \mathcal {Y}_{j}({\bf x}) = f({\bf x}) + \varepsilon _j({\bf x}), \end{equation}
(1)
where \(f({\bf x})\) denotes the unknown true mean response that we intend to estimate at \({\bf x}\in \mathcal {X}\). For ease of exposition, we assume that \(f: \mathcal {X}\mapsto \mathbb {R}\) represents a sample of a stationary mean-zero GP [37], in line with a vast array of GP literature [13, 32, 40, 44]. The spatial covariance between any two points in the GP is given by \(\Sigma _{\sf M}({\bf x}, {\bf x}^\prime):= \xi ^2 K({\bf x}, {\bf x}^\prime)\), where \(\Sigma _{\sf M}: \mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}_+\) denotes the spatial covariance function, \(\xi ^2\) is the spatial variance of the GP, and \(K({\bf x}, {\bf x}^\prime)\) denotes the kernel function satisfying \(K({\bf x}, {\bf x}^\prime)\le 1\) for all \({\bf x}, {\bf x}^\prime \in \mathcal {X}\). The simulation noise terms incurred at \({\bf x}\) on different replications, \(\varepsilon _j({\bf x})\)’s, are assumed to be independent and identically distributed (i.i.d.) normal random variables with mean zero and input-dependent variance \({\sf V}({\bf x}) := {\rm Var}(\varepsilon _j({\bf x}))\). The normality of \(\varepsilon _j({\bf x})\) could be anticipated, since in a discrete-event simulation, the output \(\mathcal {Y}_{j}({\bf x})\) typically represents the average of a large number of more basic random variables obtained on the jth simulation replication.
Given a fixed simulation budget B to expend for approximating the mean response surface via SK, an experimental design for performing the simulation runs can be given as \(\lbrace ({\bf x}_i, n_i)_{i=1}^k: \sum _{i=1}^k n_i = B\rbrace\), where k denotes the number of distinct design points selected from the input space \(\mathcal {X}\), \({\bf x}_1, {\bf x}_2,\ldots , {\bf x}_k\) denote the k design points in the design-point set \(\mathcal {D}:=\lbrace {\bf x}_i, i=1,2,\ldots ,k\rbrace\), and \(n_i\) represents the number of replications to apply at \({\bf x}_i\), \(i=1,2,\ldots ,k\). Based on the simulation dataset \(\mathbb {D}_{k,B}=\lbrace {\bf x}_i,\lbrace \mathcal {Y}_{j}({\bf x}_i)\rbrace _{j=1}^{n_i}, i=1,2,\ldots ,k: \sum _{i=1}^k n_i = B\rbrace\) obtained, one can obtain the SK predictor of \(f({\bf x}_0)\) at any \({\bf x}_0 \in \mathcal {X}\) as follows:
\begin{equation} \mu _{k,B}({\bf x}_0) = \Sigma _{\sf M}({\bf x}_0,{\bf X})^\top \left(\Sigma _{\sf M}({\bf X},{\bf X}) + \Sigma _{\varepsilon }\right)^{-1}\bar{\mathcal {Y}}; \end{equation}
(2)
and its corresponding predictive variance is given by
\begin{equation} \sigma ^2_{k,B}({\bf x}_0) = \Sigma _{\sf M}({\bf x}_0, {\bf x}_0) - \Sigma _{\sf M}({\bf x}_0,{\bf X})^\top \left(\Sigma _{\sf M}({\bf X},{\bf X}) + \Sigma _{\varepsilon }\right)^{-1}\Sigma _{\sf M}({\bf x}_0,{\bf X}), \end{equation}
(3)
where \({\bf X}= ({\bf x}_1^\top , {\bf x}_2^\top , \ldots , {\bf x}_k^\top)^\top\) denotes the \(k\times d\) design matrix; \(\bar{\mathcal {Y}}=(\bar{\mathcal {Y}} ({\bf x}_1), \bar{\mathcal {Y}} ({\bf x}_2), \ldots , \bar{\mathcal {Y}} ({\bf x}_k))^\top\) denotes the \(k\times 1\) vector of the sample averages of simulation outputs, with \(\bar{\mathcal {Y}} ({\bf x}_i)=f({\bf x}_i)+\bar{\mbox{ $\varepsilon $}}({\bf x}_i)\) and \(\bar{\mbox{ $\varepsilon $}}({\bf x}_i)\) denoting the average random noise incurred at \({\bf x}_i\), \(i=1,2,\ldots ,k\). With a slight abuse of notation, we use \(\Sigma _{\sf M}({\bf X},{\bf X})\) to denote the \(k\times k\) covariance matrix that records the spatial covariances across the k design points and \(\Sigma _{\sf M}({\bf x}_0,{\bf X})\) to represent the \(k\times 1\) vector that contains the spatial covariances between the k design points and the prediction point \({\bf x}_0\); and the kernel matrix \(K({\bf X},{\bf X})\) and the kernel vector \(K({\bf x}_0,{\bf X})\) can be understood in the same manner. The \(k\times k\) diagonal matrix \(\Sigma _{\varepsilon }\) denotes the variance-covariance matrix of the \(k\times 1\) vector of average random noise terms \(\bar{\mbox{ $\varepsilon $}}=\left(\bar{\mbox{ $\varepsilon $}}({\bf x}_1),\bar{\mbox{ $\varepsilon $}}({\bf x}_2),\ldots , \bar{\mbox{ $\varepsilon $}}({\bf x}_k)\right)^\top\) given by \(\Sigma _{\varepsilon }=\mbox{diag} \left({\sf V}({\bf x}_1)/n_1, {\sf V}({\bf x}_2)/n_2, \ldots , {\sf V}({\bf x}_k)/n_k\right)\).
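To make the formulas above concrete, the following minimal sketch (in Python with NumPy; the squared-exponential kernel, the Cholesky-based solves, and all parameter values are illustrative assumptions rather than part of the SK methodology itself) computes the predictor in Equation (2) and the predictive variance in Equation (3) from the sample averages and the estimated noise variances of the sample means.

```python
import numpy as np

def sk_predict(X, y_bar, noise_var_bar, x0, xi2=1.0, lengthscale=0.5):
    """Minimal stochastic Kriging predictor, Equations (2)-(3).

    X             : (k, d) array of distinct design points
    y_bar         : (k,)   sample averages of the simulation outputs at X
    noise_var_bar : (k,)   estimates of V(x_i)/n_i (variances of the sample means)
    x0            : (m, d) array of prediction points
    xi2, lengthscale : spatial variance and lengthscale of an illustrative
                       squared-exponential kernel with K(x, x) = 1
    """
    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    Sigma_M = xi2 * kern(X, X)              # Sigma_M(X, X)
    Sigma_eps = np.diag(noise_var_bar)      # diag(V(x_1)/n_1, ..., V(x_k)/n_k)
    Sigma_0 = xi2 * kern(X, x0)             # Sigma_M(X, x0), shape (k, m)

    # Cholesky-based solves with (Sigma_M + Sigma_eps) for numerical stability.
    L = np.linalg.cholesky(Sigma_M + Sigma_eps)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_bar))
    v = np.linalg.solve(L, Sigma_0)

    mu = Sigma_0.T @ alpha                                          # Equation (2)
    var = xi2 * np.ones(x0.shape[0]) - np.einsum("ij,ij->j", v, v)  # Equation (3)
    return mu, var
```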
A pointwise confidence interval (CI) of \(f({\bf x}_0)\) at a given prediction point \({\bf x}_0\in \mathcal {X}\) with a prescribed confidence level \((1-\alpha)\) can be constructed as \(\mu _{k,B}({\bf x}_0)\pm z_{1-\alpha /2}\sigma _{k,B}({\bf x}_0)\), where \(\alpha \in (0,1)\) and \(z_{1-\alpha /2}\) denotes the \((1-\alpha /2)\)-quantile of the standard normal distribution [25]. One way to construct a confidence region for \(f(\cdot)\) at N prediction points (say, \({\bf x}_{0,i}\in \mathcal {X}\), \(i=1,2,\ldots ,N\)) with confidence level \((1-\alpha)\) is to apply the Bonferroni correction and obtain a simultaneous confidence bound comprising N pointwise confidence intervals [26]: \(\mu _{k,B}({\bf x}_{0,i})\pm z_{1-\alpha /(2N)}\sigma _{k,B}({\bf x}_{0,i})\), \(i=1,2,\ldots ,N\). Nevertheless, the literature on constructing a joint confidence bound for SK over an arbitrary set of prediction points is sparse.
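Continuing the illustrative sketch above, the pointwise and Bonferroni-corrected intervals can be assembled directly from the predictive mean and standard deviation; the normal quantile below is computed with SciPy, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_cis(mu, var, alpha=0.05):
    """Bonferroni-corrected simultaneous CIs at N prediction points.

    mu, var : predictive means and variances at the N prediction points
              (e.g., the outputs of the hypothetical sk_predict above)
    Returns lower/upper limits with joint coverage of at least 1 - alpha.
    """
    N = len(mu)
    z = norm.ppf(1.0 - alpha / (2.0 * N))   # z_{1 - alpha/(2N)}
    half_width = z * np.sqrt(var)
    return mu - half_width, mu + half_width
```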

2.2 Notation

In the remainder of this work, we employ the following notation consistently. Given a function \(g: \mathcal {X}\mapsto \mathbb {R}\), let \(\Vert g\Vert _\infty\) denote its supremum norm. For a vector \({\bf v}\), \(\left\Vert {\bf v}\right\Vert\) denotes the Euclidean norm of \({\bf v}\). Let \({\bf I}_k\) denote the \(k\times k\) identity matrix. For any symmetric matrix \({\bf Q}\), the \(\ell _2\)-operator norm or spectral norm of \({\bf Q}\) is denoted as \(\Vert {\bf Q}\Vert\), defined as \(\Vert {\bf Q}\Vert := \lambda _{\max }({\bf Q})\), where \(\lambda _{\max }({\bf Q})\) and \(\lambda _{\min }({\bf Q})\), respectively, denote the maximum and minimum eigenvalues of \({\bf Q}\). Define \({\sf V}_{k,\max } :=\max \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\) and \({\sf V}_{k,\min } :=\min \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\) as the maximum and minimum noise variances of the output sample means at the k design points. Let \(\lesssim\) and \(\gtrsim\) denote inequalities up to a constant multiple and write \(a \asymp b\) when both \(a \lesssim b\) and \(a \gtrsim b\) hold.

3 Uniform Error Bound for Stochastic Kriging

This section presents the main results regarding the construction of the proposed high-probability uniform error bound and then reveals its asymptotic order as the number of design points k and the budget B allocated approach infinity.
To deepen understanding, we sketch the idea behind constructing the uniform error bound, which essentially involves two main steps. First, consider any grid of points covering the input space \(\mathcal {X}\) with the grid constant \({\tau }\), denoted as \(\mathbb {X}_\tau\), satisfying \(\max _{{\bf x}\in \mathcal {X}} \min _{{\bf x}^\prime \in \mathbb {X}_{\tau }} \left\Vert {\bf x}- {\bf x}^\prime \right\Vert \le \tau\). Conceptually, we can visualize such a cover as a collection of balls of radius \(\tau\) covering the input space \(\mathcal {X}\). We establish a high-probability uniform bound for the approximation error \(|{\mu _{k,B}({\bf x}^\prime)-f({\bf x}^\prime)}|\) that holds simultaneously at all points \({\bf x}^\prime\) in the grid \(\mathbb {X}_{\tau }\) by leveraging properties of the Gaussian distribution, which follows from model (1) and Equation (2) and accounts for both sampling and response-surface uncertainty. This step introduces a dependency on the number of points in the grid and is accounted for by the coefficient \(\beta (\tau)\) to be defined in Equation (5), which involves the covering number \(M(\tau ,\mathcal {X})\)—the minimum number of points in \(\mathbb {X}_{\tau }\) to cover \(\mathcal {X}\) given the grid constant \(\tau\). The second step is to extend such an error bound from the points in the grid to any point in the input space \(\mathcal {X}\), yielding the high-probability uniform error bound for SK. This extension is achieved by establishing the Lipschitz continuity of the predictive mean \(\mu _{k,B}(\cdot)\) and the modulus of continuity \(\omega _{\sigma _{k,B}}(\cdot)\) for the predictive standard deviation \(\sigma _{k,B}(\cdot)\) based on the Lipschitz continuity and differentiability of the covariance function \(\Sigma _{\sf M}(\cdot ,\cdot)\).
Theorem 1.
Consider a zero mean Gaussian process defined through the continuous covariance function \(\Sigma _{\sf M}(\cdot ,\cdot)\) with Lipschitz constant \(L_{\Sigma }\) on the compact set \(\mathcal {X}\subset \mathbb {R}^d\). Also consider a continuous unknown function \(f: \mathcal {X}\rightarrow \mathbb {R}\) with Lipschitz constant \(L_f\) and its noisy observations \(\mathcal {Y}(\cdot)\)’s satisfying the assumptions stipulated under model (1). Then, the predictive mean function \(\mu _{k,B}(\cdot)\) and the predictive standard deviation \(\sigma _{k,B}(\cdot)\) obtained based on a simulation dataset \(\mathbb {D}_{k,B}=\lbrace {\bf x}_i,\lbrace \mathcal {Y}_{j}({\bf x}_i)\rbrace _{j=1}^{n_i}, i=1,2,\ldots ,k: \sum _{i=1}^k n_i = B\rbrace\) are continuous on \(\mathcal {X}\), with Lipschitz constant \(L_{\mu _{k,B}}\) for \(\mu _{k,B}(\cdot)\) and modulus of continuity \(\omega _{\sigma _{k,B}}(\cdot)\) for \(\sigma _{k,B}(\cdot)\), which are, respectively, given by
\[\begin{eqnarray} L_{\mu _{k,B}} & = & L_\Sigma \sqrt {k}\left\Vert \big (\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\big)^{-1}\bar{\mathcal {Y}}\right\Vert , \nonumber \nonumber\\ \omega _{\sigma _{k,B}}(\mathit {h}) & = & \sqrt {2 \mathit {h} L_\Sigma \left(1+k\left\Vert \big (\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\big)^{-1}\right\Vert \max \limits _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}(\tilde{{\bf x}}, \tilde{{\bf x}}^{\prime })\right)}, \end{eqnarray}\]
(4)
where \(L_{\Sigma }\), the Lipschitz constant of the covariance function \(\Sigma _{\sf M}(\cdot ,\cdot)\), is defined as
\[\begin{eqnarray*} L_{\Sigma } &:=& \max \limits _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}} \left\Vert \bigg (\frac{\partial \Sigma _{\sf M}(\tilde{{\bf x}}, \tilde{{\bf x}}^{\prime })}{\partial \tilde{x}_1} \ \frac{\partial \Sigma _{\sf M}(\tilde{{\bf x}}, \tilde{{\bf x}}^{\prime })}{\partial \tilde{x}_2} \ \ \ldots \ \frac{\partial \Sigma _{\sf M}(\tilde{{\bf x}}, \tilde{{\bf x}}^{\prime })}{\partial \tilde{x}_d}\bigg)^\top \right\Vert , \end{eqnarray*}\]
and \(\omega _{\sigma _{k,B}}(\mathit {h})\) measures how much the value of \(\sigma _{k,B}(\cdot)\) can vary when moving from one point to another within a given distance \(\mathit {h}\gt 0\). Given any \(\alpha \in (0,1)\), choose \(\tau \in \mathbb {R}_{+}\) and set
\[\begin{eqnarray} \beta (\tau) = 2\log \left(M(\tau ,\mathcal {X})/\alpha \right) \mbox{ and } \gamma _{{k,B}}(\tau) = (L_{\mu _{k,B}}+L_f)\tau +\sqrt {\beta (\tau)}\omega _{\sigma _{k,B}}(\tau), \end{eqnarray}\]
(5)
then it holds true that
\[\begin{eqnarray} {\rm P}\left(|{\mu _{k,B}({\bf x})-f({\bf x})}| \le \sqrt {\beta (\tau)}\sigma _{k,B}({\bf x})+\gamma _{k,B}(\tau), \forall {\bf x}\in \mathcal {X}\right) \ge 1-\alpha , \end{eqnarray}\]
(6)
where \(L_f\) denotes a Lipschitz constant of \(f(\cdot)\) that holds with probability of at least \(1-\alpha _L\) with \(\alpha _L \in (0,1)\) and can be given as
\[\begin{eqnarray*} L_f &:= & \left\Vert \begin{bmatrix} \sqrt {2\log \left(\frac{2d}{\alpha _L}\right)}\max \limits _{{\bf x}\in \mathcal {X}}\sqrt {\Sigma _{\sf M}^{\partial 1}({\bf x},{\bf x})}+12\sqrt {6d}\max \bigg \lbrace \max \limits _{{\bf x}\in \mathcal {X}}\sqrt {\Sigma _{\sf M}^{\partial 1}({\bf x},{\bf x})},\sqrt {rL_\Sigma ^{\partial 1}}\bigg \rbrace \\ \vdots \\ \sqrt {2\log \left(\frac{2d}{\alpha _L}\right)}\max \limits _{{\bf x}\in \mathcal {X}}\sqrt {\Sigma _{\sf M}^{\partial d}({\bf x},{\bf x})}+12\sqrt {6d}\max \bigg \lbrace \max \limits _{{\bf x}\in \mathcal {X}}\sqrt {\Sigma _{\sf M}^{\partial d}({\bf x},{\bf x})},\sqrt {rL_\Sigma ^{\partial d}}\bigg \rbrace \end{bmatrix}\right\Vert ; \end{eqnarray*}\]
\(L_\Sigma ^{\partial i}\) denotes the Lipschitz constant of the partial derivative kernel \(\Sigma _{\sf M}^{\partial i}({\bf x},{\bf x})\) on the set \(\mathcal {X}\) for \(i=1,2,\ldots , d\), and \(r := \max \limits _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}}\Vert \tilde{{\bf x}}-\tilde{{\bf x}}^{\prime }\Vert\).
Proof.
Theorem 1 is inspired by Theorem 3.1 in Reference [29], whose proof is in the same vein as those of Theorems 3.1 and 3.2 in Reference [29]. Specifically, we replace the homogeneous noise variance-covariance matrix \(\sigma ^2{\bf I}_k\) by the heteroscedastic noise variance-covariance matrix \(\Sigma _{\varepsilon }\) in the proof of Theorem 3.1. The Lipschitz constant \(L_f\) can be obtained following the proof of Theorem 3.2. □
Some comments follow immediately. First and foremost, we can construct a uniform bound for the function \(f(\cdot)\) in light of Theorem 1: It holds with probability at least \(1-\alpha\) that
\begin{equation} LB({\bf x}) \le f({\bf x}) \le UB({\bf x}), \quad \forall {\bf x}\in \mathcal {X}, \end{equation}
(7)
where \(LB({\bf x}):= \mu _{k,B}({\bf x})-\sqrt {\beta (\tau)}\sigma _{k,B}({\bf x})-\gamma _{k,B}(\tau)\) and \(UB({\bf x}):= \mu _{k,B}({\bf x})+\sqrt {\beta (\tau)}\sigma _{k,B}({\bf x})+\gamma _{k,B}(\tau)\) are, respectively, the lower and upper limits of the bound at \({\bf x}\in \mathcal {X}\). Second, the grid constant \(\tau\) is a parameter used in the derivation of Theorem 1. Given a particular value of \(\tau\), obtaining a closed-form expression for the covering number \(M(\tau ,\mathcal {X})\) can be challenging. However, it is typically convenient to upper bound \(M(\tau ,\mathcal {X})\). For example, for a hypercubic set \(\mathcal {X}\subseteq \mathbb {R}^d\), an upper bound of \(M(\tau ,\mathcal {X})\) can be given as \((1+r/\tau)^d\), where r denotes the edge length of the hypercube. Notice that \(\beta (\tau)\) and \(\gamma _{k,B}(\tau)\) in Equation (5) can be obtained analytically given a simulation dataset \(\mathbb {D}_{k,B}\) and the covariance function \(\Sigma _{\sf M}(\cdot ,\cdot)\). Therefore, a probabilistic uniform error bound that holds true simultaneously for all \({\bf x}\in \mathcal {X}\) can be constructed with a prescribed error level \(\alpha\) and a choice of the grid constant \(\tau\).
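As an illustration of this construction, the sketch below (Python; continuing the hypothetical helpers above) evaluates \(\beta (\tau)\) via the hypercubic covering-number bound \(M(\tau ,\mathcal {X})\le (1+r/\tau)^d\), forms \(\gamma _{k,B}(\tau)\) as in Equation (5), and returns the limits \(LB(\cdot)\) and \(UB(\cdot)\) of the bound (7); the Lipschitz constants and the modulus of continuity are taken as inputs, to be computed from Theorem 1.

```python
import numpy as np

def uniform_error_bound(mu, sigma, tau, r, d, alpha, L_mu, L_f, omega_sigma_tau):
    """Assemble the uniform bound (7) at a set of evaluation points.

    mu, sigma       : SK predictive means and standard deviations at the points
    tau             : grid constant
    r               : edge length of the hypercubic input space
    d               : input-space dimension
    alpha           : prescribed error level
    L_mu, L_f       : Lipschitz constants of mu_{k,B} and f from Theorem 1
    omega_sigma_tau : modulus of continuity omega_{sigma_{k,B}}(tau)
    """
    M_tau = (1.0 + r / tau) ** d                  # covering-number upper bound
    beta = 2.0 * np.log(M_tau / alpha)            # Equation (5)
    gamma = (L_mu + L_f) * tau + np.sqrt(beta) * omega_sigma_tau
    lower = mu - np.sqrt(beta) * sigma - gamma    # LB(x)
    upper = mu + np.sqrt(beta) * sigma + gamma    # UB(x)
    return lower, upper
```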
The following result elucidates the order of the approximation error, quantified by the uniform error bound, as k and B approach infinity. Below, we make the choice of grid constant \(\tau\) depend on the number of design points k, so \(\beta (\tau)\) and \(\gamma _{k,B}(\tau)\) depend on k as well. The notations \(\tau (k)\), \(\beta _k(\tau)\), and \(\gamma _{k,B}(\tau)\) will be used hereinafter whenever it is not cumbersome to emphasize the terms’ dependence on the number of design points k.
Theorem 2.
Suppose that the assumptions of Theorem 1 are satisfied and that the experimental design ensures that, as the number of design points k increases, the number of observations obtained at each existing design point also grows. Furthermore, for any fixed \(k\ge 1\), the numbers of replications allocated over the k design points satisfy \({\sf V}_{k,\max }/{\sf V}_{k,\min } \lt \infty\). Assume that the absolute value of the unknown function \(f: \mathcal {X}\rightarrow \mathbb {R}\) of interest is bounded above by \(\bar{f}\in \mathbb {R_+}\). Given any \(\alpha \in (0,1)\), it holds that
\[\begin{eqnarray*} { {\rm P}\left(\sup \limits _{{\bf x}\in \mathcal {X}} |\mu _{k,B}({\bf x})-f({\bf x})| = \mathcal {O} \left(\left(\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1}\right)^{\frac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\frac{1}{2}} +\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\right) \right) \ge 1-\alpha .} \end{eqnarray*}\]
Proof.
By Theorem 1 applied with \(\beta _k(\tau)=2\log \left(M(\tau (k),\mathcal {X})/(\pi _k\alpha)\right)\), where \(\lbrace \pi _k\rbrace\) is chosen such that \(\sum \nolimits _{k=1}^\infty \pi _k = 1/2\), and the union bound over all \(k\ge 1\), the following event holds true with probability of at least \(1-\alpha /2\):
\[\begin{eqnarray} \sup \limits _{{\bf x}\in \mathcal {X}} |\mu _{k,B}({\bf x})-f({\bf x})| \le \sqrt {\beta _k(\tau)} \sup \limits _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})+\gamma _{k,B}(\tau), \ \forall k \ge 1. \end{eqnarray}\]
(8)
A trivial bound for the covering number can be obtained by considering a uniform grid over the hypercube containing \(\mathcal {X}\). This leads to \(M(\tau (k),\mathcal {X}) \le (1+r/\tau (k))^d\), where \(r = \max _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}}\Vert \tilde{{\bf x}}-\tilde{{\bf x}}^{\prime }\Vert\). Therefore, we have
\[\begin{eqnarray} \beta _k(\tau) \le 2d\log \left(1+r/\tau (k)\right)-2\log (\pi _k)-2\log (\alpha). \end{eqnarray}\]
(9)
Furthermore, recall the definition of \(L_{\mu _{k,B}}\) in Equation (4) from Theorem 1,
\[\begin{eqnarray*} L_{\mu _{k,B}} = L_\Sigma \sqrt {k} \left\Vert \left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right)^{-1}\bar{\mathcal {Y}}\right\Vert . \end{eqnarray*}\]
Since the matrix \(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\) is positive definite and \(f(\cdot)\) is bounded above by \(\bar{f}\), it follows that
\[\begin{eqnarray*} \left\Vert \left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right)^{-1}\bar{\mathcal {Y}}\right\Vert \le \left\Vert \bar{\mathcal {Y}}\right\Vert /\lambda _{\min }\left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right) \le (\sqrt {k}\bar{f}+\left\Vert \bar{\mbox{ $\varepsilon $}}\right\Vert)/{\sf V}_{k,\min }, \nonumber \nonumber \end{eqnarray*}\]
where the second inequality follows from the triangle inequality, and recall \(\bar{\mbox{ $\varepsilon $}} = \left(\bar{\mbox{ $\varepsilon $}}({\bf x}_1), \bar{\mbox{ $\varepsilon $}}({\bf x}_2),\ldots ,\bar{\mbox{ $\varepsilon $}}({\bf x}_k)\right)^\top\). Given that \(\bar{\mbox{ $\varepsilon $}}\) is multivariate normally distributed with mean zero and variance-covariance matrix \(\Sigma _{\varepsilon }\), \(\left\Vert \bar{\mbox{ $\varepsilon $}}\right\Vert ^2\) is equal to \(\sum \nolimits _{i=1}^k a_i Z_i^2\) in distribution, where \(Z_i\)’s are i.i.d. standard normal random variables and \(a_i = {\sf V}({\bf x}_i)/n_i\), for \(i=1,2,\ldots ,k\), i.e., \(\left\Vert \bar{\mbox{ $\varepsilon $}}\right\Vert ^2 \stackrel{\mathcal {D}}{=}\sum \nolimits _{i=1}^k a_i Z_i^2\). Then, by Lemma 1 of Reference [27], we have
\[\begin{eqnarray*} {\rm P}\left(\left\Vert \bar{\mbox{ $\varepsilon $}}\right\Vert ^2\ge 2\left(\sum \limits _{i=1}^k \frac{{\sf V}^2({\bf x}_i)}{n_i^2}\right)^{\frac{1}{2}}\sqrt {\eta _{k,B}}+2{\sf V}_{k,\max }\eta _{k,B}+\sum \limits _{i=1}^k \frac{{\sf V}({\bf x}_i)}{n_i}\right) \le \exp (-\eta _{k,B}) \nonumber \nonumber \end{eqnarray*}\]
for any \(\eta _{k,B}\gt 0\). Therefore, with probability of at least \(1-\exp (-\eta _{k,B})\), we have
\[\begin{eqnarray*} \left\Vert \bar{\mbox{ $\varepsilon $}}\right\Vert ^2 \le 2\left(\sum \limits _{i=1}^k \frac{{\sf V}^2({\bf x}_i)}{n_i^2}\right)^{\frac{1}{2}}\sqrt {\eta _{k,B}}+2{\sf V}_{k,\max }\eta _{k,B}+\sum \limits _{i=1}^k \frac{{\sf V}({\bf x}_i)}{n_i}. \nonumber \nonumber \end{eqnarray*}\]
Hence, if we set \(\eta _{k,B}=\log \left(1/(\pi _k \alpha)\right)\) with \(\lbrace \pi _k\rbrace\) such that \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/2\), then applying the union bound over all \(k\ge 1\) yields
\[\begin{eqnarray} { \left\Vert \left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right)^{-1}\bar{\mathcal {Y}}\right\Vert \le \left(\sqrt {k}\bar{f}+\left(2\left(\sum \limits _{i=1}^k \frac{{\sf V}^2({\bf x}_i)}{n_i^2}\right)^{\frac{1}{2}}\sqrt {\eta _{k,B}}+2{\sf V}_{k,\max }\eta _{k,B}+\sum \limits _{i=1}^k \frac{{\sf V}({\bf x}_i)}{n_i}\right)^{\frac{1}{2}}\right){\sf V}_{k,\min }^{-1}} \end{eqnarray}\]
(10)
for all \(k\ge 1\) with probability of at least \(1-\alpha /2\). Hence, by Theorem 1, the Lipschitz constant of the predictive mean function \(\mu _{k,B}(\cdot)\) satisfies
\[\begin{eqnarray*} L_{\mu _{k,B}} \le L_\Sigma \underbrace{ \sqrt {k} \left(\sqrt {k}\bar{f}+\left(2\left(\sum \limits _{i=1}^k \frac{{\sf V}^2({\bf x}_i)}{n_i^2}\right)^{\frac{1}{2}}\sqrt {\eta _{k,B}}+2{\sf V}_{k,\max }\eta _{k,B}+\sum \limits _{i=1}^k \frac{{\sf V}({\bf x}_i)}{n_i}\right)^{\frac{1}{2}}\right){\sf V}_{k,\min }^{-1}}_{:= U_{k,B}}, \ \forall k\ge 1, \end{eqnarray*}\]
where \(U_{k,B}\) denotes all the terms to the right of \(L_\Sigma\) in the inequality above and recall \(\eta _{k,B}=\log (1/(\pi _k \alpha))\). Since \((\sum _{i=1}^k \tfrac{{\sf V}^2({\bf x}_i)}{n_i^2})^{\tfrac{1}{2}} \le \sqrt {k} {\sf V}_{k,\max }\), \(\sum _{i=1}^k \tfrac{{\sf V}({\bf x}_i)}{n_i} \le k {\sf V}_{k,\max }\), \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\), and \(\eta _{k,B}\) grows slowly (typically logarithmically with k for some commonly used \(\lbrace \pi _k\rbrace\) sequences), it follows that \(L_{\mu _{k,B}} \lesssim U_{k,B} \asymp k{\sf V}_{k,\min }^{-\tfrac{1}{2}}\) with probability of at least \(1-\alpha /2\).
The modulus of continuity \(\omega _{\sigma _{k,B}}(\cdot)\) of the predictive standard deviation defined in Equation (4) satisfies
\[\begin{eqnarray} \omega _{\sigma _{k,B}}(\tau) \le \left[2L_\Sigma \tau (k)\left(k \max \limits _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}(\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }){\sf V}^{-1}_{k,\min }+1\right)\right]^{\frac{1}{2}} , \end{eqnarray}\]
(11)
since \(\Vert (\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon })^{-1}\Vert = \lambda _{\max }\left(\left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right)^{-1}\right) = 1/\lambda _{\min }\left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right) \le 1/\lambda _{\min }\left(\Sigma _{\varepsilon }\right) = 1 \big /{\sf V}_{k,\min }\), where we have used \(\lambda _{\min }\left(\Sigma _{\sf M}({\bf X}, {\bf X})+\Sigma _{\varepsilon }\right)\ge \lambda _{\min }\left(\Sigma _{\sf M}({\bf X}, {\bf X})\right)+ \lambda _{\min }\left(\Sigma _{\varepsilon }\right) \ge \lambda _{\min }\left(\Sigma _{\varepsilon }\right)\) due to Weyl’s Theorem. Therefore, it follows that \(\omega _{\sigma _{k,B}}(\tau) \lesssim (\tau (k) \cdot k {\sf V}_{k,\min }^{-1})^{\tfrac{1}{2}}\).
Due to the union bound, we have that \(\gamma _{k,B}(\tau)\) in Expression (8) can be bounded above with probability of at least \(1-\alpha\) as follows:
\begin{align} \gamma _{k,B}(\tau) & = \sqrt {\beta (\tau)}\omega _{\sigma _{k,B}}(\tau) + (L_{\mu _{k,B}}+L_f)\tau (k) \nonumber \nonumber\\ & \le \left[\beta _k(\tau) \cdot 2L_\Sigma \tau (k)\left(k\max \limits _{\tilde{{\bf x}},\tilde{{\bf x}}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}(\tilde{{\bf x}},\tilde{{\bf x}}^{\prime })\big /{\sf V}_{k,\min }+1\right)\right]^{\frac{1}{2}}+L_f \tau (k) +L_\Sigma U_{k,B} \tau (k) \nonumber \nonumber\\ & \lesssim \left(\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1}\right)^{\frac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\frac{1}{2}}. \end{align}
(12)
Therefore, it follows from Expressions (8) and (12) that
\[\begin{eqnarray*} {{\rm P}\left(\sup \limits _{{\bf x}\in \mathcal {X}} |\mu _{k,B}({\bf x})-f({\bf x})| = \mathcal {O} \left(\left(\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1}\right)^{\frac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\frac{1}{2}} +\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\right) \right) \ge 1-\alpha .} \end{eqnarray*}\]
 □
We briefly comment on Theorem 2, which sets up the foundation for the subsequent analysis. First and foremost, the upper bound of the approximation error achieved at any \({\bf x}\in \mathcal {X}\), \(\sqrt {\beta _k(\tau)}\sigma _{k,B}({\bf x})+\gamma _{k,B}(\tau)\), consists of two parts. From Expression (9), we observe that the smaller the grid constant \(\tau (k)\), the greater the coefficient \(\beta _k(\tau)\). To guarantee a vanishing approximation error for all \({\bf x}\in \mathcal {X}\), the choice of \(\tau (k)\) must ensure that \(\gamma _{k,B}(\tau) \rightarrow 0\) and \(\sup _{{\bf x}\in \mathcal {X}}\sqrt {\beta _k(\tau)}\sigma _{k,B}({\bf x}) \rightarrow 0\) as the number of design points k and the total simulation budget B approach infinity. It is worth noting that the grid constant \(\tau (k)\) can be chosen arbitrarily small, making the magnitude of \(\gamma _{k,B}(\tau)\) negligible compared to \(\sqrt {\beta _k(\tau)}\sigma _{k,B}({\bf x})\) for all \({\bf x}\in \mathcal {X}\). We will use this fact to study the asymptotic performance of the uniform error bound as \(k, B \rightarrow \infty\). Second, given a fixed total budget B to allocate at k distinct design points, the budget allocation scheme adopted impacts the width of the uniform error bound through \(\sigma _{k,B}({\bf x})\) and \(\gamma _{k,B}(\tau)\), as indicated by Expression (8) and the upper bounds of \(\omega _{\sigma _{k,B}}(\tau)\) and \(L_{\mu _{k,B}}\). In particular, efficient unequal budget allocation schemes can potentially reduce the magnitudes of \(\sigma _{k,B}({\bf x})\) and \(\gamma _{k,B}(\tau)\) compared to the equal allocation scheme, leading to a tighter uniform error bound. However, if \(\tau (k)\) is sufficiently small, then the impact of budget allocation schemes may not be that obvious; see Expression (12). Nonetheless, the explicit analysis of these impacts is challenging, and we resort to numerical studies in Section 6 for a thorough investigation. Last but not least, understanding the convergence rate of \(\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\) is crucial to determine the order of the vanishing approximation error. We will conduct an in-depth investigation of this in Section 4.

4 Analysis of the Predictive Variance and the Uniform Error Bound in the Fixed Design Setting

In this section, we analyze the convergence rate of the predictive variance \(\sigma ^2_{k,B}(\cdot)\) under the supremum norm in the fixed design setting and subsequently elucidate the corresponding order of the proposed uniform error bound. Due to space limitations, the discussion of the random design setting is deferred to Appendix C.
We concentrate on the fixed design setting in this section, commonly employed in the design and analysis of computer experiments [37]. In this setting, the design points are chosen according to specific sampling schemes of our choice. Such designs encompass a variety of approaches, including, but not limited to, grid designs, low-discrepancy sequences, maximin Latin hypercube designs, and optimal Latin hypercube designs. Our analysis of the convergence rate of SK’s predictive variance in the fixed design setting leverages results from the scattered data analysis literature, where the goal is to approximate or interpolate an underlying fixed function; see, e.g., References [36] and [49].
Notice that the predictive variance \(\sigma ^2_{k,B}({\bf x}_0)\) can be upper bounded as follows:
\begin{align} \sigma ^2_{k,B}({\bf x}_0) \le \Sigma _{\sf M}({\bf x}_0, {\bf x}_0) - \Sigma _{\sf M}({\bf x}_0,{\bf X})^\top \left(\Sigma _{\sf M}({\bf X},{\bf X}) + {\sf V}_{k,\max }{\bf I}_k\right)^{-1}\Sigma _{\sf M}({\bf x}_0,{\bf X}), \end{align}
(13)
where recall \({\sf V}_{k,\max }:= \max \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\). Define \(\lambda := {\sf V}_{k,\max }/\xi ^2\) hereinafter in this section. We can rewrite the upper bound for \(\sigma ^2_{k,B}({\bf x}_0)\) given by the right-hand side of Expression (13) as \(\xi ^2 (K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0)),\) where \(K_{{\bf x}_0}({\bf x}^\prime) := K({\bf x}_0, {\bf x}^\prime)\) for any \({\bf x}^\prime \in \mathcal {X}\) and recall from Section 2.1 that \(K({\bf x}, {\bf x}^\prime) = \Sigma _{\sf M}({\bf x}, {\bf x}^\prime)/\xi ^2\) for any \({\bf x}, {\bf x}^\prime \in \mathcal {X}\) is the kernel function; in addition,
\begin{equation*} \widehat{K}_{{\bf x}_0}({\bf x}^\prime) = K({\bf x}^\prime ,{\bf X})^\top \left(K({\bf X},{\bf X}) + \lambda {\bf I}_k\right)^{-1}K({\bf X}, {\bf x}_0). \end{equation*}
Inspired by References [36] and [49], we realize that, for any given prediction point \({\bf x}_0 \in \mathcal {X}\), \(\widehat{K}_{{\bf x}_0}(\cdot)\) is the solution to the following regularized kernel-based approximation using the noiseless observations of the function \(f:=K_{{\bf x}_0}\) at the k design points in the design-point set \(\mathcal {D}=\lbrace {\bf x}_1,{\bf x}_2,\ldots ,{\bf x}_k\rbrace \subseteq \mathcal {X}\):
\begin{equation} s_{\lambda } = \widehat{K}_{{\bf x}_0} := \mathop {\rm argmin}_{s\in \mathcal {H}} \left\lbrace \sum _{j=1}^k \left(f({\bf x}_j)-s({\bf x}_j)\right)^2+\lambda \Vert s\Vert ^2_\mathcal {H}\right\rbrace , \end{equation}
(14)
where the solution \(s_{\lambda }(\cdot)\) depends on the choice of the regularization parameter \(\lambda ={\sf V}_{k,\max }/\xi ^2\), \(\mathcal {H}\) denotes the reproducing kernel Hilbert space (RKHS) associated with kernel K, and \(\Vert \cdot \Vert _{\mathcal {H}}\) denotes the norm equipped with \(\mathcal {H}\) (see a brief introduction to RKHS in Appendix B). Therefore, we can analyze the error achieved by the estimator, specifically \(\Vert f - s_{\lambda }\Vert _{\infty }= \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty }\), to investigate the convergence properties of the predictive variance \(\sigma ^2_{k,B}(\cdot)\) under the supremum norm. To begin with, we introduce some useful definitions next.
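The inequality in Expression (13) and the rewriting of its right-hand side are easy to check numerically. The sketch below (illustrative Python; the squared-exponential kernel and all parameter values are assumptions for this example) compares the exact predictive variance (3) at a prediction point with the bound obtained by replacing \(\Sigma _{\varepsilon }\) with \({\sf V}_{k,\max }{\bf I}_k\); the subtracted term in that bound equals \(\xi ^2\widehat{K}_{{\bf x}_0}({\bf x}_0)\) with \(\lambda ={\sf V}_{k,\max }/\xi ^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, xi2, lengthscale = 25, 2, 1.5, 0.3
X = rng.uniform(size=(k, d))                  # design points
x0 = rng.uniform(size=(1, d))                 # a prediction point

def kern(A, B):                               # illustrative squared-exponential kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

Sigma_XX = xi2 * kern(X, X)
Sigma_X0 = xi2 * kern(X, x0)
noise = rng.uniform(0.01, 0.1, size=k)        # heteroscedastic V(x_i)/n_i
V_max = noise.max()

def pred_var(Sigma_eps):                      # Equation (3) evaluated at x0
    sol = np.linalg.solve(Sigma_XX + Sigma_eps, Sigma_X0)
    return (xi2 - Sigma_X0.T @ sol).item()

exact = pred_var(np.diag(noise))              # sigma^2_{k,B}(x0)
upper = pred_var(V_max * np.eye(k))           # right-hand side of Expression (13),
                                              # i.e., xi^2 (1 - K_hat_{x0}(x0))
print(exact <= upper)                         # True: the bound (13) holds
```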
Definition 1.
For \(\nu \in \mathbb {N}\), \(q\in [1,\infty ]\), and a domain \(\tilde{\mathcal {X}}\subseteq \mathbb {R}^d\) that is non-empty, open, and connected, the integer order Sobolev space \(W^\nu _q(\tilde{\mathcal {X}})\) is defined as
\begin{equation*} W^\nu _q(\tilde{\mathcal {X}})=\lbrace f\in L_q(\tilde{\mathcal {X}}):\forall \mbox{ $\alpha $}\in \mathbb {N}^d, |\mbox{ $\alpha $}|\le \nu , D^{ \mbox{ $\alpha $}} f\in L_q(\tilde{\mathcal {X}})\rbrace , \end{equation*}
where \(\mathbb {N}^d\) denotes the set of multi-indices of size d, \(|\mbox{ $\alpha $}|:=\sum _{i=1}^d \alpha _i\) for a multi-index \(\mbox{ $\alpha $}= (\alpha _1, \alpha _2, \ldots , \alpha _d) \in \mathbb {N}^d\), and \(D^{\mbox{ $\alpha $}}\) denotes the weak derivative operator corresponding to \(\mbox{ $\alpha $}\), which is defined as \(D^{\mbox{ $\alpha $}} f= \frac{\partial ^{|\mbox{ $\alpha $}|}}{\partial x_1^{\alpha _1} \partial x_2^{\alpha _2} \cdots \partial x_d^{\alpha _d}} f\).
For any function \(f\in W^\nu _q(\tilde{\mathcal {X}})\), the (semi-)norms are given by
\begin{align*} & |f|_{W^\nu _q(\tilde{\mathcal {X}})} := \left(\sum _{|{\mbox{ $\alpha $}}|=\nu }\Vert D^{\mbox{ $\alpha $}} f\Vert ^q_{L_q(\tilde{\mathcal {X}})}\right)^{1/q} \mbox{ and } \Vert f\Vert _{W^\nu _q(\tilde{\mathcal {X}})} := \left(\sum _{|{\mbox{ $\alpha $}}|\le \nu }\Vert D^{\mbox{ $\alpha $}} f\Vert ^q_{L_q(\tilde{\mathcal {X}})}\right)^{1/q}, \ \mbox{if $q\in [1, \infty)$,} \\ & |f|_{W^\nu _\infty (\tilde{\mathcal {X}})} := \sup _{|{\mbox{ $\alpha $}}|=\nu } \Vert D^{\mbox{ $\alpha $}} f\Vert _{L_\infty (\tilde{\mathcal {X}})} \mbox{ and } \Vert f\Vert _{W^\nu _\infty (\tilde{\mathcal {X}})} := \sup _{|{\mbox{ $\alpha $}}|\le \nu } \Vert D^{\mbox{ $\alpha $}} f\Vert _{L_\infty (\tilde{\mathcal {X}})}, \ \mbox{if $q=\infty $}, \end{align*}
with \(\Vert \cdot \Vert _{L_q(\tilde{\mathcal {X}})}\) denoting the usual \(L_q\) norm.
Sobolev spaces of fractional orders (\(\nu \notin \mathbb {N}\)) are defined analogously; see References [3, 49] for details. In particular, for \(\nu \gt d/2\), the Sobolev space \(W^\nu _2(\mathbb {R}^d)\) can be written as \(W^\nu _2(\mathbb {R}^d)=\lbrace f\in L_2(\mathbb {R}^d): \Vert f\Vert _{W^\nu _2(\mathbb {R}^d)} \lt \infty \rbrace\). The Sobolev spaces on a subset (e.g., compact and convex) \(\mathcal {X}\subset \mathbb {R}^d\), \(W^\nu _2(\mathcal {X})\), can be defined as \(W^\nu _2(\mathcal {X}):=\lbrace f: \mathcal {X}\rightarrow \mathbb {R}: \exists \breve{f} \in W^\nu _2(\mathbb {R}^d) \mbox{ such that } \breve{f}({\bf x}) = f({\bf x}), \forall {\bf x}\in \mathcal {X}\rbrace ,\) with norm \(\Vert f\Vert _{W^\nu _2(\mathcal {X})} := \inf \lbrace \Vert \breve{f}\Vert _{W^\nu _2(\mathbb {R}^d)}: \breve{f} \in W^\nu _2(\mathbb {R}^d) \mbox{ and } \breve{f}({\bf x}) = f({\bf x}), \forall {\bf x}\in \mathcal {X}\rbrace\). The RKHS \(\mathcal {H}\) associated with kernel K is said to be norm-equivalent to the Sobolev space \(W_2^\nu (\mathcal {X})\) if there exist constants \(c_1, c_2\gt 0\) such that
\begin{equation*} c_1 \Vert f\Vert _{W_2^{\nu }(\mathcal {X})} \le \Vert f\Vert _{\mathcal {H}} \le c_2 \Vert f\Vert _{W_2^{\nu }(\mathcal {X})}, \quad \forall f\in \mathcal {H}. \end{equation*}
That is, \(\mathcal {H}\) is equivalent to \(W_2^\nu (\mathcal {X})\) as a set of functions. In this case, the kernel function K is called \(\nu\)-smooth. One popular kernel function that is \(\nu\)-smooth is the Matérn kernel as given below.
For \(\nu \gt d/2\), the Matérn kernel is given by
\begin{equation} K({\bf x},{\bf x}^\prime) = \frac{2^{1-(\nu -d/2)}}{\Gamma (\nu -d/2)}\left(\sqrt {2\left(\nu -d/2\right)}\frac{\Vert {\bf x}-{\bf x}^\prime \Vert }{l}\right)^{\nu -d/2}\mathcal {B}_{\nu -d/2}\left(\sqrt {2\left(\nu -d/2\right)}\frac{\Vert {\bf x}-{\bf x}^\prime \Vert }{l}\right), \end{equation}
(15)
where \(\Gamma\) is the Gamma function, \(\mathcal {B}_{\nu -d/2}\) is the modified Bessel function of the second kind of order \(\nu -d/2\), and \(l\gt 0\) is the lengthscale parameter. The parameter \(\nu\) controls the smoothness level of functions in the corresponding RKHS; the smaller its value, the rougher the functions.
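A minimal implementation of the Matérn kernel in Equation (15) is sketched below (Python with SciPy; the lengthscale and the example values are illustrative). The modified Bessel function of the second kind is available as scipy.special.kv.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_kernel(x, x_prime, nu, d, lengthscale=1.0):
    """Matern kernel of Equation (15) with smoothness parameter nu > d/2."""
    m = nu - d / 2.0                            # order of the Bessel function
    r = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float))
    if r == 0.0:
        return 1.0                              # K(x, x) = 1 by continuity
    z = np.sqrt(2.0 * m) * r / lengthscale
    return (2.0 ** (1.0 - m) / gamma(m)) * (z ** m) * kv(m, z)

# Example: a nu-smooth Matern kernel with nu = 5/2 in dimension d = 1
print(matern_kernel(0.2, 0.7, nu=2.5, d=1, lengthscale=0.5))
```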
Definition 2.
Given the bounded input space \(\mathcal {X}\subseteq \mathbb {R}^d\) and the design-point set \(\mathcal {D}=\lbrace {\bf x}_i, i=1,2,\ldots ,k\rbrace \subseteq \mathcal {X}\), the fill distance \(h_{\mathcal {D},\mathcal {X}}\) and the separation distance \(q_{\mathcal {D}}\) are defined as
\begin{equation*} h_{\mathcal {D},\mathcal {X}} := \sup _{{\bf x}\in \mathcal {X}}\min _{ 1\le j\le k}\Vert {\bf x}-{\bf x}_j\Vert , \quad q_\mathcal {D}:= \frac{1}{2}\min _{\stackrel{1 \le i,j \le k}{i \ne j} } \Vert {\bf x}_i-{\bf x}_j\Vert . \end{equation*}
The design-point set \(\mathcal {D}\) is said to be quasi-uniform with respect to a constant \(c_{{qu}}\gt 0\) if \(q_{\mathcal {D}} \le h_{\mathcal {D},\mathcal {X}} \le c_{{qu}} \cdot q_{\mathcal {D}}\).
Remark 1.
A design-point set \(\mathcal {D}\) with a small fill distance \(h_{\mathcal {D},\mathcal {X}}\) guarantees that every point in the input space \(\mathcal {X}\) is close to some design point in \(\mathcal {D}\), while a large separation distance \(q_{\mathcal {D}}\) indicates that the design points in \(\mathcal {D}\) are not clustered. Quasi-uniform designs achieve the optimal rates for the fill distance, namely, \(h_{\mathcal {D}, \mathcal {X}} \asymp k^{-1/d}\), \(\forall k\in \mathbb {N}\) [48, 51]. For example, the design-point set comprising regular grid points in a hypercube \((0,1)^d\) is quasi-uniform, and points selected sequentially to minimize the GPR’s predictive variance for a \(\nu\)-smooth kernel with \(\nu \gt (d/2)+1\) also form a quasi-uniform design. It is known that low-discrepancy point sets, e.g., the Halton sequence and Sobol’ nets, have small fill distances; sequences such as the Halton sequence have a fill distance \(h_{\mathcal {D}, \mathcal {X}} \lesssim k^{-1/d}\left(\log k\right)\), achieving the optimal rate up to a logarithmic factor [41]. Latin hypercube designs (LHDs) are not necessarily quasi-uniform; however, variants of LHDs, such as maximin and minimax LHDs, exist that optimize the fill or separation distance [51].
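Both quantities in Definition 2 are straightforward to estimate numerically. The sketch below (illustrative Python; the supremum over \(\mathcal {X}\) is approximated by a dense reference sample, an assumption of this example) computes an approximate fill distance and the exact separation distance of a design, and checks quasi-uniformity against a chosen constant \(c_{qu}\).

```python
import numpy as np
from scipy.spatial.distance import cdist

def fill_and_separation(design, reference):
    """Approximate fill distance h_{D,X} (sup taken over a dense reference
    sample of X) and exact separation distance q_D of a design-point set."""
    h = cdist(reference, design).min(axis=1).max()     # fill distance (approx.)
    pair = cdist(design, design)
    np.fill_diagonal(pair, np.inf)
    q = 0.5 * pair.min()                               # separation distance
    return h, q

# Example: a regular grid in (0, 1)^2 (quasi-uniform) checked against c_qu = 3
grid1d = (np.arange(10) + 0.5) / 10.0
design = np.array(np.meshgrid(grid1d, grid1d)).reshape(2, -1).T    # k = 100 points
reference = np.random.default_rng(0).uniform(size=(20000, 2))      # dense proxy for X
h, q = fill_and_separation(design, reference)
print(h, q, q <= h <= 3.0 * q)
```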
We will ground the upcoming analysis on the smoothness level of the kernel function K, derive the convergence rate of the predictive variance under the supremum norm, and investigate the asymptotic order of the approximation error as quantified by the uniform error bound in the fixed design setting. Sections 4.1 and 4.2 elaborate on the finite smoothness and infinite smoothness cases, respectively.

4.1 Finite Smoothness Case

This subsection performs an analysis in the fixed design setting when the kernel K has a finite smoothness level. We start with the following result, which provides an upper bound on the error of the regularized kernel-based approximation \(s_\lambda\) given in Equation (14).
Proposition 1 (Proposition 3.6 in Wendland and Rieger [49]).
Let \(\nu =\kappa +s\) with \(\kappa \gt d/2\) and \(s\in (0,1]\). Suppose that \(\mathcal {X}\subseteq \mathbb {R}^d\) is bounded and satisfies an interior cone condition—there exists an angle \(\theta \in (0,\pi /2)\) and a radius \(\tilde{r}\gt 0\) such that for every \({\bf x}\in \mathcal {X}\) there exists a unit vector \(\zeta ({\bf x})\) such that the cone \(C({\bf x},\zeta ({\bf x}),\theta ,\tilde{r}):=\lbrace {\bf x}+\gamma {\bf x}_0:{\bf x}_0\in \mathbb {R}^d,\Vert {\bf x}_0\Vert =1,{\bf x}_0^\top \zeta ({\bf x})\ge \cos \theta ,\gamma \in [0,\tilde{r}]\rbrace\) is contained in \(\mathcal {X}\). Let \(q\in [1, \infty ]\). Suppose that \(s_\lambda\) is the solution given in Equation (14) for the given design-point set \(\mathcal {D}\subseteq \mathcal {X}\) and the target function \(f\in W_2^\nu (\mathcal {X})\). Then, the following error estimate holds:
\begin{equation*} |f-s_\lambda |_{W_q^j(\mathcal {X})}\le C\left(h_{\mathcal {D},\mathcal {X}}^{\nu -j-d(\frac{1}{2}-\frac{1}{q})}+h_{\mathcal {D},\mathcal {X}}^{-j}\sqrt {\lambda }\right)\Vert f\Vert _{W_2^\nu (\mathcal {X})},\quad 0\le j\lt \kappa -d/2, \end{equation*}
which gives that, for \(\lambda \le h_{\mathcal {D},\mathcal {X}}^{2\nu -d(1-2/q)_+}\),
\begin{equation*} |f-s_\lambda |_{W_q^j(\mathcal {X})}\le Ch_{\mathcal {D},\mathcal {X}}^{\nu -j-d(\frac{1}{2}-\frac{1}{q})_+}\Vert f\Vert _{W_2^\nu (\mathcal {X})},\quad 0\le j\lt \kappa -d/2, \end{equation*}
where \((x)_+ = \max \lbrace x, 0\rbrace\).
We note that the interior cone condition essentially states that the input space \(\mathcal {X}\) is sufficiently regular with no sharp corners or cusps; and the d-dimensional unit cube \(\mathcal {X}= [0,1]^d\) provides an example [41]. By setting \(f:=K_{{\bf x}_0}\), \(s_\lambda := \widehat{K}_{{\bf x}_0}\), \(j=0\), and \(q=\infty\) in Proposition 1, we obtain the following result, which reveals the optimal convergence rate of the maximum discrepancy between \(K_{{\bf x}_0}\) and its approximation \(\widehat{K}_{{\bf x}_0}\).
Corollary 1.
Suppose that the conditions of Proposition 1 are fulfilled. If \(\lambda \le h_{\mathcal {D},\mathcal {X}}^{2\nu -d}\) and the design-point set \(\mathcal {D}\) is quasi-uniform so \(h_{\mathcal {D},\mathcal {X}} \asymp k^{-1/d}\), then the maximum discrepancy between the function \(f:= K_{{\bf x}_0}\) and its approximation \(s_\lambda :=\widehat{K}_{{\bf x}_0}\) achieves the optimal convergence rate:
\begin{equation*} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty }\lesssim k^{-\frac{2\nu -d}{2d}}. \end{equation*}
In the finite smoothness case under the fixed design setting, Corollary 1 indicates that \(\sigma _{k,B}({\bf x}_0) \lesssim k^{-\frac{\nu -d/2}{2d}}\) for any \({\bf x}_0\in \mathcal {X}\). The condition on the regularization parameter \(\lambda\) stipulated in Corollary 1 can be expressed as \({\sf V}_{k,\max }\lesssim k^{1-\frac{2\nu }{d}}\) following the definition \(\lambda ={\sf V}_{k,\max }/\xi ^2\). This condition indicates that: (i) the upper bound of \({\sf V}_{k,\max }\) must decrease with the number of design points k; and (ii) given a fixed number of design points k, the higher the smoothness level \(\nu\) (respectively, the higher the input-space dimensionality d) a given problem has, the faster (respectively, the slower) the decreasing rate required on \({\sf V}_{k,\max }\). The above sheds light on the following important aspects of a desirable simulation experimental design to achieve the optimal convergence rate of \(\Vert \sigma ^2_{k,B}\Vert _{\infty }\). First and foremost, the design-point set must be quasi-uniform. Moreover, the numbers of simulation replications \(n_i\)’s must grow with the number of design points k, and the smoother the kernel K, the faster the \(n_i\)’s must grow, as illustrated below.
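To illustrate the budget implication, suppose (purely as an assumption for this example) that the noise variances \({\sf V}({\bf x}_i)\) are bounded above by a constant; then \({\sf V}_{k,\max }\lesssim k^{1-2\nu /d}\) requires \(n_i\gtrsim k^{2\nu /d-1}\) replications per design point and hence a total budget \(B\gtrsim k^{2\nu /d}\) under (near-)equal allocation. The snippet below simply tabulates these growth exponents for a few illustrative \((\nu , d)\) pairs.

```python
# Orders implied by V_{k,max} <~ k^{1 - 2*nu/d} when the V(x_i)'s are bounded:
# replications per design point n_i ~ k^{2*nu/d - 1}, total budget B ~ k^{2*nu/d}.
for nu, d in [(1.5, 1), (2.5, 1), (2.5, 2), (4.5, 3)]:
    n_exponent = 2.0 * nu / d - 1.0      # growth exponent of n_i in k
    B_exponent = 2.0 * nu / d            # growth exponent of B in k
    print(f"nu = {nu}, d = {d}: n_i ~ k^{n_exponent:.2f}, B ~ k^{B_exponent:.2f}")
```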
Given that the kernel has a finite smoothness level, the next result extends Theorem 2 by elucidating the order of the uniform error bound as \(k, B \rightarrow \infty\) in the fixed design setting.
Proposition 2.
Suppose that the assumptions of Theorem 2 and Corollary 1 are fulfilled, the kernel K has a finite smoothness level \(\nu\) with \(d/2 \lt \nu \lt \infty\), and \({\sf V}_{k,\max } \asymp k^{1-2\nu /d}\). Given any \(\alpha \in (0,1)\), it holds that
\[\begin{eqnarray*} {\rm P}\left(\sup \limits _{{\bf x}\in \mathcal {X}} |\mu _{k,B}({\bf x})-f({\bf x})| = \mathcal {O}\left((\log k)^{\frac{1}{2}} k^{-\frac{\nu -d/2}{2d}}\right) \right) \ge 1-\alpha . \end{eqnarray*}\]
Proof.
We see from Expression (8) that both \(\gamma _{k,B}(\tau)\) and \(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\) must converge to zero as \(k,B \rightarrow \infty\) to guarantee a vanishing approximation error for all \({\bf x}\in \mathcal {X}\). In the finite smoothness case under the fixed design setting, \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) can be achieved if the grid constant as a function of k, \(\tau (k)\), decreases sufficiently fast. Recall from Expression (12) that \(\gamma _{k,B}(\tau) \lesssim ((\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1})^{\tfrac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\tfrac{1}{2}})\). Since \({\sf V}_{k,\max }/{\sf V}_{k, \min }\lt \infty\) and \({\sf V}_{k,\min } \asymp k^{1-2\nu /d}\), setting \(\tau (k) = \mathcal {O}(k^{-a})\) as \(k, B\rightarrow \infty\) with \(a = 3\nu /d\) ensures that \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) and, moreover, \(\gamma _{k,B}(\tau) = o(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}))\) as \(k, B\rightarrow \infty\). The latter follows from the fact that \(\beta _k(\tau) = \mathcal {O} (\log k)\) due to Expression (9) and \(\sigma _{k,B}({\bf x}) = \mathcal {O} (k^{-\tfrac{\nu -d/2}{2d}})\) for all \({\bf x}\in \mathcal {X}\), thanks to Corollary 1. Therefore, \(\gamma _{k,B}(\tau) + \sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) \lesssim \sqrt {\beta _{k}(\tau)} \sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) = \mathcal {O}((\log k)^{\tfrac{1}{2}} k^{-\tfrac{\nu -d/2}{2d}})\) as \(k, B \rightarrow \infty\). □

4.2 Infinite Smoothness Case

When the kernel K has infinite smoothness, the corresponding RKHS \(\mathcal {H}\) can be continuously embedded into every classical Sobolev space \(W^{\nu }_p(\mathcal {X})\) for all \(\nu \in \mathbb {N}\) and some fixed \(p\in [1,\infty)\) [36]. By restricting our attention to two popular infinitely smooth kernels, respectively, the Gaussian kernel (abbreviated as G): \(K({\bf x},{\bf x}^\prime) = \exp (-\Vert {\bf x}-{\bf x}^\prime \Vert ^2/\theta)\) with \(\theta \gt 0\) and the inverse multiquadrics kernel (abbreviated as M): \(K({\bf x},{\bf x}^\prime) = (c^2+\Vert {\bf x}-{\bf x}^\prime \Vert ^2)^{-\xi },\) where \(c\gt 0\) and \(\xi \gt d/2\) (page 76 of Reference [48]), we reveal the convergence rate of \(\sigma ^2_{k,B}(\cdot)\) under the supremum norm and the asymptotic order of the uniform error bound as \(k, B \rightarrow \infty\) in the fixed design setting. We first state a result that provides an upper bound on the error of the regularized kernel-based approximation \(s_{\lambda }\) given in Equation (14).
Proposition 3 (Theorem 6.1 and Corollary 6.3 in Rieger and Zwicknagl [36]).
If \(\mathcal {X}\) is a compact cube with side length \(\ell\), then for all \(q \in [1, \infty ]\), there exist constants \(h_0\), C, and \(\tilde{C}\gt 0\) such that for all design-point sets \(\mathcal {D}\subset \mathcal {X}\) with fill distance \(h_{\mathcal {D},\mathcal {X}}\le h_0\) and for all \(f\in \mathcal {H}_G\) in the RKHS of a Gaussian kernel G,
\begin{equation*} \Vert f-s_\lambda \Vert _{L_q(\mathcal {X})}\le \left(2\exp \left(C\log (h_{\mathcal {D},\mathcal {X}})/h_{\mathcal {D},\mathcal {X}}\right)+\sqrt {\lambda }\exp \left(\tilde{C}/h_{\mathcal {D},\mathcal {X}}\right)\right)\Vert f\Vert _{\mathcal {H}_G}, \end{equation*}
which gives that, for \(\lambda \le \exp (2(C\log (h_{\mathcal {D},\mathcal {X}})-\tilde{C})/h_{\mathcal {D},\mathcal {X}})\),
\begin{equation*} \Vert f-s_\lambda \Vert _{L_q(\mathcal {X})}\le 3\exp \left(C\log (h_{\mathcal {D},\mathcal {X}})/h_{\mathcal {D},\mathcal {X}}\right)\Vert f\Vert _{\mathcal {H}_G}. \end{equation*}
For all design-point sets \(\mathcal {D}\subset \mathcal {X}\) with fill distance \(h_{\mathcal {D},\mathcal {X}}\le h_0\) and for all \(f\in \mathcal {H}_M\) in the RKHS of an inverse multiquadrics kernel M,
\begin{equation*} \Vert f-s_\lambda \Vert _{L_q(\mathcal {X})}\le \left(2\exp \left(-C/h_{\mathcal {D},\mathcal {X}}\right)+\sqrt {\lambda }\exp \left(\tilde{C}/h_{\mathcal {D},\mathcal {X}}\right)\right)\Vert f\Vert _{\mathcal {H}_M}, \end{equation*}
which gives that, for \(\lambda \le \exp (-2(C+\tilde{C})/h_{\mathcal {D},\mathcal {X}})\),
\begin{equation*} \Vert f-s_\lambda \Vert _{L_q(\mathcal {X})}\le 3\exp \left(-C/h_{\mathcal {D},\mathcal {X}}\right)\Vert f\Vert _{\mathcal {H}_M} . \end{equation*}
Notice that the constants C, \(\tilde{C}\), and \(h_0\) depend only on d, \(\ell\), and q.
Setting \(f:=K_{{\bf x}_0}\), \(s_\lambda := \widehat{K}_{{\bf x}_0}\), and \(q=\infty\) in Proposition 3 yields the following result on the optimal convergence rate of the maximum discrepancy between \(K_{{\bf x}_0}\) and its approximation \(\widehat{K}_{{\bf x}_0}\) in the infinite smoothness case:
Corollary 2.
Suppose that the assumptions of Proposition 3 are fulfilled. For a Gaussian kernel, if \(\lambda \le \exp (2(C\log (h_{\mathcal {D},\mathcal {X}})-\tilde{C})/h_{\mathcal {D},\mathcal {X}})\) and the design-point set \(\mathcal {D}\) is quasi-uniform so \(h_{\mathcal {D},\mathcal {X}}\asymp k^{-1/d}\), then the maximum discrepancy between the function \(f:=K_{{\bf x}_0}\) and its approximation \(s_\lambda := \widehat{K}_{{\bf x}_0}\) attains its optimal convergence rate:
\begin{equation*} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim \exp \left(-Ck^{\frac{1}{d}}\left(\log k\right)\right). \end{equation*}
For an inverse multiquadrics kernel, if \(\lambda \le \exp (-2(C+\tilde{C})/h_{\mathcal {D},\mathcal {X}})\) and the design-point set \(\mathcal {D}\) is quasi-uniform so \(h_{\mathcal {D},\mathcal {X}} \asymp k^{-1/d}\), then the maximum discrepancy between the function \(f:=K_{{\bf x}_0}\) and its approximation \(s_\lambda := \widehat{K}_{{\bf x}_0}\) attains its optimal convergence rate:
\begin{equation*} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim \exp \left(-Ck^{\frac{1}{d}}\right). \end{equation*}
In the infinite smoothness case under the fixed design setting, Corollary 2 indicates that \(\sigma _{k,B}({\bf x}_0) \lesssim \exp (-Ck^{\frac{1}{d}}/2)\) at any \({\bf x}_0\in \mathcal {X}\). The condition on the regularization parameter \(\lambda ={\sf V}_{k,\max }/\xi ^2\) stipulated in Corollary 2 can be expressed as \({\sf V}_{k,\max }\lesssim \exp (-(2C{d}^{-1}(\log k)+2\tilde{C})k^{1/d})\) for a Gaussian kernel and \({\sf V}_{k,\max }\lesssim \exp (-2(C+\tilde{C})k^{1/d})\) for an inverse multiquadrics kernel, or equivalently, \({\sf V}_{k,\max }\lesssim \exp (-\bar{C}k^{1/d})\) with \(\bar{C}=2(C+\tilde{C})\) if the logarithmic factor term is ignored. This implies that: (i) the upper bound of \({\sf V}_{k,\max }\) must decrease exponentially with \(k^{1/d}\); and (ii) given a fixed number of design points k, the higher the input-space dimensionality d of a given problem, the slower the decreasing rate required on \({\sf V}_{k,\max }\). The above also provides some insight into a desirable simulation experimental design for achieving the optimal convergence rate of \(\Vert \sigma ^2_{k,B}\Vert _{\infty }\). First, the design-point set must be quasi-uniform. Moreover, the numbers of simulation replications \(n_i\) must grow with the number of design points k at a much faster rate in the infinite smoothness case than in the finite smoothness case.
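To see the implied decay numerically, the following minimal sketch (our own illustration, not part of the original analysis; the kernel parameter, the constant \(\bar{C}\), the spatial variance \(\xi ^2=1\), and the design sizes are arbitrary choices) evaluates the standard SK predictive variance \(\sigma ^2_{k,B}({\bf x}) = \Sigma _{\sf M}({\bf x},{\bf x}) - \Sigma _{\sf M}({\bf x},{\bf X})^\top {\mathbf {\Sigma }}^{-1}\Sigma _{\sf M}({\bf x},{\bf X})\) on a fine grid for equispaced one-dimensional designs and reports the supremum of the predictive standard deviation.

```python
import numpy as np

def gaussian_kernel(a, b, theta=0.1):
    """K(x, x') = exp(-(x - x')^2 / theta) for scalar inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / theta)

def sup_predictive_sd(k, c_bar=0.5):
    """Supremum (over a fine grid) of SK's predictive standard deviation with an
    equispaced design on [0, 1] and noise variances V(x_i)/n_i = exp(-c_bar * k),
    mimicking the condition V_{k,max} ~ exp(-c_bar * k^(1/d)) for d = 1."""
    X = np.linspace(0.0, 1.0, k)                         # quasi-uniform design points
    x_grid = np.linspace(0.0, 1.0, 2001)                 # dense prediction grid
    Sigma = gaussian_kernel(X, X) + np.exp(-c_bar * k) * np.eye(k)   # Sigma_M + Sigma_eps
    cross = gaussian_kernel(X, x_grid)                   # Sigma_M(X, x_grid)
    var = 1.0 - np.sum(cross * np.linalg.solve(Sigma, cross), axis=0)
    return np.sqrt(np.maximum(var, 0.0)).max()

# Moderate design sizes keep the computation within floating-point accuracy.
for k in (5, 10, 15, 20, 25):
    print(k, sup_predictive_sd(k))   # the supremum drops by orders of magnitude as k grows
```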
Given that the kernel has infinite smoothness, the next result extends Theorem 2 by elucidating the order of the uniform error bound as \(k, B \rightarrow \infty\) in the fixed design setting.
Proposition 4.
Suppose that the assumptions of Theorem 2 and Corollary 2 are fulfilled, the kernel K has infinite smoothness, and \({\sf V}_{k,\max } \asymp \exp (-\bar{C} k^{1/d})\) with \(\bar{C}=2(C+\tilde{C})\), where C and \(\tilde{C}\) are defined in Proposition 3. Given any \(\alpha \in (0,1)\), it holds that
\[\begin{eqnarray*} {\rm P}\left(\sup \limits _{{\bf x}\in \mathcal {X}} |\mu _{k,B}({\bf x})-f({\bf x})| = \mathcal {O}\left(k^{\frac{1}{2d}} \exp (-Ck^{\frac{1}{d}}/2) \right) \right) \ge 1-\alpha . \end{eqnarray*}\]
Proof.
We see from Expression (8) that both \(\gamma _{k,B}(\tau)\) and \(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\) must converge to zero as \(k,B \rightarrow \infty\) to guarantee a vanishing approximation error for all \({\bf x}\in \mathcal {X}\). In the infinite smoothness case under the fixed design setting, \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) can be achieved if the grid constant as a function of k, \(\tau (k)\), decreases sufficiently fast. Recall from Expression (12) that \(\gamma _{k,B}(\tau) \lesssim ((\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1})^{\tfrac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\tfrac{1}{2}})\). Since \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\) and \({\sf V}_{k,\min } \asymp \exp (-\bar{C} k^{1/d})\) with \(\bar{C}=2(C+\tilde{C})\), setting \(\tau (k) = \mathcal {O}(\exp (-a k^{\tfrac{1}{d}}))\) with \(a \gt \bar{C} +C\) ensures \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) and, moreover, \(\gamma _{k,B}(\tau) = o(\sqrt {\beta (\tau)}\sigma _{k,B}({\bf x}))\) for all \({\bf x}\in \mathcal {X}\) as \(k, B\rightarrow \infty\). The latter follows from the fact that \(\beta _k(\tau) = \mathcal {O} (k^{\tfrac{1}{d}})\) due to Expression (9) and \(\sigma _{k,B}({\bf x}) = \mathcal {O} (\exp (-Ck^{\tfrac{1}{d}}/2))\) for all \({\bf x}\in \mathcal {X},\) thanks to Corollary 2. Therefore, \(\gamma _{k,B}(\tau) + \sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) \lesssim \sqrt {\beta _{k}(\tau)} \sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) = \mathcal {O}(k^{\tfrac{1}{2d}} \exp (-Ck^{\tfrac{1}{d}}/2))\) as \(k, B \rightarrow \infty\). □
We close this section with some practical insights into suitable experimental designs for SK metamodeling. To achieve a desirable convergence rate of \(\sigma _{k,B}(\cdot)\) under the supremum norm given a sufficiently large budget B to expend, an experimental design suitable for the infinite smoothness case is expected to use fewer design points but with more replications allocated to each design point to fulfill its condition on \({\sf V}_{k,\max }\); hence, a “sparse and deep” design is recommended. In contrast, an experimental design suitable for the finite smoothness case can adopt relatively more design points but with fewer replications allocated at each point to fulfill its condition on \({\sf V}_{k,\max }\); hence, a “dense and shallow” design is recommended. Such insights echo those reported in other contexts in the stochastic simulation metamodeling literature; see, e.g., References [46, 55].

5 Discussion and Extension

This section provides implications for simulation experimental designs for SK and theoretically investigates the impact of noise variance estimation on the performance of the uniform error bound.

5.1 Discussion

This subsection provides some further insight into the impact of experimental designs and budget allocation schemes on SK metamodeling in light of the results obtained in Section 4 for the fixed design setting.
Given a fixed number of design points k, it is not difficult to see that \({\sf V}_{k,\max } = \max \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\) is minimized by setting the number of replications at design point \({\bf x}_i\) to
\begin{equation} n_i = \frac{{\sf V}({\bf x}_i)}{\sum _{j=1}^k {\sf V}({\bf x}_j)} B, \ i = 1,2,\ldots , k. \end{equation}
(16)
Under the budget allocation in Equation (16), we have \(\lambda = {\sf V}_{k,\max }/\xi ^2 = {(B\xi ^2)}^{-1} \sum _{i=1}^k {\sf V}({\bf x}_i)\). When k becomes large, it follows that
\[\begin{eqnarray*} \lambda = (B\xi ^2)^{-1} k \left(\frac{1}{k}\sum _{i=1}^k {\sf V}({\bf x}_i)\right) \approx (B\xi ^2)^{-1} k \left(|\mathcal {X}|^{-1}\int _{\mathcal {X}} {\sf V}({\bf u}) {\tt d} {\bf u}\right) \asymp k/B, \end{eqnarray*}\]
where the last step follows from \(|\mathcal {X}|^{-1}\int _{\mathcal {X}} {\sf V}({\bf u}) {\tt d} {\bf u}\lt \infty\).
In contrast, if an equal budget allocation rule is applied, i.e., \(n_i = B/k\) for \(i=1,2,\ldots ,k\), then \(\lambda = {\sf V}_{k,\max }/\xi ^2 = (B\xi ^2)^{-1} k\max _{1\le i\le k} {\sf V}({\bf x}_i)\). When k becomes large, we have
\[\begin{eqnarray*} \lambda = (B\xi ^2)^{-1} k\left(\max _{1\le i\le k} {\sf V}({\bf x}_i) \right) \approx (B\xi ^2)^{-1} k\sup _{\mathcal {X}} {\sf V}({\bf x}) \asymp k/B, \end{eqnarray*}\]
where in the last step we have used \(\sup _{\mathcal {X}}{\sf V}({\bf x})\lt \infty\). In brief, when k is sufficiently large, different budget allocation rules lead to the same order of \(\lambda\), which only depends on the total budget B and the number of design points k.
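As a quick numerical check of this observation, the sketch below (with an arbitrary bounded variance function chosen purely for illustration, and ignoring the integrality of the \(n_i\)'s) computes \(\lambda = {\sf V}_{k,\max }/\xi ^2\) under the allocation in Equation (16) and under equal allocation for growing k with \(B = 100k\); both values stabilize at constants, consistent with \(\lambda \asymp k/B\).

```python
import numpy as np

def lambda_orders(k, B, xi2=1.0):
    """lambda = V_{k,max} / xi^2 under the allocation in Equation (16) and under
    equal allocation, for an illustrative variance function on [0, 1]."""
    x = np.linspace(0.0, 1.0, k)
    V = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)        # arbitrary bounded noise variances
    n_prop = V / V.sum() * B                       # Equation (16), ignoring integrality
    n_eq = np.full(k, B / k)                       # equal allocation
    lam_prop = np.max(V / n_prop) / xi2            # equals sum_i V(x_i) / (B xi^2)
    lam_eq = np.max(V / n_eq) / xi2                # equals k max_i V(x_i) / (B xi^2)
    return lam_prop, lam_eq

for k in (10, 100, 1000):
    print(k, lambda_orders(k, B=100 * k))          # both stay at a constant multiple of k/B
```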
The discussion above sheds some light on the relationship between the number of design points k and the total budget B needed to achieve the optimal decreasing rate of \(\Vert \sigma _{k,B}\Vert _{\infty }\). In light of the results for the fixed design setting in Section 4, we have the following remarks regarding the finite and the infinite smoothness cases. When the underlying kernel function has a smoothness level \(\nu \lt \infty\), the discussion above and Corollary 1 indicate that \(\lambda \asymp k/B \lesssim k^{1-\frac{2\nu }{d}}\), or equivalently, \(k\lesssim B^{\tfrac{d}{2\nu }}\), when the budget B is sufficiently large. Therefore, in addition to adopting a quasi-uniform design-point set, to achieve the optimal convergence rate of \(\sigma _{k,B}(\cdot)\), one should use a large number of design points k if the input-space dimensionality d of a given problem is high (respectively, the smoothness level \(\nu\) is low) and vice versa. In contrast, when the kernel function has an infinite smoothness level, the discussion above and Corollary 2 indicate that \(\lambda \asymp k/B \lesssim \exp (-(2Cd^{-1}(\log k)+2\tilde{C})k^{1/d})\) (for a Gaussian kernel), or equivalently, \(k\lesssim (\log B)^d\) if the logarithmic factor terms are ignored when the budget B is sufficiently large. Hence, besides adopting a quasi-uniform design-point set, to achieve the optimal convergence rate of \(\Vert \sigma _{k,B}\Vert _{\infty }\) in this case, one should also use a relatively large number of design points k if the input-space dimensionality d of a given problem is high. However, the number of design points required is expected to be much smaller than that in the finite smoothness case.
Last but not least, we note that the root mean squared error of SK is upper bounded by the predictive standard deviation under the supremum norm:
\begin{align*} \mbox{RMSE} := \left(\int _{\mathcal {X}} \sigma _{k,B}^2({\bf x}) {\tt d}{\bf x}\right)^{1/2} \le \Vert \sigma _{k,B}\Vert _{\infty }. \end{align*}
Therefore, the aforementioned desirable simulation experimental designs for achieving the optimal convergence rate of \(\Vert \sigma _{k,B}\Vert _{\infty }\) also apply if the target is to achieve an adequate global response surface fitting with a guaranteed RMSE convergence rate.

5.2 Impact of Noise Variance Estimation on the Uniform Error Bound

In this subsection, we conduct a theoretical investigation of the impact of noise variance estimation on the performance of the uniform error bound. Recall that, in light of Theorem 1, we can obtain a high-probability uniform bound as given in Expression (7) for \(f(\cdot)\), assuming knowledge of the true noise variances \({\sf V}({\bf x}_i)\) for \(i=1,2,\ldots ,k\). We refer to this bound as the nominal uniform bound. Let \(\widehat{{\sf V}}({\bf x}_i)\) denote the sample variance calculated based on the \(n_i\ge 2\) replications simulated at \({\bf x}_i\) for \(i=1,2,\ldots ,k\). Define \(\widehat{LB}({\bf x}):= \widehat{\mu }_{k,B}({\bf x})-\sqrt {\beta (\tau)}\widehat{\sigma }_{k,B}({\bf x})-\widehat{\gamma }_B(\tau)\) and \(\widehat{UB}({\bf x}):= \widehat{\mu }_{k,B}({\bf x})+\sqrt {\beta (\tau)}\widehat{\sigma }_{k,B}({\bf x})+\widehat{\gamma }_B(\tau)\), which are constructed using the sample variance \(\widehat{{\sf V}}({\bf x}_i)\) in place of \({\sf V}({\bf x}_i)\) for \(i=1,2,\ldots ,k\). We refer to \([\widehat{LB}({\bf x}), \widehat{UB}({\bf x})]\) as the empirical uniform bound for \(f({\bf x})\) at all \({\bf x}\in \mathcal {X}\) and study its performance as k and B tend to infinity.
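For concreteness, a minimal sketch of how the plug-in quantities entering the empirical bound are assembled from raw replications is given below (the data-generation step and all numerical values are hypothetical).

```python
import numpy as np

def empirical_inputs(replications):
    """Sample means, sample variances, and the diagonal entries of
    Sigma_hat_epsilon = diag(Vhat(x_i)/n_i), given per-design-point replications."""
    y_bar = np.array([r.mean() for r in replications])
    v_hat = np.array([r.var(ddof=1) for r in replications])   # requires n_i >= 2
    n = np.array([len(r) for r in replications])
    return y_bar, v_hat, v_hat / n

# Hypothetical data: k = 3 design points with unequal replication counts.
rng = np.random.default_rng(0)
reps = [rng.normal(loc=m, scale=s, size=n)
        for (m, s, n) in [(1.0, 0.5, 20), (2.0, 1.0, 50), (0.5, 2.0, 30)]]
print(empirical_inputs(reps))
```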
Define \(n^*_k : = \min _{1 \le i\le k} n_i\) as the minimum number of replications allocated across the k design points, and recall \({\sf V}_{k,\max } = \max \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\). The following result provides a high-probability bound of the maximum discrepancy between the diagonal entries in the true noise variance-covariance matrix \(\Sigma _{\varepsilon }\) and those in its estimate \(\widehat{\Sigma }_\varepsilon :=\mbox{diag} (\widehat{{\sf V}}({\bf x}_1)/n_1, \widehat{{\sf V}}({\bf x}_2)/n_2, \ldots , \widehat{{\sf V}}({\bf x}_k)/n_k)\), which holds for all \(k\ge 1\).
Proposition 5.
Fix \(\alpha \in (0,1)\). Let \(\eta _{k,B}=\log ((\pi _k \alpha)^{-1})\) so \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/3\). Suppose that the noise terms incurred at \({\bf x}_i\), \(\varepsilon _j({\bf x}_i)\)’s, are i.i.d. normal with mean zero and variance \({\sf V}({\bf x}_i)\) at \(i=1,2,\ldots ,k\), and \(n^*_k \ge 2\). Then, with probability at least \(1-\alpha /3\), it holds that,
\[\begin{eqnarray*} \max _{1 \le i\le k}\left|\frac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}\right| \le \sqrt {\frac{8\left(\eta _{k,B} + \log (2k)\right)}{(n^*_k-1)}} \cdot {\sf V}_{k,\max }, \quad \forall k\ge 1. \end{eqnarray*}\]
The proof of Proposition 5 is provided in Appendix A.1. Based on Proposition 5, we can analyze the difference between the empirical uniform bound and the nominal uniform bound as k and B tend to infinity.
Theorem 3.
Fix \(\alpha \in (0,1)\). Suppose that the assumptions of Theorem 2 and Proposition 5 are satisfied and the design-point set \(\mathcal {D}\) is quasi-uniform.
If the kernel K has \(\nu \gt d/2\) smoothness with \(\nu \lt \infty\), \({\sf V}_{k,\max } \asymp k^{1-2\nu /d}\), and \(k^{\tfrac{6\nu }{d}-1} (\log k) = o(n_k^*)\), then it holds with probability at least \(1-\alpha\) that \(\sup _{{\bf x}\in \mathcal {X}}|\tfrac{\widehat{LB}({\bf x}) - LB({\bf x})}{\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}) + \gamma _{k,B}(\tau)}| = o (1)\) and \(\sup _{{\bf x}\in \mathcal {X}}|\tfrac{\widehat{UB}({\bf x}) - UB({\bf x})}{\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}) + \gamma _{k,B}(\tau)}| = o (1)\) as \(k, B\rightarrow \infty\).
If the kernel K has infinite smoothness, \({\sf V}_{k,\max } \asymp \exp (-\bar{C} k^{1/d})\) for some \(\bar{C}\gt 0\), and \(k^2 (\log k) \exp (2(\bar{C}+C) k^{\tfrac{1}{d}})= o(n_k^*)\) with \(C, \bar{C}\gt 0\) as specified in the proof of Proposition 4, then it holds with probability at least \(1-\alpha\) that \(\sup _{{\bf x}\in \mathcal {X}}|\tfrac{\widehat{LB}({\bf x}) - LB({\bf x})}{\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}) + \gamma _{k,B}(\tau)}| = o (1)\) and \(\sup _{{\bf x}\in \mathcal {X}}|\tfrac{\widehat{UB}({\bf x}) - UB({\bf x})}{\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}) + \gamma _{k,B}(\tau)}| = o (1)\) as \(k, B\rightarrow \infty\).
The proof of Theorem 3 is deferred to Appendix A.2. We see from Theorem 3 that the difference between the empirical uniform bound and the nominal uniform bound becomes negligible so long as the numbers of replications at all design points grow sufficiently fast with the expansion of the design-point set \(\mathcal {D}\). While the conditions stipulated on the minimum number of replications \(n_k^*\) may seem stringent, the numerical results in Section 6 indicate that the empirical and nominal uniform bounds perform similarly in practice, unless the numbers of replications at some design points become very small. We note that the conservativeness of the conditions arises from the techniques used to upper bound various terms involving the inverse of \({\mathbf {\Sigma }}:=\left(\Sigma _{\sf M}+\Sigma _{\varepsilon }\right)\) in \(| \widehat{LB}({\bf x}) - LB({\bf x})|\) and \(| \widehat{UB}({\bf x}) - UB({\bf x})|\) and to ensure that these terms vanish faster than the nominal uniform bound’s half-width for all \({\bf x}\in \mathcal {X}\). Other proof techniques may yield less stringent conditions. The proof of Theorem 3 also reveals the importance of stabilizing the SK metamodeling process as the design-point set expands and the budget allocated grows. To be more precise, the conditioning of the matrix \({\mathbf {\Sigma }}\) can deteriorate as \(k,B\rightarrow \infty\), which is reflected by the magnitude of \(\Vert {\mathbf {\Sigma }}^{-1}\Vert = 1/\lambda _{\min }({\mathbf {\Sigma }})\) in the proof of Theorem 3. This also leads us to briefly discuss the condition number of \({\mathbf {\Sigma }}\), \(\mbox{cond}({\mathbf {\Sigma }}) = \lambda _{\max }({\mathbf {\Sigma }})/ \lambda _{\min }({\mathbf {\Sigma }})\). It is known that the maximum eigenvalue is typically well behaved; it is the minimum eigenvalue that causes numerical difficulties (Chapter 12 of Reference [48]). Since \(\lambda _{\min }({\mathbf {\Sigma }}) \ge \lambda _{\min }(\Sigma _{\sf M}) + \lambda _{\min }(\Sigma _{\varepsilon }) \ge \max \lbrace \lambda _{\min }(\Sigma _{\varepsilon }), \lambda _{\min }(\Sigma _{\sf M})\rbrace\), lower bounds for \(\lambda _{\min }(\Sigma _{\sf M})\) and \(\lambda _{\min }(\Sigma _{\varepsilon })\) directly inform the stability and accuracy of SK’s prediction. Lower bounds for \(\lambda _{\min }(\Sigma _{\sf M})\) are known to depend on the separation distance \(q_{\mathcal {D}}\). Recall from Definition 2 that \(q_{\mathcal {D}} \asymp h_{\mathcal {D}, \mathcal {X}}= \mathcal {O}(k^{-1/d})\) if the design-point set \(\mathcal {D}\) is quasi-uniform. In this case, it is known that \(\lambda _{\min }(\Sigma _{\sf M}) \gtrsim q_{\mathcal {D}}^{(2\nu -d)} = \mathcal {O}(k^{1-2\nu /d})\) in the finite smoothness case [51], whereas in the infinite smoothness case, \(\lambda _{\min }(\Sigma _{\sf M}) \gtrsim \exp (-C_0 q_{\mathcal {D}}^{-2})q_{\mathcal {D}}^{-d} = \mathcal {O}(\exp (-C_0 \cdot k^{2/d})k)\) for some \(C_0\gt 0\) (Section 12.2 of Reference [48]). To maintain SK’s numerical stability as \(k, B \rightarrow \infty\), \(\lambda _{\min }(\Sigma _{\varepsilon })= {\sf V}_{k,\min }\) should not diminish too rapidly.
Since \(\lambda _{\min }({\mathbf {\Sigma }}) \ge \max \lbrace \lambda _{\min }(\Sigma _{\varepsilon }), \lambda _{\min }(\Sigma _{\sf M})\rbrace = \lambda _{\min }(\Sigma _{\varepsilon }) = {\sf V}_{k,\min }\) and \(\Vert {\mathbf {\Sigma }}^{-1}\Vert \le 1/\lambda _{\min }(\Sigma _{\varepsilon }) = {\sf V}^{-1}_{k,\min }\), the condition \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\) and that on the order of \({\sf V}_{k,\max }\) stipulated in Theorem 3 ensure the numerical stability of SK while also fulfilling the regularization parameter role played by the noise variances (manifested by \({\sf V}_{k,\max }\)) to achieve the best convergence rate of SK’s predictive variance. We will examine the impact of noise variance estimation on the performance of the uniform bound through numerical experiments in the next section.
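To make the stability discussion above concrete, the following sketch (with illustrative kernel, design, and noise-level choices of our own) computes \(\lambda _{\min }(\Sigma _{\sf M})\) and \(\mbox{cond}({\mathbf {\Sigma }})\) for a quasi-uniform one-dimensional design, showing how the noise-variance floor \({\sf V}_{k,\min }\) governs the conditioning of \({\mathbf {\Sigma }}\).

```python
import numpy as np

def conditioning(k=50, theta=0.1, v_min_values=(1e-2, 1e-6, 1e-12)):
    """Conditioning of Sigma = Sigma_M + Sigma_eps for a homoscedastic
    Sigma_eps = v * I, as the noise-variance floor v is decreased."""
    x = np.linspace(0.0, 1.0, k)                                  # quasi-uniform design
    Sigma_M = np.exp(-(x[:, None] - x[None, :]) ** 2 / theta)     # Gaussian kernel matrix
    lam_M = np.linalg.eigvalsh(Sigma_M)
    # A near-zero (possibly negative, due to round-off) value signals severe ill-conditioning.
    print("lambda_min(Sigma_M) ~", lam_M[0])
    for v in v_min_values:
        lam = np.linalg.eigvalsh(Sigma_M + v * np.eye(k))
        print(f"V_k,min = {v:g}: lambda_min(Sigma) = {lam[0]:.3e}, "
              f"cond(Sigma) = {lam[-1] / lam[0]:.3e}")

conditioning()
```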

6 Numerical Experiments

This section compares the performance of the nominal uniform bound assuming the knowledge of the true noise variances (given by Expression (7) in Section 3 and referred to as \(\mbox{CI}_{{\mbox{u}}}\)), the empirical uniform bound built with noise variances estimated (given at the beginning of Section 5.2 and referred to as \(\mbox{CI}_{{\mbox{eu}}}\)), and the Bonferroni correction-based simultaneous confidence interval built with noise variances estimated (given at the end of Section 2.1 and referred to as \(\mbox{CI}_{{\mbox{b}}}\)) through numerical evaluations. We consider the following examples: an M/M/1 queueing model and a dimension-flexible synthetic example.
We start by describing the general experimental setup for numerical evaluations. A simulation experiment is performed with a total budget of B observations to collect at k distinct design points, with \(n_i\) observations to obtain at design point \({\bf x}_i\), for \(i = 1, 2, \ldots , k\). The k design points are chosen according to either a random design or a fixed design, to be specified for each example. Three budget allocation schemes, i.e., the equal allocation, unequal allocation 1, and unequal allocation 2, are considered. Specifically, the equal budget allocation scheme sets \(n_i = \lceil B/k \rceil\), where \(\lceil a\rceil\) denotes the least integer not less than a. Assuming that the noise variance function is known, unequal allocation 1 sets \(n_i = \lceil \tfrac{{\sf V}({\bf x}_i)}{\sum _{l=1}^{k}{\sf V}({\bf x}_l)}B\rceil\), while unequal allocation 2 sets \(n_i = \lceil \tfrac{\sqrt {{\sf V}({\bf x}_i)}}{\sum _{l=1}^{k}\sqrt {{\sf V}({\bf x}_l)}}B\rceil\). We note that the goal of unequal allocation 1 is to equalize the variance of sample means across all design points. This approach has been explored in the literature on ranking and selection as well as multi-armed bandit problems [9, 16, 19], aiming to alleviate the impact of heteroscedasticity. Unequal allocation 2 assigns the simulation budget in accordance with the scale function \(\rho ({\bf x}): = \sqrt {{\sf V}({\bf x})}\) for \({\bf x}\in \mathcal {X}\); it is proposed based on the optimal design density \(h^*(\cdot)\) established for controlled nonparametric regression experiments. Specifically, \(h^*(x) = \rho (x)/ \int _{\mathcal {X}} \rho (u) du\) achieves the asymptotic sharp minimax lower bound for the mean integrated squared error when \(x \in \mathcal {X}= [0,1]\); see Reference [17] and references therein for more details. We consider various experimental settings comprising various combinations of the total budget B, the number of design points k, the design-point sampling scheme to choose k design points, and the budget allocation scheme adopted. The target confidence level is set to 0.95. SK is implemented by adopting two types of kernel functions, respectively, the Gaussian kernel and the Matérn kernel in Equation (15) with smoothness parameter \(\nu = 5/2+d/2\), where d denotes the dimension of the input space \(\mathcal {X}\).
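The three budget allocation rules can be written compactly as follows (a sketch under the setup just described; the variance values passed in are whatever is assumed known for the example at hand).

```python
import numpy as np

def allocate(B, V, scheme="equal"):
    """Replication counts n_i at the k design points under the three schemes.
    V is the vector of (assumed known) noise variances V(x_i)."""
    V = np.asarray(V, dtype=float)
    k = len(V)
    if scheme == "equal":
        w = np.full(k, 1.0 / k)                 # n_i = ceil(B / k)
    elif scheme == "unequal1":
        w = V / V.sum()                         # n_i proportional to V(x_i)
    elif scheme == "unequal2":
        w = np.sqrt(V) / np.sqrt(V).sum()       # n_i proportional to sqrt(V(x_i))
    else:
        raise ValueError(scheme)
    return np.ceil(w * B).astype(int)

# Example with an arbitrary variance profile over k = 4 design points.
V = [0.5, 1.0, 4.0, 9.0]
for s in ("equal", "unequal1", "unequal2"):
    print(s, allocate(2560, V, s))
```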
For performance evaluation, we repeat the simulation experiment under each experimental setting for \(M=100\) independent macro-replications and calculate the empirical simultaneous coverage probability (\(\widehat{\mbox{SCP}}\)) of a given confidence bound defined as
\[\begin{eqnarray*} \widehat{\mbox{SCP}} = \frac{1}{M}\sum \limits _{m=1}^{M} {\mathbf {1}} \lbrace f({\bf x}_{0,i}) \in \mbox{CI}({\bf x}_{0,i}) \mbox{ for each } i=1,2,\ldots ,N \mbox{ on the $m$th macro-replication}\rbrace , \end{eqnarray*}\]
where \(\mbox{CI}({\bf x}_{0,i})\) refers to \(\mbox{CI}_{{\mbox{u}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), or \(\mbox{CI}_{{\mbox{b}}}\) obtained at prediction point \({\bf x}_{0,i}\), \(i=1,2,\ldots ,N\); and \({\mathbf {1}}\lbrace \mathcal {A}\rbrace\) is 1 if event \(\mathcal {A}\) is true and 0 otherwise. Moreover, we compare the half-widths of \(\mbox{CI}_{{\mbox{u}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) constructed for the N prediction points based on their respective maximum half-widths \(H_{\max }\) achieved on each macro-replication.
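A direct implementation of \(\widehat{\mbox{SCP}}\) and of the maximum half-width \(H_{\max }\) is sketched below (the array shapes are our own convention: rows index macro-replications, columns index the N prediction points).

```python
import numpy as np

def empirical_scp(f_true, lower, upper):
    """Empirical simultaneous coverage probability: the fraction of macro-replications
    on which [lower, upper] covers f_true at all N prediction points."""
    covered = (lower <= f_true) & (f_true <= upper)     # shape (M, N)
    return covered.all(axis=1).mean()

def max_half_width(lower, upper):
    """Maximum half-width H_max achieved on each macro-replication."""
    return 0.5 * (upper - lower).max(axis=1)

# Toy usage with synthetic bounds around f = 0.
rng = np.random.default_rng(0)
f = np.zeros(100)
lo = rng.normal(-3.0, 0.1, size=(50, 100))
hi = rng.normal(3.0, 0.1, size=(50, 100))
print(empirical_scp(f, lo, hi), max_half_width(lo, hi)[:3])
```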

6.1 M/M/1 Queue

Consider simulating an M/M/1 queue with service rate 1 and arrival rate x with \(x \in \mathcal {X}= [0.3, 0.9]\). In this example, the simulation output recorded on each replication at a given design point is the average number of customers in the system from time 0 to T time units. It is well known from queueing theory that the steady-state mean number of customers in the system is \(f(x)=x/(1-x)\), which is the mean function we intend to estimate. The noise variance function is \({\sf V}(x)\approx 2x(1+x)/(T(1-x)^4)\) for T large [50]. When simulating the M/M/1 queue at each design point, we initialize each replication in steady state and set the run length T of each replication to 1,000. Besides the general experimental setup given at the beginning of Section 6, the specific setup for this example is as follows: The value of the grid constant \(\tau\) is set to ensure that \(\sqrt {\beta _k(\tau)}\sigma _{k,B}(x)\) dominates \(\gamma _{k,B}(\tau)\) in Expression (7). We consider using a total budget \(B \in \lbrace 2{,}560, 25{,}600\rbrace\) and varying the number of distinct design points \(k\in \lbrace 16, 32, 128, 512\rbrace\) when \(B=2,560\) and \(k\in \lbrace 32, 128, 512, 2{,}048\rbrace\) when \(B=25,600\). Given each choice of k, we choose k design points according to a uniform design or a grid design, where the former comprises a sample of k independent and uniformly distributed points and the latter k equispaced grid points in \(\mathcal {X}\). The three budget allocation schemes mentioned at the beginning of Section 6 are applied given each combination of B, k, and the design-point sampling scheme. A grid of \(N = 1,000\) equispaced points is selected from \(\mathcal {X}\) for prediction.
Summary of results. Tables 1 and 2 show the \(\widehat{\mbox{SCP}}\)’s achieved by the empirical uniform bound (\(\mbox{CI}_{{\mbox{eu}}}\)), the nominal uniform bound (\(\mbox{CI}_{{\mbox{u}}}\)), and the Bonferroni correction-based simultaneous confidence intervals (\(\mbox{CI}_{{\mbox{b}}}\)) under different experimental settings when SK is implemented with the Gaussian and Matérn kernels. The following common observations are made across all experimental settings: First, given a fixed budget B, the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) are typically higher than the target level 0.95, while the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) are not close to 0.95, except when the number of design points k is relatively small under the grid design. Second, the simultaneous coverage performance of \(\mbox{CI}_{{\mbox{eu}}}\) remains satisfactory as the number of design points k increases until it reaches a point where only a few replications are allocated to some design points, regardless of the kernel function and the design-point sampling scheme adopted; see, e.g., the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) obtained under the equal allocation and unequal allocation 1 when \(k=512\) and \(B = 2,560\). However, the simultaneous coverage performance of \(\mbox{CI}_{{\mbox{b}}}\) typically deteriorates as k increases given a fixed budget B, regardless of the kernel function adopted. This is evidenced by examining the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) obtained across all experimental settings shown in Tables 1 and 2. Third, the \(\widehat{\mbox{SCP}}\)’s of all bounds increase with the budget B given that the number of design points k is fixed.
Last, applying unequal budget allocation schemes typically helps improve the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) when the number of design points k is not too large relative to the budget B given. Such an impact on the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) is not as obvious, however.
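For reference, one replication of the M/M/1 output described at the start of this subsection can be generated as in the following sketch (our own implementation; steady-state initialization draws the initial number in system from the stationary geometric distribution).

```python
import numpy as np

def mm1_avg_in_system(x, T=1000.0, rng=None):
    """One replication: time-average number in system for an M/M/1 queue with
    arrival rate x and service rate 1 over [0, T], initialized in steady state."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.geometric(1.0 - x) - 1                 # P(N(0) = j) = (1 - x) x^j, j >= 0
    t, area = 0.0, 0.0
    while t < T:
        rate = x + (1.0 if n > 0 else 0.0)         # total transition rate
        dt = min(rng.exponential(1.0 / rate), T - t)
        area += n * dt
        t += dt
        if t >= T:
            break
        if rng.random() < x / rate:                # next event is an arrival
            n += 1
        else:                                      # next event is a departure
            n -= 1
    return area / T

print(mm1_avg_in_system(0.5))   # close to f(0.5) = 0.5 / (1 - 0.5) = 1 for large T
```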
Table 1.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0.80, 1, 1); (0.97, 1, 1); (0.98, 1, 1) | (0.76, 1, 1); (1, 1, 1); (0.99, 1, 1)
2,560 | 32 | (0.68, 1, 1); (0.93, 1, 1); (0.90, 1, 1) | (0.65, 1, 1); (0.98, 1, 1); (0.93, 1, 1)
2,560 | 128 | (0.46, 0.98, 0.98); (0.52, 1, 1); (0.70, 1, 1) | (0.46, 0.99, 0.99); (0.54, 1, 1); (0.75, 1, 1)
2,560 | 512 | (0.33, 0.92, 0.98); (0.14, 0.88, 1); (0.57, 1, 1) | (0.34, 0.93, 0.98); (0.01, 0.28, 1); (0.64, 1, 1)
25,600 | 32 | (0.98, 1, 1); (1, 1, 1); (1, 1, 1) | (0.97, 1, 1); (1, 1, 1); (1, 1, 1)
25,600 | 128 | (0.69, 1, 1); (0.96, 1, 1); (0.93, 1, 1) | (0.67, 1, 1); (1, 1, 1); (0.91, 1, 1)
25,600 | 512 | (0.54, 1, 1); (0.81, 1, 1); (0.78, 1, 1) | (0.45, 1, 1); (0.90, 1, 1); (0.81, 1, 1)
25,600 | 2,048 | (0.58, 1, 1); (0.33, 0.99, 1); (0.72, 1, 1) | (0.44, 1, 1); (0.31, 0.99, 1); (0.72, 1, 1)
Table 1. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the M/M/1 Example under the Grid Design
Table 2.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0.44, 0.92, 0.91); (0.32, 0.88, 0.88); (0.37, 0.89, 0.89) | (0.48, 0.94, 0.94); (0.33, 0.91, 0.91); (0.39, 0.93, 0.93)
2,560 | 32 | (0.46, 0.95, 0.95); (0.50, 0.97, 0.98); (0.56, 0.98, 0.98) | (0.44, 0.96, 0.96); (0.59, 0.99, 0.99); (0.55, 0.99, 0.99)
2,560 | 128 | (0.42, 0.98, 0.99); (0.60, 1, 1); (0.55, 1, 1) | (0.38, 1, 0.99); (0.62, 1, 1); (0.60, 1, 1)
2,560 | 512 | (0.30, 0.91, 0.98); (0.18, 0.81, 1); (0.70, 1, 1) | (0.23, 0.93, 1); (0.02, 0.36, 1); (0.67, 1, 1)
25,600 | 32 | (0.60, 0.94, 0.94); (0.55, 0.98, 0.98); (0.54, 0.99, 0.99) | (0.60, 0.96, 0.96); (0.59, 0.99, 0.99); (0.58, 0.99, 0.99)
25,600 | 128 | (0.51, 1, 1); (0.83, 1, 1); (0.74, 1, 1) | (0.48, 1, 1); (0.85, 1, 1); (0.75, 1, 1)
25,600 | 512 | (0.61, 1, 1); (0.82, 1, 1); (0.74, 1, 1) | (0.56, 1, 1); (0.81, 1, 1); (0.74, 1, 1)
25,600 | 2,048 | (0.48, 1, 1); (0.32, 0.99, 1); (0.58, 1, 1) | (0.47, 1, 1); (0.30, 0.98, 1); (0.58, 1, 1)
Table 2. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the M/M/1 Example under the Uniform Design
We next examine the maximum half-widths \(H_{\max }\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained under different experimental settings. Figure 1 summarizes the magnitudes of \(H_{\max }\) (in a logarithmic scale) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained under the grid design with a total budget of \(B=2,560\) when SK is implemented with the Gaussian and Matérn kernels. The corresponding results obtained with \(B=25,600\) and those obtained with \(B \in \lbrace 2,560, 25,600\rbrace\) under the uniform design are shown in Figures 6–8 in Appendix D.1. We have the following observations from Figures 1 and 6: First, the \(H_{\max }\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) are larger than those of \(\mbox{CI}_{{\mbox{b}}}\) under each experimental setting considered, regardless of the kernel function adopted. Second, the magnitudes of \(H_{\max }\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) decrease with the number of design points k when a large total budget \(B=25,600\) is applied under all budget allocation schemes. However, this trend may not hold for a small total budget. Third, for \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\), when a given budget B is allocated to a fixed number of design points, the magnitudes of \(H_{\max }\) produced under the equal allocation scheme are the highest. It is worth noting that, regarding all bounds, the two unequal budget allocation schemes lead to shorter half-widths than the equal budget allocation scheme without compromising the simultaneous coverage performance, as observed in Table 1. The magnitudes of \(H_{\max }\) produced under the two unequal allocation schemes are similar, with unequal allocation 2 leading to higher \(H_{\max }\) values than unequal allocation 1. Last, regarding the impact of the design-point sampling scheme, the uniform design tends to lead to more outliers of the \(H_{\max }\) values compared to the grid design; see Figures 7 and 8 in Appendix D.1 for details. Moreover, under the uniform design, the unequal allocation schemes yield shorter half-widths of all bounds compared to the equal allocation scheme, but the difference between the two unequal allocation schemes is slight.
Fig. 1. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the M/M/1 example obtained on 100 independent macro-replications under the grid design with \(B=2,560\).
Fig. 2. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 1D example obtained on 100 independent macro-replications under LHD with \(B=2,560\).
Fig. 3. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 1D example obtained on 100 independent macro-replications under LHD with \(B = 10,240\).
Fig. 4. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 5D example obtained on 100 independent macro-replications under LHD with \(B = 2,560\).
Fig. 5. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 5D example obtained on 100 independent macro-replications under LHD with \(B = 10,240\).
We summarize the impacts of noise variance estimation, the design-point sampling scheme, and the choice of kernel function observed in the M/M/1 queueing example. First, comparisons based on the \(\widehat{\mbox{SCP}}\)’s and \(H_{\max }\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) indicate that the performance of the empirical uniform bound is very similar to that of the nominal one, except when the numbers of replications at some design points are too few, since using the variance estimates can incur additional variability in the SK model fitting. This corroborates the theoretical results provided in Section 5.2. Second, with other experimental settings held constant, all bounds typically perform better under the grid design, which is quasi-uniform, than under the uniform design. This aligns with the theoretical results provided in Section 4. Last, with other experimental settings held constant, the performance of all bounds built with the Matérn kernel is comparable to that built with the Gaussian kernel for the M/M/1 queueing example. However, the magnitudes of \(H_{\max }\) of all bounds built with the Matérn kernel tend to be higher than those with the Gaussian kernel; this is because a faster convergence rate of the predictive variance is achieved with the Gaussian kernel than with the Matérn kernel, corroborating Corollaries 1 and 2, along with the comments following the results. Moreover, when the number of replications at some design points is too small, using the Matérn kernel can result in fitting issues that render the SK model unstable and adversely affect simultaneous coverage performance; see the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{b}}}\) obtained under unequal allocation 1 when \(B=2,560\) and \(k=512\) shown in Table 1 and the corresponding \(H_{\max }\)’s shown in Figure 1(d) in this subsection and Figure 7(d) in Appendix D.1.

6.2 A Dimension-flexible Example

Consider the following dimension-flexible example adapted from the one studied by Reference [23], where \(\mathcal {X}= [-1,1]^d\) and d denotes the input-space dimension. The simulation output at design point \({\bf x}=(x_1,x_2,\ldots ,x_d)^\top\) on the jth replication is generated as \(\mathcal {Y}_{j}({\bf x}) = f({\bf x}) + \varepsilon _j({\bf x}),\) where \(\varepsilon _j({\bf x})\)’s are i.i.d. normally distributed with mean zero and variance \({\sf V}({\bf x})\). The noise variance function is \({\sf V}({\bf x}) = (2+\cos (\pi +\sum _{i=1}^d x_i/d))^2\) for any \({\bf x}\in \mathcal {X}\). The mean function of interest is given by \(f(x)=\sin (9x^2)\) if \(d=1\) and \(f({\bf x})=\sin (9x_1^2) + \sin ((\sum _{i=2}^d 3 x_i/(d-1))^2)\) if \(d\gt 1\). We describe the specific setup for this example next and refer the reader to the beginning of Section 6 for the general experimental setup. We consider two problem cases corresponding to \(d=1\) and \(d=5\). For \(d=1\), we vary the number of design points \(k \in \lbrace 16, 64, 256, 512\rbrace\) when \(B=2,560\) and \(k\in \lbrace 16, 64, 256, 512, 2{,}048\rbrace\) when \(B=10,240\). For \(d=5\), we vary \(k \in \lbrace 16, 64, 256, 512\rbrace\) when \(B=2,560\) and \(k \in \lbrace 64, 256, 512, 1{,}024, 2{,}048\rbrace\) when \(B=10,240\). Given each choice of k, a set of k design points is chosen according to a uniform design or a maximin Latin hypercube design (LHD). The three budget allocation schemes mentioned at the beginning of Section 6 are applied given each combination of B, k, and the design-point sampling scheme. On each macro-replication, we independently generate a set of \(N = 2,500\) points from \(\mathcal {X}\) according to a maximin LHD as the prediction points for performance evaluation.
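A sketch of the simulation-output generator for this example follows; it transcribes the formulas above directly into code (the random seeds and the test point are arbitrary).

```python
import numpy as np

def mean_fn(x):
    """f(x) for the dimension-flexible example; x has shape (d,)."""
    x = np.atleast_1d(x)
    d = x.size
    if d == 1:
        return np.sin(9.0 * x[0] ** 2)
    return np.sin(9.0 * x[0] ** 2) + np.sin((3.0 * x[1:].sum() / (d - 1)) ** 2)

def noise_var(x):
    """V(x) = (2 + cos(pi + sum_i x_i / d))^2."""
    x = np.atleast_1d(x)
    return (2.0 + np.cos(np.pi + x.sum() / x.size)) ** 2

def simulate(x, n, rng=None):
    """n i.i.d. replications Y_j(x) = f(x) + eps_j(x) with eps_j(x) ~ N(0, V(x))."""
    rng = np.random.default_rng() if rng is None else rng
    return mean_fn(x) + rng.normal(0.0, np.sqrt(noise_var(x)), size=n)

print(simulate(np.array([0.2, -0.5, 0.1, 0.9, -0.3]), n=5, rng=np.random.default_rng(1)))
```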
Summary of results. We first look into the 1D case. Tables 3 and 4 present the \(\widehat{\mbox{SCP}}\)’s of the empirical uniform bound (\(\mbox{CI}_{{\mbox{eu}}}\)), the nominal uniform bound (\(\mbox{CI}_{{\mbox{u}}}\)), and the Bonferroni correction-based simultaneous confidence intervals (\(\mbox{CI}_{{\mbox{b}}}\)) under different experimental settings when SK is implemented with the Gaussian and Matérn kernels. Observations akin to those noted in the M/M/1 queueing example are summarized as follows: First, given a fixed budget B, the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) are typically higher than the target level 0.95. Moreover, the performance of \(\mbox{CI}_{{\mbox{eu}}}\) is very similar to that of the nominal one in this 1D case, except when the number of replications is too few at some design points. Second, the \(\widehat{\mbox{SCP}}\)’s of all bounds increase with the budget B given that the number of design points k is fixed. Third, applying unequal budget allocation schemes typically helps improve the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) when the number of design points k is not too large relative to the budget B given. Last, the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) deteriorate as the number of design points k becomes large, regardless of the budget allocation schemes adopted. We highlight that this 1D case differs from the M/M/1 example in the following aspects: First, the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) are not significantly lower than those of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\), so long as the number of design points k is not too large relative to the budget B given. Second, all bounds built with the Matérn kernel achieve comparable or higher \(\widehat{\mbox{SCP}}\)’s compared to those built with the Gaussian kernel, especially when the number of design points k is small.
Table 3.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0.90, 0.92, 0.92); (0.92, 0.95, 0.96); (0.93, 0.94, 0.94) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
2,560 | 64 | (1, 1, 1); (0.99, 1, 1); (0.99, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
2,560 | 256 | (0.94, 1, 1); (0.97, 1, 1); (0.94, 0.99, 1) | (0.98, 1, 1); (0.98, 1, 1); (0.98, 1, 1)
2,560 | 512 | (0.39, 0.92, 1); (0.43, 0.94, 1); (0.58, 0.98, 1) | (0.33, 0.92, 1); (0.38, 0.92, 1); (0.53, 0.94, 1)
10,240 | 16 | (0.93, 0.99, 0.99); (0.94, 0.99, 0.99); (0.92, 0.99, 0.99) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 64 | (1, 1, 1); (1, 1, 1); (1, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 256 | (0.99, 1, 1); (1, 1, 1); (1, 1, 1) | (0.99, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 512 | (1, 1, 1); (1, 1, 1); (0.99, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 2,048 | (0.31, 0.99, 1); (0.34, 0.97, 1); (0.48, 0.99, 1) | (0.25, 0.93, 1); (0.21, 0.91, 1); (0.41, 0.98, 1)
Table 3. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the 1D Case under LHD
Table 4.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0.94, 0.95, 0.94); (0.96, 0.97, 0.93); (0.94, 0.94, 0.94) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
2,560 | 64 | (0.91, 0.92, 0.95); (0.99, 0.99, 0.98); (0.96, 0.96, 0.97) | (0.99, 1, 1); (1, 1, 1); (1, 1, 1)
2,560 | 256 | (0.97, 1, 0.97); (0.97, 1, 1); (0.95, 1, 1) | (0.98, 1, 1); (0.99, 1, 1); (0.99, 1, 1)
2,560 | 512 | (0.43, 0.98, 1); (0.46, 0.92, 1); (0.58, 0.99, 1) | (0.46, 0.96, 1); (0.40, 0.88, 1); (0.65, 0.98, 1)
10,240 | 16 | (0.97, 0.98, 0.97); (0.97, 0.98, 0.98); (0.97, 0.98, 0.98) | (0.99, 1, 1); (0.99, 1, 1); (0.99, 1, 1)
10,240 | 64 | (0.99, 0.99, 0.99); (0.99, 0.99, 0.99); (1, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 256 | (0.98, 1, 1); (1, 1, 1); (1, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 512 | (0.99, 1, 1); (1, 1, 1); (0.99, 1, 1) | (1, 1, 1); (1, 1, 1); (1, 1, 1)
10,240 | 2,048 | (0.30, 0.97, 1); (0.38, 0.98, 1); (0.56, 1, 1) | (0.17, 0.93, 1); (0.29, 0.91, 1); (0.47, 0.99, 1)
Table 4. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the 1D Case under the Uniform Design
We further delve into the maximum half-widths \(H_{\max }\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained under different experimental settings. To save space, we concentrate on analyzing the results obtained under LHD with the equal allocation scheme. Similar observations apply to the results obtained under LHD with the two unequal budget allocation schemes and under the uniform design with all three budget allocations; see Figures 9 and 10 in Appendix D.2.1 for details. Figures 2 and 3 summarize the magnitudes of \(H_{\max }\) (in a logarithmic scale) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained under LHD with a total budget of \(B=2,560\) and 10,240. First, the \(H_{\max }\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) are larger than those of \(\mbox{CI}_{{\mbox{b}}}\) under each experimental setting considered, regardless of the kernel function adopted. Second, the magnitudes of \(H_{\max }\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) decrease with the number of design points k when a large total budget \(B=10,240\) is applied. However, this trend is less discernible with a small budget. Third, the magnitudes of \(H_{\max }\) of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) are similar, except when k becomes large relative to the given budget B so using variance estimates based on a few replications can incur additional variability in SK model fitting, which impacts \(\mbox{CI}_{{\mbox{eu}}}\); this observation holds regardless of the kernel function adopted. Fourth, applying unequal budget allocation schemes only slightly reduces the half-widths of all three bounds, as observed in Figures 9(a)–(d) and 10(a)–(d) in Appendix D.2.1. Last, the magnitudes of \(H_{\max }\) of all bounds built with the Matérn kernel tend to be higher than those with the Gaussian kernel, which explains the corresponding higher \(\widehat{\mbox{SCP}}\)’s observed in Table 3. We also highlight some differences in the 1D case from the M/M/1 example. Given a fixed total budget B, especially a small one, using a small number of design points k can lead to unstable SK model fitting when the Gaussian kernel is adopted, resulting in highly variable \(H_{\max }\) values for all bounds; see, e.g., the \(H_{\max }\)’s obtained for \(k=16\) in Figure 2(a). This is because the true mean function \(f(\cdot)\) in this 1D case is nonlinear, and its trend changes more frequently than that of the M/M/1 example across the input space, due to its combination of the trigonometric function and higher-order polynomial terms. It is known that the Gaussian kernel is not flexible enough for modeling a function with such features, and using a small number of design points exacerbates this deficiency in capturing the true mean function \(f(\cdot)\) in the 1D case.
We next investigate the 5D case. Compared to the 1D case, notice that the true mean function \(f(\cdot)\) in this 5D case is more complex and changes more rapidly across its input space. Tables 5 and 6 present the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained under various experimental settings. We observe that, similar to the 1D case, the performance of \(\mbox{CI}_{{\mbox{b}}}\) built with the Gaussian kernel is notably weak, regardless of the budget size B. In contrast, \(\mbox{CI}_{{\mbox{b}}}\) built with the Matérn kernel delivers relatively robust performance under the same experimental settings. Furthermore, in this 5D case, an increase in the number of design points k is associated with higher \(\widehat{\mbox{SCP}}\)’s for \(\mbox{CI}_{{\mbox{b}}}\) especially when the Matérn kernel is adopted, underscoring the need of using a dense design with more design points for capturing the function in high-dimensional problems and enhancing the reliability of the confidence bound built. Regarding \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\), the Matérn kernel also facilitates consistently strong simultaneous coverage performance across different experimental settings, highlighting its suitability for modeling highly nonlinear and rapidly changing response surfaces in high-dimensional problems. In contrast, due to the limitations of the Gaussian kernel in modeling the mean surface \(f(\cdot)\) in this 5D case, the \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\) and \(\mbox{CI}_{{\mbox{u}}}\) suffer especially when the number of design points k is small.
Table 5.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0, 0.51, 0.49); (0, 0.51, 0.50); (0, 0.52, 0.51) | (0.10, 1, 1); (0.07, 1, 1); (0.08, 1, 1)
2,560 | 64 | (0, 1, 1); (0, 1, 1); (0, 0.99, 0.99) | (0.39, 1, 1); (0.44, 1, 1); (0.36, 1, 1)
2,560 | 256 | (0.03, 1, 1); (0.01, 1, 1); (0, 1, 1) | (0.79, 1, 1); (0.70, 1, 1); (0.78, 1, 1)
2,560 | 512 | (0.17, 1, 0.97); (0.17, 1, 0.99); (0.16, 1, 0.98) | (0.80, 1, 1); (0.78, 1, 1); (0.78, 1, 1)
10,240 | 64 | (0, 1, 1); (0, 0.99, 0.99); (0, 1, 1) | (0.32, 1, 1); (0.32, 1, 1); (0.34, 1, 1)
10,240 | 256 | (0.01, 1, 1); (0.01, 1, 1); (0.01, 1, 1) | (0.65, 1, 1); (0.70, 1, 1); (0.64, 1, 1)
10,240 | 512 | (0.08, 1, 1); (0.05, 1, 1); (0.10, 1, 1) | (0.90, 1, 1); (0.90, 1, 1); (0.84, 1, 1)
10,240 | 1,024 | (0.17, 1, 1); (0.19, 1, 1); (0.19, 1, 1) | (0.92, 1, 1); (0.89, 1, 1); (0.92, 1, 1)
10,240 | 2,048 | (0.24, 1, 1); (0.24, 1, 1); (0.22, 1, 0.99) | (0.96, 1, 1); (0.97, 1, 1); (0.99, 1, 1)
Table 5. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the 5D Case under LHD
Table 6.
B | k | Gaussian (Equal allocation; Unequal allocation 1; Unequal allocation 2) | Matérn (Equal allocation; Unequal allocation 1; Unequal allocation 2)
Each cell lists (\(\mbox{CI}_{{\mbox{b}}}\), \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\)).
2,560 | 16 | (0, 0.69, 0.71); (0, 0.74, 0.72); (0, 0.73, 0.73) | (0.14, 1, 1); (0.12, 1, 1); (0.12, 1, 1)
2,560 | 64 | (0, 0.99, 1); (0, 1, 1); (0, 1, 1) | (0.33, 1, 1); (0.22, 1, 1); (0.33, 1, 1)
2,560 | 256 | (0, 1, 1); (0, 1, 1); (0, 1, 1) | (0.65, 1, 1); (0.60, 1, 1); (0.66, 1, 1)
2,560 | 512 | (0.24, 1, 0.99); (0.21, 1, 0.96); (0.23, 1, 0.99) | (0.80, 1, 1); (0.77, 1, 1); (0.72, 1, 1)
10,240 | 64 | (0, 1, 1); (0, 1, 1); (0, 1, 1) | (0.19, 1, 1); (0.19, 1, 1); (0.20, 1, 1)
10,240 | 256 | (0, 1, 1); (0.01, 1, 1); (0, 1, 1) | (0.57, 1, 1); (0.58, 1, 1); (0.58, 1, 1)
10,240 | 512 | (0.09, 1, 1); (0.11, 1, 1); (0.09, 1, 1) | (0.83, 1, 1); (0.86, 1, 1); (0.76, 1, 1)
10,240 | 1,024 | (0.24, 1, 1); (0.18, 1, 1); (0.23, 1, 1) | (0.92, 1, 1); (0.92, 1, 1); (0.95, 1, 1)
10,240 | 2,048 | (0.28, 1, 0.99); (0.23, 1, 0.99); (0.27, 1, 0.99) | (0.95, 1, 1); (0.95, 1, 1); (0.92, 1, 1)
Table 6. \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) Obtained for the 5D Case under the Uniform Design
Figures 4 and 5 summarize the magnitudes of \(H_{\max }\) (in a logarithmic scale) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\), employing LHD with a total budget \(B=2,560\) and 10,240. For simplicity, the discussion below focuses on the results obtained under LHD with the equal allocation scheme. Similar observations apply to the results obtained under LHD with the two unequal budget allocation schemes and under the uniform design with all three budget allocations, which are detailed in Figures 11 and 12 provided in Appendix D.2.2. First and foremost, the variation exhibited by the \(H_{\max }\) values of all bounds built with the two kernels demonstrates the notable difference between the two kernels in this 5D case as the number of design points k increases, especially when the given budget B is small. Specifically, the Gaussian kernel leads to increased instability in modeling as the number of design points k grows, contrasting with the Matérn kernel, which maintains consistent modeling stability across the range of k values considered; compare Figures 4(a) and (b). Furthermore, given a large budget B, the \(H_{\max }\)’s of all bounds built using the Gaussian kernel show a decreasing trend as the number of design points k increases. However, such a trend seems absent when the Matérn kernel is adopted. This observation explains the corresponding low \(\widehat{\mbox{SCP}}\)’s of \(\mbox{CI}_{{\mbox{b}}}\) observed in Tables 5 and 6. An explanation is that the predictive variance (hence the confidence bound width) associated with the Gaussian kernel diminishes faster compared with the Matérn kernel, as indicated by Corollaries 1 and 2 and the comments immediately following them.
Finally, we summarize key insights obtained from the numerical evaluations. When applying SK for metamodeling purposes, selecting a kernel suitable for modeling the function of interest is crucial. The Gaussian kernel performs well in queueing applications like the M/M/1 example considered here [53]. In contrast, Reference [23], from which the dimension-flexible example originated, considered only the Matérn kernel with smoothness parameter \(\nu = 5/2+d/2\), and that kernel also demonstrated its competence in our evaluations. Therefore, the choice of an appropriate kernel function is example-dependent and should take into account the features of the function of interest. The performance of Bonferroni correction-based simultaneous confidence intervals is sensitive to this choice, and they can be unreliable even when built by adopting an appropriate experimental design. In contrast, the uniform error bound is more robust, especially in high-dimensional problems. Second, fixed designs can outperform random designs in facilitating stability in SK model fitting and simultaneous coverage performance. Last, unequal allocation schemes can enhance the performance of confidence bounds, but their impact is limited.

7 Conclusion

In this article, we proposed a method for constructing a uniform error bound with a prescribed confidence level for the SK predictor. The investigation of the asymptotic properties of the proposed uniform error bound delves into analyzing the convergence rate of SK’s predictive variance under the supremum norm. The work sheds light on suitable design-point sampling schemes and budget allocation schemes for SK metamodeling to ensure desirable large-sample predictive performance. Furthermore, we conducted theoretical and numerical investigations into the impact of noise variance estimation on the empirical uniform error bound’s performance. Our comprehensive numerical evaluations corroborated the theoretical findings and demonstrated the competence of the uniform error bound in implementation under various experimental settings.
In analyzing the convergence rates of SK’s predictive variance under the supremum norm, we derived an upper bound for SK’s predictive variance, characterized this upper bound as the solution to a regularized kernel-based approximation (or equivalently, a kernel ridge regression (KRR)) problem with noiseless observations of kernel function values, and identified the maximum noise variance across design points as playing a crucial role in effectively serving as the regularization parameter in the KRR problem. The conditions on the regularization parameter and experimental designs to ensure that the solution to the KRR problem achieves the fastest convergence rate under the supremum norm then translate into conditions on budget allocation and design-point selection for SK’s predictive variance under the supremum norm and hence the proposed uniform error bound to achieve a desirable convergence rate. It is worth noting that, for uncertainty quantification of GPR modeling subject to the impact of homoscedastic noise, Wang (2021) examined the reliability of the pointwise GP confidence interval under model misspecification [44]. The study also recognized the role played by the ratio of the noise variance to the spatial variance as the regularization parameter in GPR, considering that its value can decrease with the increase in the number of design points. However, it is crucial to highlight that the value of this regularization parameter is externally imposed without a connection to budget allocation, and hence no such insights were derived. A caveat to note is that the results derived in this work rely on the assumption that the mean function of interest can be accurately modeled by a GP with the specified kernel function. Our numerical examples demonstrated that selecting an appropriate kernel function is example-dependent and should take into account the features of the function of interest.
There are several promising avenues for future research. First, despite its desirable simultaneous coverage guarantee and relative robustness to kernel choice and design-point sampling schemes, the uniform bound often exhibits wider interval widths compared to the Bonferroni bound, resulting in reduced sharpness in specific applications. One potential approach to enhancing the uniform bound’s performance is to reduce \(\beta (\tau)\) and \(\gamma _{k,B}(\tau)\) in Expression (7) by optimizing the choice of the grid constant \(\tau\) and employing suitable experimental designs. Second, investigating the reliability of the uniform error bound under model misspecification is a priority. Moreover, extensions to other cases, such as f being a sample of a non-zero mean GP, a nonstationary GP, or unsuitable to be modeled as a sample of a GP, are promising directions to explore. Third, studying the effects of GPR hyper-parameter estimation on the uniform error bound’s accuracy warrants a detailed examination. Moreover, given that substituting sample variances with smoothed variance estimates enhances SK model fitting and predictive performance [45, 46], using smoothed variance estimates is anticipated to improve the simultaneous coverage performance of the uniform bound. We intend to investigate this impact, building upon the discussion in Section 5.2. Lastly, developing a numerically stable and computationally efficient uniform error bound for large-scale GPR applications under heteroscedasticity presents an intriguing challenge.

Acknowledgments

Portions of this article were previously published in [52].

A Proofs in Section 5

A.1 Proof of Proposition 5

Proof.
We start by presenting some definitions and results essential for the subsequent development.
Definition 3 (Definition 2.7 on page 26 of Reference [42]).
A random variable X with mean \(\mu = {\rm E}[X]\) is sub-exponential if there are non-negative parameters (\(\nu , \alpha\)) such that
\begin{equation*} {\rm E}[e^{t (X-\mu)}] \le e^{\frac{\nu ^2 t^2}{2}}, \qquad \mbox{for all } |t|\lt \frac{1}{\alpha } . \end{equation*}
 □
Lemma 1 (Concentration inequality for chi-squared variables, page 29 of Wainwright [42]).
A chi-squared (\(\chi ^2\)) random variable with n degrees of freedom, denoted by \(Y \sim \chi ^2_n\), is sub-exponential with parameters \((\nu , \alpha) = (2\sqrt {n}, 4)\), and the two-sided tail bound follows as
\begin{equation*} {\rm P}\left(\left|\frac{1}{n} Y - 1\right| \ge t\right) \le 2 e^{-n t^2/8}, \qquad \mbox{for all } t \in (0,1) . \end{equation*}
We are now ready to prove Proposition 5. At any given design point \({\bf x}_i\) (\(i = 1,2,\ldots ,k\)), consider the random variable \((n_i-1) \widehat{{\sf V}}({\bf x}_i)/{\sf V}({\bf x}_i)\). Since the noise terms incurred at \({\bf x}_i\), \(\varepsilon _j({\bf x}_i)\)’s, are i.i.d. normal with mean zero and variance \({\sf V}({\bf x}_i)\), it follows that \((n_i-1) \widehat{{\sf V}}({\bf x}_i)/{\sf V}({\bf x}_i) \sim \chi ^2_{n_i-1}\), a \(\chi ^2\) random variable with \((n_i-1)\) degrees of freedom.
By Lemma 1, we have
\begin{equation} {\rm P}\left(\left|\frac{\widehat{{\sf V}}({\bf x}_i)}{{\sf V}({\bf x}_i)} - 1\right| \ge t\right) \le 2 e^{-(n_i-1) t^2/8}, \qquad \mbox{for all } t \in (0,1) . \end{equation}
(A.17)
Define events \(\mathcal {A}_i := \lbrace |\tfrac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}| \le t \cdot \tfrac{{\sf V}({\bf x}_i)}{n_i}\rbrace\) and \(\mathcal {B}_i := \lbrace |\tfrac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}| \le t \cdot {\sf V}_{k,\max }\rbrace\) for \(i= 1,2,\ldots ,k\). In light of Expression (A.17) and \({\sf V}_{k,\max } =\max _{1\le i\le k}\tfrac{{\sf V}({\bf x}_i)}{n_i}\), we have for each \({\bf x}_i\) given,
\begin{equation} {\rm P}\left(\mathcal {B}_i^c\right) \le {\rm P}\left(\mathcal {A}_i^c\right) \le 2 e^{-(n_i-1) t^2/8}, \qquad \mbox{for all } t \in (0,1) . \end{equation}
(A.18)
In particular, it follows from Expression (A.18) that, for all \(t \in (0,1)\),
\begin{align} {\rm P}\left(\max _{1 \le i\le k}\left|\frac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}\right| \ge t \cdot {\sf V}_{k,\max }\right) = {\rm P}\left(\cup _{i=1}^k\mathcal {B}_i^c\right) \le 2\sum _{i=1}^k e^{-(n_i-1) t^2/8} \le 2k e^{-(n_k^*-1) t^2/8} . \end{align}
(A.19)
Set \(e^{-\eta _{k,B}} = 2k e^{-(n^*_k-1) t^2/8}\) for any \(t\in (0,1)\). Correspondingly, a given \(\eta _{k,B}\gt 0\) corresponds to \(t = \sqrt {\tfrac{8(\eta _{k,B} + \log (2k))}{(n^*_k-1)}}\). Then, Expression (A.19) can be equivalently written as
\[\begin{eqnarray} {\rm P}\left(\max _{1 \le i\le k}\left|\frac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}\right| \ge \sqrt {\frac{8\left(\eta _{k,B} + \log (2k)\right)}{(n^*_k-1)}} \cdot {\sf V}_{k,\max }\right) \le e^{-\eta _{k,B}} . \end{eqnarray}\]
(A.20)
Now, if we set \(\eta _{k,B}=\log ((\pi _k \alpha)^{-1})\) with the \(\pi _k\)’s chosen such that \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/3\), then applying the union bound over all \(k\ge 1\) yields that
\[\begin{eqnarray*} \max _{1 \le i\le k}\left|\frac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}\right| \le \sqrt {\frac{8\left(\eta _{k,B} + \log (2k)\right)}{(n^*_k-1)}} \cdot {\sf V}_{k,\max }, \quad \forall k\ge 1 \end{eqnarray*}\]
holds true with probability at least \(1-\alpha /3\). □
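
As a sanity check (not part of the proof), the tail bound in Expression (A.20) can be examined by Monte Carlo. The following minimal Python sketch does so under i.i.d. normal noise; the design size \(k\), the replication counts \(n_i\), and the noise variances \({\sf V}({\bf x}_i)\) below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 20                                    # number of design points (illustrative)
n = np.full(k, 200)                       # replication counts n_i (illustrative)
V = rng.uniform(0.5, 2.0, size=k)         # true noise variances V(x_i) (illustrative)
V_k_max = np.max(V / n)                   # V_{k,max} = max_i V(x_i)/n_i

alpha = 0.05
eta = np.log(1.0 / alpha)                 # plays the role of eta_{k,B}
n_star = n.min()                          # n_k^* = min_i n_i
thresh = np.sqrt(8.0 * (eta + np.log(2.0 * k)) / (n_star - 1)) * V_k_max

n_macro, exceed = 2000, 0
for _ in range(n_macro):
    # sample variances hat V(x_i) computed from i.i.d. normal noise at each design point
    V_hat = np.array([np.var(rng.normal(0.0, np.sqrt(V[i]), n[i]), ddof=1) for i in range(k)])
    exceed += np.max(np.abs(V_hat - V) / n) >= thresh

# Expression (A.20) guarantees the exceedance probability is at most e^{-eta} = alpha
print(f"empirical exceedance frequency: {exceed / n_macro:.4f} (bound: {np.exp(-eta):.4f})")
```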

A.2 Proof of Theorem 3

Proof.
Define \({\mathbf {\Sigma }}:= \Sigma _{\sf M}+ \Sigma _{\varepsilon }\), \(\widehat{{\mathbf {\Sigma }}} := \Sigma _{\sf M}+ \widehat{\Sigma }_\varepsilon\), and \(\Delta \Sigma _{\varepsilon }: = \widehat{\Sigma }_\varepsilon - \Sigma _{\varepsilon }\). Recall the definitions of \(\widehat{LB}({\bf x})\) and \(\widehat{UB}({\bf x})\) of the empirical uniform bound. We have
\begin{align} \left| \widehat{LB}({\bf x}) - LB({\bf x})\right| = \left| \widehat{UB}({\bf x}) - UB({\bf x})\right| \le \Delta \mu _{k,B}({\bf x}) + \sqrt {\beta (\tau)} \Delta \sigma _{k,B}({\bf x}) + \Delta \gamma _{k,B}(\tau), \end{align}
(A.21)
where for all \({\bf x}\in \mathcal {X}\),
\begin{align} \Delta \mu _{k,B}({\bf x}) & := \left|\widehat{\mu }_{k,B}({\bf x}) - \mu _{k,B}({\bf x})\right| = \left|\Sigma _{\sf M}({\bf x},{\bf X})^\top \left(\widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right)\bar{\mathcal {Y}} \right| , \nonumber \\ \Delta \sigma _{k,B}({\bf x}) &:= \sqrt { \Delta \sigma _{k,B}^2({\bf x}) }, \nonumber \\ \Delta \sigma _{k,B}^2({\bf x}) & := \left| \widehat{\sigma }_{k,B}^2({\bf x}) - \sigma _{k,B}^2({\bf x}) \right| = \left|\Sigma _{\sf M}({\bf x},{\bf X})^\top \left(\widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right)\Sigma _{\sf M}({\bf x},{\bf X}) \right|, \nonumber \\ \Delta \gamma _{k,B}(\tau) & := \Delta L_{\mu _{k,B}} \tau + \sqrt {\beta (\tau)} \Delta \omega _{\sigma _{k,B}}(\tau), \end{align}
(A.22)
with \(\Delta L_{\mu _{k,B}}:= |\widehat{L}_{\mu _{k,B}} - L_{\mu _{k,B}}|\) and \(\Delta \omega _{\sigma _{k,B}}(\tau) := (2\tau L_\Sigma k\left\Vert \widehat{{\mathbf {\Sigma }}}^{-1} - {\mathbf {\Sigma }}^{-1}\right\Vert \max \limits _{{\bf x},{\bf x}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}({\bf x}, {\bf x}^{\prime }))^{\tfrac{1}{2}}\).
Recall from Equation (4) that
\begin{align*} \omega _{\sigma _{k,B}}(\tau) = \sqrt {2\tau L_\Sigma \left(1+k\left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \max \limits _{{\bf x},{\bf x}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}({\bf x}, {\bf x}^{\prime })\right)}, \end{align*}
and \(\widehat{\omega }_{\sigma _{k,B}}(\tau)\) is obtained by replacing \({\mathbf {\Sigma }}\) with \(\widehat{{\mathbf {\Sigma }}}\) in the equation above. It follows from \(\sqrt {a+b}\le \sqrt {a} + \sqrt {b}\) for any \(a,b\ge 0\) that \(|\widehat{\omega }_{\sigma _{k,B}}(\tau)-\omega _{\sigma _{k,B}}(\tau)| \le \Delta \omega _{\sigma _{k,B}}(\tau)\).
The first part of the proof is identical to that of Theorem 2 but by replacing \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/2\) with \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/3\) in Expressions (8) and (10). The second part of the proof focuses on showing that \(| \widehat{LB}({\bf x}) - LB({\bf x})| = | \widehat{UB}({\bf x}) - UB({\bf x})| = o(\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}) + \gamma _{k,B}(\tau))\) for all \({\bf x}\in \mathcal {X}\) despite the impact of noise variance estimation.
From Proposition 5, we have with probability at least \(1-\alpha /3\),
\begin{equation} \max _{1 \le i\le k}\left|\frac{\widehat{{\sf V}}({\bf x}_i)-{\sf V}({\bf x}_i)}{n_i}\right| \lesssim \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} \cdot {\sf V}_{k,\max } . \end{equation}
(A.23)
By the Woodbury matrix identity, we have
\begin{equation} \widehat{{\mathbf {\Sigma }}}^{-1} - {\mathbf {\Sigma }}^{-1} = {\mathbf {\Sigma }}^{-1}\left({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right)^{-1} \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}. \end{equation}
(A.24)
It follows that
\begin{align} \left\Vert \widehat{{\mathbf {\Sigma }}}^{-1} - {\mathbf {\Sigma }}^{-1}\right\Vert \le \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \left({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right)^{-1} \right\Vert \left\Vert \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right\Vert \le \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right\Vert , \end{align}
(A.25)
where, thanks to Weyl’s Theorem, we have used
\begin{equation} \left\Vert \left({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right)^{-1} \right\Vert = \lambda _{\max }\left(\left({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right)^{-1}\right) = \frac{1}{ \lambda _{\min }\left({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right)} \le 1, \end{equation}
(A.26)
and \(\lambda _{\min }({\bf I}+ \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}) \ge 1\), since \(\Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\) is positive definite.
First, we analyze the terms in \(\Delta \gamma _{k,B}(\tau)\) defined in Equation (A.22). It follows that, with probability of at least \(1-\alpha /3\),
\[\begin{eqnarray*} \Delta L_{\mu _{k,B}} \tau & = L_\Sigma \sqrt {k} \left\Vert \left(\widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right)\bar{\mathcal {Y}}\right\Vert \tau \nonumber \nonumber\\ & \stackrel{(i)}{\le } L_\Sigma \sqrt {k} \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1} \bar{\mathcal {Y}}\right\Vert \tau \nonumber \nonumber\\ & \le L_\Sigma \sqrt {k} \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1} \bar{\mathcal {Y}}\right\Vert \tau \nonumber \nonumber\\ & = L_\Sigma \sqrt {k} \left\Vert {\mathbf {\Sigma }}^{-1} \bar{\mathcal {Y}}\right\Vert \tau \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }\right\Vert \nonumber \nonumber \end{eqnarray*}\]
\begin{align} & \stackrel{(ii)}{\lesssim } L_{\mu _{k,B}} \tau \cdot {\sf V}_{k,\min } ^{-1} \cdot \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} \cdot {\sf V}_{k,\max } \nonumber \nonumber\\ & \stackrel{(iii)}{\asymp } L_{\mu _{k,B}} \tau \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} = o(L_{\mu _{k,B}} \tau) \end{align}
(A.27)
if \(\log k = o(n_k^*)\) as \(k, B\rightarrow \infty\). In Expression (A.27) step \((i)\) on the right-hand side follows from Expressions (A.24) and (A.26), step \((ii)\) follows from Expressions (4), (A.23), and that \(\Vert {\mathbf {\Sigma }}^{-1}\Vert \le 1/\lambda _{\min }(\Sigma _{\varepsilon }) = {\sf V}^{-1}_{k,\min }\), and step \((iii)\) follows, since \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\).
Next, we have that with probability of at least \(1-\alpha /3\)
\begin{align} \sqrt {\beta (\tau)} \Delta \omega _{\sigma _{k,B}}(\tau) & = \sqrt {\beta (\tau)} \left(2\tau L_\Sigma k\left\Vert \widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right\Vert \max \limits _{{\bf x},{\bf x}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}({\bf x}, {\bf x}^{\prime })\right)^{\frac{1}{2}} \nonumber \nonumber\\ & \stackrel{(i)}{\le } \sqrt {\beta (\tau)} \left(2\tau L_\Sigma k \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \max \limits _{{\bf x},{\bf x}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}({\bf x}, {\bf x}^{\prime })\right)^{\frac{1}{2}} \nonumber \nonumber\\ & \stackrel{(ii)}{\le } \left(\beta (\tau) \cdot 2\tau \cdot L_\Sigma k \cdot \max \limits _{{\bf x},{\bf x}^{\prime }\in \mathcal {X}}\Sigma _{\sf M}({\bf x}, {\bf x}^{\prime }) {\sf V}_{k,\min }^{-1} \right)^{\frac{1}{2}} {\sf V}_{k,\min }^{-\frac{1}{2}} \left(\left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} \cdot {\sf V}_{k,\max }\right)^{\frac{1}{2}} \nonumber \nonumber\\ & \stackrel{(iii)}{\asymp } \sqrt {\beta (\tau)} \omega _{\sigma _{k,B}}(\tau) \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{4}} = o\left(\sqrt {\beta (\tau)} \omega _{\sigma _{k,B}}(\tau) \right) \end{align}
(A.28)
if \(\log k = o(n_k^*)\) as \(k, B\rightarrow \infty\). In Expression (A.28), step \((i)\) on the right-hand side follows from Expression (A.25), step \((ii)\) follows Expression (A.23) and that \(\Vert {\mathbf {\Sigma }}^{-1}\Vert \le {\sf V}^{-1}_{k,\min }\), and step \((iii)\) follows from Expression (11) and that \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\).
Moreover, due to the union bound, we have with probability of at least \(1-2\alpha /3\),
\begin{align} \Delta \mu _{k,B}({\bf x}) & = \left|\Sigma _{\sf M}({\bf x},{\bf X})^\top \left(\widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right)\bar{\mathcal {Y}}\right| \nonumber \nonumber\\ & \stackrel{(i)}{\le } \left\Vert \Sigma _{\sf M}({\bf x},{\bf X})\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1} \bar{\mathcal {Y}}\right\Vert \nonumber \nonumber\\ & \stackrel{(ii)}{\lesssim } \sqrt {k} {\sf V}_{k,\min }^{-1} \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} \cdot {\sf V}_{k,\max } \sqrt {k} {\sf V}_{k,\min }^{-\frac{1}{2}} \nonumber \nonumber\\ & \stackrel{(iii)}{\asymp } k \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} {\sf V}_{k,\min }^{-\frac{1}{2}} = o \left(\sqrt {\beta (\tau)}\sigma _{k,B}({\bf x})\right) \quad \forall {\bf x}\in \mathcal {X}, \end{align}
(A.29)
if (1) in the finite smoothness case, \(k^{\tfrac{6\nu +d}{2d}} = o(n_k^*)\), thanks to Proposition 2; and (2) in the infinite smoothness case, \(k^{\tfrac{2d-1}{d}} (\log k) \exp ((C+\bar{C}) k^{\tfrac{1}{d}})= o(n_k^*)\), thanks to Proposition 4, where \(C, \bar{C}\gt 0\) are as specified in the proof of Proposition 4. In Expression (A.29), step \((i)\) on the right-hand side follows from Expressions (A.24) and (A.26), step \((ii)\) follows from Equation (A.23), Expression (10) with \(\eta _{k,B}=\log (1/(\pi _k \alpha))\) and \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/2\) being replaced by \(\sum \nolimits _{k=1}^{\infty }\pi _k=1/3\), and that \(\Vert {\mathbf {\Sigma }}^{-1}\Vert \le {\sf V}^{-1}_{k,\min }\), and step \((iii)\) follows, since \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\).
Finally, it holds with probability at least \(1-\alpha /3\) that
\[\begin{eqnarray*} \Delta \sigma _{k,B}^2({\bf x}) & = \left|\Sigma _{\sf M}({\bf x},{\bf X})^\top \left(\widehat{{\mathbf {\Sigma }}}^{-1}-{\mathbf {\Sigma }}^{-1}\right)\Sigma _{\sf M}({\bf x},{\bf X}) \right| \nonumber \nonumber\\ & \stackrel{(i)}{\le } \left\Vert \Sigma _{\sf M}({\bf x},{\bf X})\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }{\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Sigma _{\sf M}({\bf x},{\bf X})\right\Vert \nonumber \nonumber\\ & \le \left\Vert \Sigma _{\sf M}({\bf x},{\bf X})\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Delta \Sigma _{\varepsilon }\right\Vert \left\Vert {\mathbf {\Sigma }}^{-1}\right\Vert \left\Vert \Sigma _{\sf M}({\bf x},{\bf X})\right\Vert \nonumber \nonumber \end{eqnarray*}\]
\begin{align} & \stackrel{(ii)}{\lesssim } \sqrt {k} {\sf V}_{k,\min }^{-1} \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} \cdot {\sf V}_{k,\max } \cdot {\sf V}_{k,\min }^{-1} \sqrt {k} \nonumber \nonumber\\ & \stackrel{(iii)}{\asymp } k {\sf V}_{k,\min }^{-1} \left(\frac{\log k}{n_k^*}\right)^{\frac{1}{2}} = o (\sigma _{k,B}^2({\bf x})), \quad \forall {\bf x}\in \mathcal {X}, \end{align}
(A.30)
if (1) in the finite smoothness case, \(k^{\tfrac{6\nu -d}{d}} (\log k) = o(n_k^*)\), thanks to Proposition 2; and (2) in the infinite smoothness case, \(k^2 (\log k) \exp (2(\bar{C}+C) k^{\tfrac{1}{d}})= o(n_k^*)\), thanks to Proposition 4, where \(C, \bar{C}\gt 0\) are as specified in the proof of Proposition 4. In both finite and infinite smoothness cases, it follows from Expression (A.30) that \(\sqrt {\beta (\tau)} \Delta \sigma _{k,B}({\bf x}) = o(\sqrt {\beta (\tau)} \sigma _{k,B}({\bf x}))\) for all \({\bf x}\in \mathcal {X}\). In Expression (A.30), step \((i)\) on the right-hand side follows from Expression (A.25), step \((ii)\) follows from Expression (A.23) and that \(\Vert {\mathbf {\Sigma }}^{-1}\Vert \le {\sf V}^{-1}_{k,\min }\), and step \((iii)\) follows, since \({\sf V}_{k,\max }/{\sf V}_{k,\min }\lt \infty\). Due to the union bound, combining Expressions (A.21), (A.27), (A.28), (A.29), and (A.30) completes the proof. □

B Reproducing Kernel Hilbert Space

We provide some necessary notation and preliminaries on reproducing kernel Hilbert spaces (RKHSs) in this section. Denote the marginal distribution of design points on \(\mathcal {X}\) by \(\mathbb {P}\) and let \(L_2(\mathcal {X})\) be the \(L_2\) space with respect to \(\mathbb {P}\). For \(f,g: \mathcal {X}\rightarrow \mathbb {R}\), let \(\langle f,g \rangle _2 = \int _{\mathcal {X}} f({\bf x})g({\bf x}) \ d \mathbb {P}({\bf x})\) denote the inner product and \(\Vert f\Vert _2 = \langle f,f \rangle _2^{1/2}\) the induced norm.
Consider a continuous and positive definite kernel function K that satisfies the Hilbert-Schmidt condition:
\begin{equation*} \int _{\mathcal {X}\times \mathcal {X}} K^2({\bf x},{\bf x}^\prime) d\mathbb {P}({\bf x}) d\mathbb {P}({\bf x}^\prime) \lt \infty . \end{equation*}
For any \(f\in L_2(\mathcal {X})\), we define an integral operator \(T_K: L_2(\mathcal {X})\rightarrow \mathcal {H}\) via
\begin{equation*} T_K f({\bf x}) := \int _{\mathcal {X}} K({\bf x},{\bf x}^\prime)f({\bf x}^\prime) d\mathbb {P}({\bf x}^\prime),\quad {\bf x}\in \mathcal {X}. \end{equation*}
Mercer’s Theorem (Section 4.3 of Reference [35]) states that there exist countable pairs of eigenvalues and eigenfunctions \((\mu _i,\psi _i)_{i\in \mathbb {N}}\subset (0,\infty)\times L_2(\mathcal {X})\) of \(T_K\) such that
\begin{equation*} T_K\psi _i = \mu _i\psi _i, \quad \forall i\in \mathbb {N}, \end{equation*}
where \(\lbrace \psi _i\rbrace _{i=1}^\infty\) form an orthonormal basis of \(L_2(\mathcal {X})\) and \(\mu _1\ge \mu _2\ge \cdots\) are non-negative with \(\lim _{i\rightarrow \infty } \mu _i=0\). Furthermore, the kernel function K has the following expansion:
\begin{equation*} K({\bf x},{\bf x}^\prime)=\sum _{i=1}^\infty \mu _i\psi _i({\bf x})\psi _i({\bf x}^\prime), \quad \forall {\bf x},{\bf x}^\prime \in \mathcal {X}, \end{equation*}
where the convergence of the series on the right-hand side is absolute and uniform over any \({\bf x}, {\bf x}^\prime \in \mathcal {X}\). The RKHS \(\mathcal {H}\) associated with kernel K can be defined as
\begin{equation*} \mathcal {H}:= \left\lbrace f\in L_2(\mathcal {X}): \Vert f\Vert _\mathcal {H}^2=\sum _{i=1}^\infty \frac{f_i^2}{\mu _i}\lt \infty , f_i=\langle f,\psi _i \rangle _2\right\rbrace , \end{equation*}
which is equipped with the inner product \(\langle f,g\rangle _{\mathcal {H}}=\sum _{i=1}^{\infty } f_i g_i/\mu _i\) for any \(f=\sum _{i=1}^\infty f_i\psi _i\) and \(g=\sum _{i=1}^\infty g_i\psi _i\) in \(\mathcal {H}\).
Now, consider an estimator of a function \(f_0 \in \mathcal {H}\) given by
\begin{equation*} f_\lambda = \left(T_K+\lambda \mbox{id} \right)^{-1} T_K f_0, \end{equation*}
where \(\mbox{id}\) is the identity operator. It can be shown that \(f_\lambda\) is the solution to the following optimization problem [54]:
\begin{equation} f_\lambda := \mathop {\rm argmin}_{f\in \mathcal {H}} \left\lbrace \Vert f-f_0\Vert _2^2+\lambda \Vert f\Vert ^2_\mathcal {H}\right\rbrace , \end{equation}
(A.31)
which is the population counterpart of the following KRR problem:
\begin{equation*} \widehat{f}_{ \lambda }:= \mathop {\rm argmin}_{f\in \mathcal {H}} \left\lbrace \frac{1}{k} \sum _{i=1}^k\left(f({\bf x}_i)-f_0({\bf x}_i)\right)^2+\lambda \Vert f\Vert ^2_\mathcal {H}\right\rbrace . \end{equation*}
The estimator \(f_\lambda\) in Equation (A.31) can be obtained through the integral operator \(T_{\tilde{K}}\) as \(f_\lambda = T_{\tilde{K}} f_0\), where \(\tilde{K}\) is the equivalent kernel for K: \(\tilde{K}\) has the same eigenfunctions as \(K,\) but its eigenvalues are changed to \(\nu _i = \mu _i/(\mu _i+\lambda)\) for \(i\in \mathbb {N}\), i.e., \(\tilde{K}({\bf x},{\bf x}^\prime) = \sum _{i=1}^\infty \nu _i \psi _i({\bf x}) \psi _i({\bf x}^\prime)\). Let \(\widetilde{\mathcal {H}}\) be the RKHS associated with kernel \(\tilde{K}\). We have \(\widetilde{\mathcal {H}}\) equivalent to \(\mathcal {H}\) as a functional space, but with a different inner product:
\begin{equation*} \langle f,g\rangle _{\widetilde{\mathcal {H}}} = \langle f,g\rangle _{2} + \lambda \langle f,g\rangle _{\mathcal {H}} , \end{equation*}
and the corresponding RKHS norm is \(\Vert \cdot \Vert _{\widetilde{\mathcal {H}}}\). For more details on equivalent kernels, we refer the interested reader to Reference [54] and Chapter 7 of Reference [35].
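
To make the equivalent-kernel construction above concrete, the following minimal Python sketch builds a truncated Mercer-type kernel on \([0,1]\) and applies the coordinate-wise shrinkage \(\nu _i = \mu _i/(\mu _i+\lambda)\) that maps \(f_0\) to \(f_\lambda = T_{\tilde{K}} f_0\); the eigen-system, the eigenvalue decay, and the coefficients of \(f_0\) are illustrative assumptions rather than quantities taken from the article.

```python
import numpy as np

m = 50                                        # truncation level of the Mercer expansion (assumption)
idx = np.arange(1, m + 1)
mu = idx ** (-2.0)                            # eigenvalues mu_i (illustrative polynomial decay)

def psi(i, x):
    """Orthonormal cosine basis of L2[0, 1] with respect to the uniform distribution."""
    return np.ones_like(x) if i == 1 else np.sqrt(2.0) * np.cos((i - 1) * np.pi * x)

lam = 1e-2                                    # regularization level lambda (illustrative)
f0_coef = 1.0 / idx ** 2                      # coefficients f_i of f_0 in the psi-basis (illustrative)
shrink = mu / (mu + lam)                      # equivalent-kernel eigenvalues nu_i = mu_i / (mu_i + lambda)
flam_coef = shrink * f0_coef                  # coefficients of f_lambda = T_{K tilde} f_0

x = np.linspace(0.0, 1.0, 201)
f0 = sum(c * psi(i, x) for i, c in zip(idx, f0_coef))
flam = sum(c * psi(i, x) for i, c in zip(idx, flam_coef))
print("first three equivalent-kernel eigenvalues nu_i:", shrink[:3])
print("sup-norm gap between f_lambda and f_0 on the grid:", np.max(np.abs(flam - f0)))
```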

C Random Design Setting

This subsection focuses on the analysis of predictive variance of SK in the random design setting, i.e., the design points are randomly drawn from the input space \(\mathcal {X}\) according to some distribution \(\mathbb {P}\). Our investigation builds on the literature on the non-asymptotic error analysis for kernel ridge regression estimators [30, 54].
Consider the predictive variance \(\sigma ^2_{k,B}({\bf x}_0)\) given in Equation (3). We have
\begin{align} \sigma ^2_{k,B}({\bf x}_0) \le \Sigma _{\sf M}({\bf x}_0, {\bf x}_0) - \Sigma _{\sf M}({\bf x}_0,{\bf X})^\top \left(\Sigma _{\sf M}({\bf X},{\bf X}) + {\sf V}_{k,\max }{\bf I}_k\right)^{-1}\Sigma _{\sf M}({\bf x}_0,{\bf X}), \end{align}
(A.32)
where recall that \({\sf V}_{k,\max } = \max \limits _{1\le i \le k}{\sf V}({\bf x}_i)/n_i\). Define \(\lambda := {\sf V}_{k,\max }/(\xi ^2 k)\) in this subsection. We can write the upper bound for \(\sigma ^2_{k,B}({\bf x}_0)\) given by the right-hand side of Expression (A.32) as \(\xi ^2 (K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0)),\) where \(K_{{\bf x}_0}({\bf x}^\prime) := K({\bf x}_0, {\bf x}^\prime)\) for any \({\bf x}^\prime \in \mathcal {X}\), and recall from Section 2.1 that \(K({\bf x}, {\bf x}^\prime) = \Sigma _{\sf M}({\bf x}, {\bf x}^\prime)/\xi ^2\) for any \({\bf x}, {\bf x}^\prime \in \mathcal {X}\) is the kernel function; in addition,
\begin{equation} \widehat{K}_{{\bf x}_0}({\bf x}^\prime) = K({\bf x}^\prime ,{\bf X})^\top \left(K({\bf X},{\bf X}) + k\lambda {\bf I}_k\right)^{-1}K({\bf X}, {\bf x}_0). \end{equation}
(A.33)
Inspired by References [30] and [54], we realize that, for any given prediction point \({\bf x}_0 \in \mathcal {X}\), \(\widehat{K}_{{\bf x}_0}(\cdot)\) is the solution to the following kernel ridge regression (KRR) problem with noiseless observations of \(K_{{\bf x}_0}\) at the design points:
\begin{equation} \widehat{K}_{{\bf x}_0} := \mathop {\rm argmin}_{g\in \mathcal {H}} \left[ \frac{1}{k} \sum _{i=1}^k \left(K_{{\bf x}_0}({\bf x}_i)-g({\bf x}_i)\right)^2 + \lambda \Vert g\Vert _{\mathcal {H}}^2 \right] , \end{equation}
(A.34)
where \(\mathcal {H}\) denotes the RKHS associated with kernel K, \(\Vert \cdot \Vert _{\mathcal {H}}\) denotes the norm equipped with \(\mathcal {H}\), and \(\lambda\) represents the regularization parameter. Hence, we can investigate the order of the upper bound \(\xi ^2 (K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0))\) on \(\sigma ^2_{k,B}({\bf x}_0)\) through studying the bias of the noiseless KRR estimator \(\widehat{K}_{{\bf x}_0}\).
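
For concreteness, the following minimal Python sketch evaluates both sides of Expression (A.32) on a one-dimensional example: the SK predictive variance (computed here without a trend term, an assumption) and the upper bound \(\xi ^2 (K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0))\) with \(\widehat{K}_{{\bf x}_0}\) given by Equation (A.33). The kernel, \(\xi ^2\), the noise variances, and the replication counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel(a, b, theta=0.2):
    """Squared exponential kernel K(x, x') = exp(-(x - x')^2 / theta) on scalar inputs."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / theta)

k, xi2 = 30, 2.0                              # number of design points and process variance xi^2 (assumptions)
X = np.sort(rng.uniform(0.0, 1.0, k))         # design points on [0, 1]
n = rng.integers(20, 100, size=k)             # replication counts n_i (illustrative)
V = rng.uniform(0.5, 2.0, size=k)             # noise variances V(x_i) (illustrative)
Sigma_eps = np.diag(V / n)                    # Sigma_eps = diag(V(x_i)/n_i)
V_k_max = np.max(V / n)
lam = V_k_max / (xi2 * k)                     # the regularization level lambda of Appendix C

x0 = np.linspace(0.0, 1.0, 101)               # prediction points
SM_0X = xi2 * kernel(x0, X)
SM_XX = xi2 * kernel(X, X)

# SK predictive variance (no trend term) and the upper bound in (A.32)/(A.33); note K(x, x) = 1
var_sk = xi2 - np.einsum("ij,ji->i", SM_0X, np.linalg.solve(SM_XX + Sigma_eps, SM_0X.T))
K_hat = np.einsum("ij,ji->i", kernel(x0, X),
                  np.linalg.solve(kernel(X, X) + k * lam * np.eye(k), kernel(X, x0)))
bound = xi2 * (1.0 - K_hat)

print("sup_x sigma^2_{k,B}(x) =", var_sk.max(), "  sup_x upper bound =", bound.max())
print("bound holds at every prediction point:", bool(np.all(var_sk <= bound + 1e-10)))
```

In this example, the gap between the two quantities reflects how conservative the substitution \(\Sigma _{\varepsilon } \preceq {\sf V}_{k,\max }{\bf I}_k\) is; the bound is tight when the quantities \({\sf V}({\bf x}_i)/n_i\) are nearly equal across design points.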

C.1 Main Results

We are now in a position to state the assumption on the eigenfunctions of the kernel function K and the main result that will be useful for establishing the convergence rate of the predictive variance under the supremum norm in the random design setting.
Assumption 1.
There exists a constant \(C_\psi \gt 0\) such that \(\Vert \psi _i\Vert _\infty \le C_\psi\) for all \(i \in \mathbb {N}\).
Define \(\tilde{\kappa }^2 := \sup _{{\bf x}\in \mathcal {X}}\tilde{K}({\bf x},{\bf x})\) and
\begin{equation} \gamma (\lambda):=\sum _{i=1}^\infty \frac{\mu _i}{\lambda +\mu _i}, \end{equation}
(A.35)
which is often referred to as the effective dimensionality of kernel K with respect to \(L_2(\mathcal {X})\) [56]. It follows directly from Assumption 1 that \(\tilde{\kappa }^2 \le C_\psi ^2 \sum _{i=1}^\infty \nu _i \lesssim \gamma (\lambda)\).
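The effective dimensionality in Equation (A.35) is straightforward to tabulate numerically. The following minimal Python sketch does so for the two eigenvalue-decay regimes studied below, alongside the orders \(\lambda ^{-d/(2\nu)}\) and \((\log \lambda ^{-1})^d\) derived in the proofs of Corollaries 3 and 4; the truncation level, constants, and decay profiles are illustrative assumptions.

```python
import numpy as np

def gamma(mu, lam):
    """Effective dimensionality gamma(lambda) = sum_i mu_i / (lambda + mu_i) of Equation (A.35)."""
    return np.sum(mu / (lam + mu))

j = np.arange(1, 200001, dtype=float)
d, nu, b = 1, 2.5, 0.5                        # illustrative dimension, smoothness, and decay base
mu_poly = j ** (-2.0 * nu / d)                # Assumption 2: polynomially decaying eigenvalues
mu_exp = b ** (j ** (1.0 / d))                # Assumption 3: exponentially decaying eigenvalues

for lam in [1e-2, 1e-3, 1e-4]:
    print(f"lambda = {lam:.0e}: "
          f"gamma_poly = {gamma(mu_poly, lam):7.2f}  vs  lam^(-d/(2 nu)) = {lam ** (-d / (2 * nu)):7.2f};  "
          f"gamma_exp = {gamma(mu_exp, lam):6.2f}  vs  (log 1/lam)^d = {np.log(1.0 / lam) ** d:6.2f}")
```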
Theorem 4.
Under Assumption 1, it holds with probability at least \(1-\delta\) for any \(\delta \in (0,1/3)\) that
\begin{equation} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \le \frac{C(k,\tilde{\kappa })}{1-C(k,\tilde{\kappa })}\lambda C^2_{\psi }\gamma (\lambda), \end{equation}
(A.36)
where
\begin{equation*} C(k,\tilde{\kappa }) = \tilde{\kappa }^2\frac{\sqrt {2\log (\delta ^{-1})}}{\sqrt {k}} \left(4+ \frac{4\tilde{\kappa }\sqrt {2 \log (\delta ^{-1})}}{3 \sqrt {k}}\right) . \end{equation*}
In particular, taking \(\delta = k^{-10}\) yields
\begin{equation*} C(k,\tilde{\kappa }) = \tilde{\kappa }^2\sqrt {20}\sqrt {\frac{\log k}{k}} \left(4+ \frac{4\tilde{\kappa }\sqrt {20}}{3}\sqrt {\frac{\log k}{k}}\right); \end{equation*}
and choosing \(\lambda\) such that \(\tilde{\kappa }^2 = o(\sqrt {k/\log k})\) simplifies Expression (A.36) to
\begin{equation*} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \le 2 \lambda C^2_{\psi }\gamma (\lambda) \lesssim \lambda \gamma (\lambda). \end{equation*}
Theorem 4 follows from applying Lemma 4 and Theorem 1 in Reference [30] to the noiseless KRR estimator \(\widehat{K}_{{\bf x}_0}\) given in Equation (A.33). For the sake of brevity, we omit the proof here. Notice that setting \(\delta = k^{-10}\) simplifies the bound in Expression (A.36) and facilitates further analysis. Such a choice is quite common in the nonparametric regression literature; see, e.g., References [31, 33, 54]. Theorem 4 enables us to further investigate the convergence rate of \(\Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty }\) for commonly used kernel functions.
Polynomially decaying kernels. Kernels with polynomially decaying eigenvalues include those that underlie the Sobolev spaces with different orders of smoothness. One popular kernel function belonging to this category is the Matérn kernel when \(\mathbb {P}\) is the uniform distribution on \(\mathcal {X}\) [40].
Assumption 2.
The eigenvalues of K, \(\lbrace \mu _j\rbrace _{j=1}^\infty\), satisfy \(\mu _j \asymp j^{-2\nu /d}\) for some \(\nu \gt d/2\).
Corollary 3.
Under Assumptions 1 and 2, it holds with probability at least \(1-k^{-10}\) that
\begin{equation} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim \lambda ^{\frac{2\nu -d}{2\nu }}, \end{equation}
(A.37)
where \(\lambda\) is chosen such that \(\tilde{\kappa }^2 = o(\sqrt {k/\log k})\) holds. In particular, for any \(0\lt \lambda ^\prime \le \lambda\), it holds with probability at least \(1-k^{-10}\) for all \({\bf x}_0\in \mathcal {X}\) that
\begin{equation*} K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0; \lambda ^\prime)\le K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0; \lambda) \lesssim \lambda ^{\frac{2\nu -d}{2\nu }}. \end{equation*}
Proof.
Under Assumption 2, by selecting \(j^\prime =\lfloor \lambda ^{-d/(2\nu)}\rfloor\) where \(\lfloor x\rfloor\) denotes the greatest integer less than or equal to x, we have from Equation (A.35) that
\begin{align} \gamma (\lambda) = \sum _{j=1}^\infty \frac{1}{1+j^{2\nu /d}\lambda } = \sum _{j=1}^{j^\prime } \frac{1}{1+j^{2\nu /d}\lambda } + \sum _{j=j^\prime +1}^{\infty } \frac{1}{1+j^{2\nu /d}\lambda } \lesssim \lambda ^{-\frac{d}{2\nu }}+\sum _{j=j^\prime +1}^{\infty } \frac{1}{1+j^{2\nu /d}\lambda } . \end{align}
(A.38)
The last term in Expression (A.38) can be further upper bounded as follows:
\begin{align} \sum _{j=j^\prime +1}^{\infty } \frac{1}{1+j^{2\nu /d}\lambda } &\le \frac{1}{\lambda }\int _{\lambda ^{-d/(2\nu)}}^{\infty } \frac{1}{u^{2\nu /d}} du = \lambda ^{-1}\lambda ^{\frac{2\nu -d}{2\nu }}\left(\frac{d}{2\nu -d}\right) \asymp \lambda ^{-\frac{d}{2\nu }}. \end{align}
(A.39)
It follows from Expressions (A.38) and (A.39) that \(\gamma (\lambda) \lesssim \lambda ^{-d/(2\nu)}\). By choosing \(\lambda\) such that \(\tilde{\kappa }^2 \lesssim \gamma (\lambda) = o(\sqrt {k/\log k})\), the upper bound in Expression (A.37) follows from Theorem 4 immediately. The last statement follows, since \(K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0;\lambda)\) is a nonincreasing function of \(\lambda\). □
With respect to Corollary 3, one viable choice of \(\lambda\) is \(\lambda \asymp k^{-p}{(\log k)}^{b}\), where \(p\gt 0\) and \(b\in \mathbb {R}\). We can show that, if \(p=\nu /d\) and \(b\gt \nu /d\), then \(\tilde{\kappa }^2 \lesssim \gamma (\lambda) \lesssim {(k^{-p} {(\log k)}^{b})}^{-\tfrac{d}{2\nu }} \lesssim k^{\tfrac{1}{2}}{(\log k)}^{-\tfrac{d}{2\nu }b} = o(\sqrt {k/\log k})\). In this case, it follows that \(\Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim k^{-\tfrac{2\nu -d}{2d}}{(\log k)}^{\tfrac{2\nu -d}{2\nu }b}\). Hence, with a high probability, for any \({\bf x}_0 \in \mathcal {X}\), \(\sigma _{k,B}({\bf x}_0) \lesssim k^{-\tfrac{2\nu -d}{4d}}{(\log k)}^{\tfrac{2\nu -d}{4\nu }b}\). Recall the definition \(\lambda = {\sf V}_{k,\max }/(\xi ^2 k)\); it is not difficult to allocate the simulation budget to fulfill this choice of \(\lambda\). In case the budget allocation adopted makes \(\lambda\) decay to zero at a faster rate than the aforementioned case, Corollary 3 makes it clear that the corresponding upper bound on \(\Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty }\) and hence the upper bound on the order of \(\sigma _{k,B}(\cdot)\) still hold; however, it can be loose and does not imply the optimal convergence rate of \(\sigma _{k,B}(\cdot)\).
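
As a concrete illustration of such a budget allocation (a minimal sketch under illustrative values of \(\xi ^2\), \(\nu\), \(d\), \(b\), and the noise variances, not a recommended allocation rule), one may take the smallest \(n_i\) for which \({\sf V}_{k,\max }\) matches the target \(\lambda \asymp k^{-\nu /d}{(\log k)}^{b}\):

```python
import numpy as np

def allocate(V, k, xi2, nu, d, b):
    """Smallest n_i ensuring V(x_i)/n_i <= xi^2 * k * lambda_target, so that
    lambda = V_{k,max}/(xi^2 k) does not exceed the target k^(-nu/d) (log k)^b."""
    lam_target = k ** (-nu / d) * np.log(k) ** b
    cap = xi2 * k * lam_target                 # implied cap on V(x_i)/n_i
    return np.ceil(V / cap).astype(int), lam_target

rng = np.random.default_rng(2)
k, xi2, nu, d, b = 100, 2.0, 2.5, 1, 3.0       # b = 3 > nu/d = 2.5 (assumption)
V = rng.uniform(0.5, 2.0, size=k)              # heteroscedastic noise variances (illustrative)
n, lam = allocate(V, k, xi2, nu, d, b)
print(f"target lambda = {lam:.2e}, total budget B = {n.sum()}, max_i V(x_i)/n_i = {np.max(V / n):.2e}")
```

Larger smoothness \(\nu\) forces \(\lambda\) to decay faster in \(k\), which in turn inflates the per-point replication counts and hence the total budget \(B\).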
We can further extend Theorem 2 and study the order of the uniform error bound in the random design setting as \(k, B \rightarrow \infty\) when polynomially decaying kernels are adopted. We see from Expression (8) that both \(\gamma _{k,B}(\tau)\) and \(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\) must converge to zero as \(k,B \rightarrow \infty\) to guarantee a vanishing approximation error at \(\forall {\bf x}\in \mathcal {X}\). Given the aforementioned choice of \(\lambda \asymp k^{-p} {(\log k)}^{b}\) with \(p=\nu /d\) and \(b\gt \nu /d\), we have that \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) can be achieved if the grid constant as a function of k, \(\tau (k)\), decreases sufficiently fast. Recall from Expression (12) that \(\gamma _{k,B}(\tau) \lesssim ((\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1})^{\tfrac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\tfrac{1}{2}}) .\) Assuming that \({\sf V}_{k,\max }/{\sf V}_{k, \min }\lt \infty\), we have \({\sf V}_{k,\min } \asymp {\sf V}_{k,\max } \asymp k^{1-p} (\log k)^{b}\). Setting \(\tau (k) = \mathcal {O}(k^{-a})\) as \(k, B\rightarrow \infty\) with \(a = 2\nu /d\) ensures that \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) and moreover \(\gamma _{k,B}(\tau) = o(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}))\) as \(k, B\rightarrow \infty\). The latter follows from the fact that \(\beta _k(\tau) = \mathcal {O} (\log k)\) due to Expression (9) and \(\sigma _{k,B}({\bf x}) = \mathcal {O} (k^{-\tfrac{2\nu -d}{4d}}{(\log k)}^{\tfrac{2\nu -d}{4\nu }b})\) at \(\forall {\bf x}\in \mathcal {X}\), thanks to Corollary 3. Therefore, \(\gamma _{k,B}(\tau) + \sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) \lesssim \sqrt {\beta _{k}(\tau)} \sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) = \mathcal {O}(k^{-\tfrac{2\nu -d}{4d}}{(\log k)}^{\tfrac{2\nu -d}{4\nu }b+\tfrac{1}{2}})\) as \(k, B \rightarrow \infty\).
Exponentially decaying kernels. Consider kernels having exponentially decaying eigenvalues. Roughly speaking, infinitely smooth stationary kernels belong to this category. One important kernel function that falls into this category is the squared exponential or Gaussian kernel when \(\mathbb {P}\) is the uniform or normal distribution on \(\mathcal {X}\) [39, 40]: \(K({\bf x},{\bf x}^\prime) = \exp \left(-\Vert {\bf x}-{\bf x}^\prime \Vert ^2/\theta \right),\) where \(\theta \gt 0\) denotes the lengthscale parameter.
Assumption 3.
The eigenvalues of K, \(\lbrace \mu _j\rbrace _{j=1}^\infty\), satisfy \(\mu _j \asymp b^{j^{\frac{1}{d}}}\) for some \(b\in (0,1)\).
Corollary 4.
Under Assumptions 1 and 3, it holds with probability at least \(1-k^{-10}\) that
\begin{equation} \Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim \lambda \left(\log \lambda ^{-1}\right)^d, \end{equation}
(A.40)
where \(\lambda\) is chosen such that \(\tilde{\kappa }^2 = o(\sqrt {k/\log k})\) holds. In particular, for any \(0\lt \lambda ^\prime \le \lambda\), it holds with probability at least \(1-k^{-10}\) for all \({\bf x}_0\in \mathcal {X}\) that
\begin{equation*} K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0; \lambda ^\prime)\le K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0; \lambda) \lesssim \lambda \left(\log \lambda ^{-1}\right)^d. \end{equation*}
Proof.
Under Assumption 3, by setting \(j^*=\lfloor (\tfrac{\log \lambda ^{-1}}{\log b^{-1}})^d\rfloor\), we have from Equation (A.35) that
\begin{align} \gamma (\lambda) &=\sum _{j=1}^{j^*} \frac{1}{1+\lambda \mu _j^{-1}} + \sum _{j=j^*+1}^\infty \frac{1}{1+\lambda \mu _j^{-1}} \nonumber \nonumber\\ &\lesssim \left(\frac{\log \lambda ^{-1}}{\log b^{-1}}\right)^d+\lambda ^{-1} \sum _{j=j^*+1}^\infty b^{j^{\frac{1}{d}}}\nonumber \nonumber\\ &\lesssim \left(\log \lambda ^{-1}\right)^d+\lambda ^{-1}\int _{\left(\frac{\log \lambda ^{-1}}{\log b^{-1}}\right)^d}^\infty b^{x^{\frac{1}{d}}} {{\tt d}}x. \end{align}
(A.41)
Let \(t=\zeta x^{1/d}\), where \(\zeta =-\log b\). Then, \(b^{x^{1/d}}=e^{-t}\), \({{\tt d}}x=(d\zeta ^{-d})t^{d-1}{{\tt d}}t\), and the range of t is \([\log \lambda ^{-1},\infty)\). Hence, the second term on the right-hand side of Expression (A.41) can be upper bounded as follows:
\begin{align} \lambda ^{-1}\int _{\left(\frac{\log \lambda ^{-1}}{\log b^{-1}}\right)^d}^\infty b^{x^{\frac{1}{d}}} {{\tt d}}x &= \lambda ^{-1} \int _{\log \lambda ^{-1}}^\infty e^{-t}(d\zeta ^{-d})t^{d-1}{{\tt d}}t\nonumber \nonumber\\ &= \lambda ^{-1}d\zeta ^{-d} \int _{\log \lambda ^{-1}}^\infty e^{-t}t^{d-1} {{\tt d}}t\nonumber \nonumber\\ &= \lambda ^{-1}d\zeta ^{-d}\Gamma \left(d, \log \lambda ^{-1}\right), \end{align}
(A.42)
where \(\Gamma (s,u) = \int _{u}^\infty t^{s-1} e^{-t} {{\tt d}}t\) denotes the upper incomplete gamma function; in our setting, \(s=d\) and \(u=\log \lambda ^{-1}\). It is also known that \(\Gamma (s,u) \sim u^{s-1}e^{-u}\) as \(u\rightarrow \infty\). Therefore, Equation (A.42) becomes
\begin{equation} \lambda ^{-1} \int _{\left(\frac{\log \lambda ^{-1}}{\log b^{-1}}\right)^d}^\infty b^{x^{\frac{1}{d}}} {{\tt d}}x \asymp \lambda ^{-1} d\zeta ^{-d} (\log \lambda ^{-1})^{d-1}\lambda \asymp (\log \lambda ^{-1})^{d-1}. \end{equation}
(A.43)
Hence, \(\gamma (\lambda) \lesssim (\log \lambda ^{-1})^d\) follows from Expressions (A.41) and (A.43). By choosing \(\lambda\) such that \(\tilde{\kappa }^2 \lesssim \gamma (\lambda) = o(\sqrt {k/\log k})\), the upper bound in Expression (A.40) follows from Theorem 4 immediately. The last statement follows, since \(K_{{\bf x}_0}({\bf x}_0) - \widehat{K}_{{\bf x}_0}({\bf x}_0;\lambda)\) is a nonincreasing function of \(\lambda\). □
Some comments follow from Corollary 4. One viable choice of \(\lambda\) is \(\lambda \asymp \exp (-k^{\frac{1}{p}})\) for some \(p\gt 0\). By choosing \(p\gt 2d\), it follows that \(\tilde{\kappa }^2 \lesssim \gamma (\lambda) \lesssim k^{\frac{d}{p}} = o(\sqrt {k/\log k})\). The high probability upper bound given by Expression (A.40) in this case follows as \(\Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty } \lesssim \exp (-k^{\frac{1}{p}}) k^\frac{d}{p}\). Therefore, it follows that, with a high probability, for any \({\bf x}_0 \in \mathcal {X}\), \(\sigma _{k,B}({\bf x}_0) \lesssim \exp (-\frac{1}{2}k^{\frac{1}{p}}) k^\frac{d}{2p}\). Recall that \(\lambda := {\sf V}_{k,\max }/(\xi ^2 k)\); we see that it is possible to allocate the computational budget to fulfill this choice of \(\lambda\). In case the budget allocation adopted makes \(\lambda\) diminish at a faster rate than the case discussed above, the corresponding upper bound on \(\Vert K_{{\bf x}_0} - \widehat{K}_{{\bf x}_0}\Vert _{\infty }\) and hence the upper bound on the order of \(\sigma _{k,B}(\cdot)\) still hold; however, in this case, the upper bound is not as tight.
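
Analogously, the following minimal sketch (with illustrative \(\xi ^2\), \(p\), \(d\), and a ceiling on the noise variances) indicates how quickly the replication count at the noisiest design point must grow with \(k\) to sustain \(\lambda \asymp \exp (-k^{1/p})\):

```python
import numpy as np

# A minimal sketch (illustrative constants) of the replication counts implied by lambda ~ exp(-k^(1/p)),
# p > 2d, through lambda = V_{k,max}/(xi^2 k).
xi2, d, p, V_max_true = 2.0, 1, 3.0, 2.0       # p = 3 > 2d = 2 (assumption)
for k in [100, 500, 1000, 2000]:
    lam = np.exp(-k ** (1.0 / p))
    n_required = int(np.ceil(V_max_true / (xi2 * k * lam)))
    print(f"k = {k:4d}: lambda = {lam:.2e}, replications needed at the noisiest point >= {n_required}")
```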
Finally, we further extend Theorem 2 by discussing the order of the uniform error bound in the random design setting as \(k, B \rightarrow \infty\) when exponentially decaying kernels are adopted. We see from Expression (8) that both \(\gamma _{k,B}(\tau)\) and \(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x})\) must converge to zero as \(k,B \rightarrow \infty\) to guarantee a vanishing approximation error at \(\forall {\bf x}\in \mathcal {X}\). Given the aforementioned choice of \(\lambda \asymp \exp (-k^{\frac{1}{p}})\) with \(p\gt 2d\), we have that \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau) = 0\) can be achieved if the grid constant as a function of k, \(\tau (k)\), decreases sufficiently fast. Recall from Expression (12) that \(\gamma _{k,B}(\tau) \lesssim ((\beta _k(\tau) \cdot \tau (k) \cdot k {\sf V}_{k,\min }^{-1})^{\tfrac{1}{2}} + \tau (k) + \tau (k) \cdot k {\sf V}_{k,\min }^{-\tfrac{1}{2}}) .\) Assuming that \({\sf V}_{k,\max }/{\sf V}_{k, \min }\lt \infty\), we have \({\sf V}_{k,\min } \asymp {\sf V}_{k,\max } \asymp k \exp (-k^{\tfrac{1}{p}})\). Setting \(\tau (k) = \mathcal {O}(\exp (-a k^{\tfrac{1}{p}}))\) where \(a \ge 2\) ensures that \(\lim \limits _{k, B\rightarrow \infty } \gamma _{k,B}(\tau)=0\) and moreover \(\gamma _{k,B}(\tau) = o(\sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}))\) as \(k, B\rightarrow \infty\). The latter follows from the fact that \(\beta _k(\tau) = \mathcal {O} (k^{\tfrac{1}{p}})\) due to Expression (9) and \(\sigma _{k,B}({\bf x}) = \mathcal {O} (\exp (-\frac{1}{2}k^{\frac{1}{p}}) k^\frac{d}{2p})\) at \(\forall {\bf x}\in \mathcal {X}\), thanks to Corollary 4. Therefore, \(\gamma _{k,B}(\tau) + \sqrt {\beta _k(\tau)}\sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) \lesssim \sqrt {\beta _{k}(\tau)} \sup _{{\bf x}\in \mathcal {X}}\sigma _{k,B}({\bf x}) = \mathcal {O}(\exp (-\frac{1}{2}k^{\frac{1}{p}}) k^\frac{1+d}{2p})\) as \(k, B \rightarrow \infty\).

D Additional Figures for Section 6

D.1 Figures for the M/M/1 Queueing Example

Fig. 6. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the M/M/1 example obtained on 100 independent macro-replications under the grid design with \(B = 25,600\).
Fig. 7. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the M/M/1 example obtained on 100 independent macro-replications under the uniform design with \(B = 2,560\).
Fig. 8. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the M/M/1 example obtained on 100 independent macro-replications under the uniform design with \(B = 25,600\).

D.2 Figures for Dimension-flexible Example

D.2.1 Additional Figures for the 1D Case.

Figures 9 and 10 summarize the magnitudes of \(H_{\max }\) (on a logarithmic scale) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained with a total budget of \(B=2,560\) and 10,240, respectively. These figures illustrate that, regardless of the kernel function adopted, different budget allocation schemes and design-point sampling schemes (i.e., LHD and uniform) tend to yield similar patterns in \(H_{\max }\) magnitudes.
Fig. 9. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 1D example obtained on 100 independent macro-replications with \(B = 2,560\).
Fig. 10. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 1D example obtained on 100 independent macro-replications with \(B = 10,240\).

D.2.2 Additional Figures for the 5D Case.

Figures 11 and 12 summarize the magnitudes of \(H_{\max }\) (on a logarithmic scale) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) obtained with a total budget of \(B=2,560\) and 10,240, respectively. These figures illustrate that, regardless of the kernel adopted, different budget allocation schemes and design-point sampling schemes (i.e., LHD and uniform) tend to yield similar patterns in \(H_{\max }\) magnitudes.
Fig. 11. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 5D example obtained on 100 independent macro-replications with \(B = 2,560\).
Fig. 12. Boxplots of \(\log (H_{\max })\) of \(\mbox{CI}_{{\mbox{eu}}}\), \(\mbox{CI}_{{\mbox{u}}}\), and \(\mbox{CI}_{{\mbox{b}}}\) for the 5D example obtained on 100 independent macro-replications with \(B = 10,240\).

References

[1]
A. Alaa and M. van der Schaar. 2017. Bayesian inference of individualized treatment effects using multi-task Gaussian processes. In Proceedings of the 31st Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, 3427–3435.
[2]
B. E. Ankenman, B. L. Nelson, and J. Staum. 2010. Stochastic Kriging for simulation metamodeling. Oper. Res. 58 (2010), 371–382.
[3]
R. Arcangéli, M. C. López de Silanes, and J. J. Torrens. 2012. Extension of sampling inequalities to Sobolev semi-norms of fractional order and derivative data. Numer. Math. 121 (2012), 587–608.
[4]
P. Bickel and M. Rosenblatt. 1973. On some global measures of deviations of density function estimates. Ann. Stat. 31 (1973), 1852–1884.
[5]
L. Cai, L. Gu, Q. Wang, and S. Wang. 2021. Simultaneous confidence bands for nonparametric regression with missing covariate data. Ann. Instit. Stat. Math. 73 (2021), 1249–1279.
[6]
L. Cai, R. Liu, S. Wang, and L. Yang. 2019. Simultaneous confidence bands for mean and variance functions based on deterministic design. Stat. Sinica 29 (2019), 505–525.
[7]
T. Cai, M. Low, and Z. Ma. 2014. Adaptive confidence bands for nonparametric regression functions. J. Am. Stat. Assoc. 109 (2014), 1054–1070.
[8]
G. Cao, L. Wang, Y. Li, and L. Yang. 2016. Oracle-efficient confidence envelopes for covariance functions in dense functional data. Stat. Sinica 26 (2016), 359–383.
[9]
A. Carpentier, R. Munos, and A. Antos. 2015. Adaptive strategy for stratified Monte Carlo sampling. J. Mach. Learn. Res. 16 (2015), 2231–2271.
[10]
X. Chen, B. E. Ankenman, and B. L. Nelson. 2012. The effects of common random numbers on stochastic Kriging metamodels. ACM Trans. Model. Comput. Simul. 22, Article 7 (2012), 1–20.
[11]
X. Chen and K.-K. Kim. 2014. Stochastic Kriging with biased sample estimates. ACM Trans. Model. Comput. Simul. 24, Article 8 (2014), 1–23.
[12]
X. Chen and Q. Zhou. 2017. Sequential design strategies for mean response surface metamodeling via stochastic Kriging with adaptive exploration and exploitation. Eur. J. Oper. Res. 262 (2017), 575–585.
[13]
S. R. Chowdhury and A. Gopalan. 2017. On kernelized multi-armed bandits. In Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 844–853.
[14]
K. De Brabanter, J. De Brabanter, and J. A. K. Suykens. 2011. Approximate confidence and prediction intervals for least squares support vector regression. IEEE Trans. Neural Netw. 22 (2011), 110–120.
[15]
G. Dellino, J. P. C. Kleijnen, and C. Meloni. 2012. Robust optimization in simulation: Taguchi and Krige combined. INFORMS J. Comput. 24 (2012), 471–484.
[16]
A. B. Dieker and S.-H. Kim. 2021. Efficient fully sequential indifference-zone procedures using properties of multidimensional Brownian motion exiting a sphere. Retrieved from https://arxiv.org/abs/2104.08784
[17]
S. Efromovich. 2007. Sequential design and estimation in heteroscedastic nonparametric regression. Sequent. Anal. 26 (2007), 3–25.
[18]
R. Eubank and P. Speckman. 1993. Confidence bands in nonparametric regression. J. Am. Stat. Assoc. 88 (1993), 1287–1301.
[19]
P. I. Frazier. 2014. A fully sequential elimination procedure for indifference-zone ranking and selection with tight bounds on probability of correct selection. Oper. Res. 62 (2014), 926–942.
[20]
Z. Geng, F. Yang, X. Chen, and N. Wu. 2015. Gaussian process based modeling and experimental design for sensor calibration in drifting environments. Sensors Actuat B: Chem. 216 (2015), 321–331.
[21]
L. Gu and L. Yang. 2015. Oracally efficient estimation for single-index link function with simultaneous confidence band. Electron. J. Stat. 9 (2015), 1540–1561.
[22]
G. Johnston. 1982. Probabilities of maximal deviations for nonparametric regression function estimates. J. Multivar. Anal. 12 (1982), 402–414.
[23]
B. Kamiński. 2015. A method for the updating of stochastic Kriging metamodels. Eur. J. Oper. Res. 247 (2015), 859–866.
[24]
I. Keilegom and G. Claeskens. 2003. Bootstrap confidence bands for regression curves and their derivatives. Ann. Stat. 31 (2003), 1852–1884.
[25]
J. P. C. Kleijnen. 2015. Design and Analysis of Simulation Experiments (2nd ed.). Springer, New York.
[26]
J. P. C. Kleijnen and W. C. M. van Beers. 2022. Statistical tests for cross-validation of Kriging models. INFORMS J. Comput. 34 (2022), 607–621.
[27]
B. Laurent and P. Massart. 2000. Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (2000), 1302–1338.
[28]
L. Le Gratiet and J. Garnier. 2015. Asymptotic analysis of the learning curve for Gaussian process regression. Mach. Learn. 98 (2015), 407–433.
[29]
A. Lederer and J. Umlauft. 2019. Uniform error bounds for Gaussian process regression with application to safe control. In Proceedings of the 33rd Conference on Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Eds.). Curran Associates Inc., Vancouver, BC, Canada, 659–669.
[30]
Z. Liu and M. Li. 2020. Non-asymptotic analysis in kernel ridge regression. Retrieved from https://arxiv.org/pdf/2006.01350v1.pdf
[31]
Z. Liu and M. Li. 2023. On the estimation of derivatives using plug-in kernel ridge regression estimators. J. Mach. Learn. Res. 24 (2023), 1–37.
[32]
X. Lu, A. Rudi, E. Borgonovo, and L. Rosasco. 2020. Faster Kriging: Facing high-dimensional simulators. Oper. Res. 68 (2020), 233–249.
[33]
C. Ma, R. Pathak, and M. J. Wainwright. 2023. Optimally tackling covariate shift in RKHS-based nonparametric regression. Ann. Stat. 51 (2023), 738–761.
[34]
S. H. Ng and J. Yin. 2012. Bayesian Kriging analysis and design for stochastic simulations. ACM Trans. Model. Comput. Simul. 22, Article 17 (2012), 1–26.
[35]
C. E. Rasmussen and C. K. I. Williams. 2006. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, USA.
[36]
C. Rieger and B. Zwicknagl. 2010. Sampling inequalities for infinitely smooth functions, with applications to interpolation and machine learning. Advan. Comput. Math. 32 (2010), 103–129.
[37]
T. J. Santner, B. J. Williams, and W. I. Notz. 2003. The Design and Analysis of Computer Experiments. Springer, New York.
[38]
R. Schüssler and M. Trede. 2016. Constructing minimum-width confidence bands. Econ. Lett. 145 (2016), 182–185.
[39]
M. W. Seeger, S. M. Kakade, and D. P. Foster. 2008. Information consistency of nonparametric Gaussian process methods. IEEE Trans. Inf. Theor. 54 (2008), 2376–2382.
[40]
N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger. 2012. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theor. 58 (2012), 3250–3265.
[41]
A. L. Teckentrup. 2020. Convergence of Gaussian process regression with estimated hyper-parameters and applications in Bayesian inverse problems. SIAM/ASA J. Uncert. Quantif. 8 (2020), 1310–1337.
[42]
M. J. Wainwright. 2019. High-dimensional Statistics: A Non-asymptotic Viewpoint. Cambridge University Press, Cambridge, UK.
[43]
J. Wang, F. Cheng, and L. Yang. 2013. Smooth simultaneous confidence bands for cumulative distribution functions. J. Nonparam. Stat. 25 (2013), 395–407.
[44]
W. Wang. 2021. On the inference of applying Gaussian process modeling to a deterministic function. Electron. J. Stat. 15 (2021), 5014–5066.
[45]
W. Wang and X. Chen. 2016. The effects of estimation of heteroscedasticity on stochastic Kriging. In Proceedings of the Winter Simulation Conference, T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick (Eds.). IEEE, Piscataway, NJ, 326–337.
[46]
W. Wang and X. Chen. 2018. An adaptive two-stage dual metamodeling approach for stochastic simulation experiments. IISE Trans. 50 (2018), 820–836.
[47]
W. Wang, R. Tuo, and C. F. J. Wu. 2020. On prediction properties of Kriging: Uniform error bounds and robustness. J. Am. Stat. Assoc. 115 (2020), 920–930.
[48]
H. Wendland. 2005. Scattered Data Approximation. Cambridge University Press, Cambridge, UK.
[49]
H. Wendland and C. Rieger. 2005. Approximate interpolation with applications to selecting smoothing parameters. Numer. Math. 101 (2005), 729–748.
[50]
W. Whitt. 1989. Planning queueing simulations. Manag. Sci. 35 (1989), 1341–1366.
[51]
G. Wynne, F.-X. Briol, and M. Girolami. 2021. Convergence guarantees for Gaussian process means with misspecified likelihoods and smoothness. J. Mach. Learn. Res. 22 (2021), 1–40.
[52]
G. Xie and X. Chen. 2020. Uniform error bounds for stochastic Kriging. In Proceedings of the Winter Simulation Conference, K.-H. Bae, B. Feng, S. Kim, S. Lazarova-Molnar, Z. Zheng, T. Roeder, and R. Thiesing (Eds.). IEEE, Piscataway, NJ, 361–372.
[53]
W. Xie, B. Nelson, and J. Staum. 2010. The influence of correlation functions on stochastic Kriging metamodels. In Proceedings of the Winter Simulation Conference, B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan (Eds.). IEEE, Piscataway, NJ, 1067–1078.
[54]
Y. Yun, A. Bhattacharya, and D. Pati. 2017. Frequentist coverage and sup-norm convergence rate in Gaussian process regression. Retrieved from https://arxiv.org/abs/1708.04753v1
[55]
Y. Zhang and X. Chen. 2021. Information consistency of stochastic Kriging and its implications. In Proceedings of the Winter Simulation Conference, S. Kim, B. Feng, K. Smith, S. Masoud, Z. Zheng, C. Szabo, and M. Loper (Eds.). IEEE, Piscataway, NJ, 1–12.
[56]
Y. Zhang, J. Duchi, and M. Wainwright. 2015. Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates. J. Mach. Learn. Res. 16 (2015), 3299–3340.
