Abstract
For big data analysis, the high computational cost of Bayesian methods often limits their application in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in the parameter space of the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm that converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions, such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
References
Ahn, S., Chen, Y., Welling, M.: Distributed and adaptive darting Monte Carlo through regenerations. In: Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) (2013)
Amari, S., Nagaoka, H.: Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. Oxford University Press, Oxford (2000)
Andrieu, C., Thoms, J.: A tutorial on adaptive MCMC. Stat. Comput. 18(4), 343–373 (2008)
Bache, K., Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml. Accessed Apr 2015
Betancourt, M.J.: The fundamental incompatibility of scalable Hamiltonian Monte Carlo and naive data subsampling. In: Proceedings of the 32nd International Conference on Machine Learning (ICML'15) (2015)
Chen, T., Fox, E.B., Guestrin, C.: Stochastic gradient Hamiltonian Monte Carlo. In: Proceedings of the 31st International Conference on Machine Learning (ICML) (2014)
Conrad, P.R., Marzouk, Y.M., Pillai, N.S., Smith, A.: Asymptotically exact MCMC algorithms via local approximations of computationally intensive models. arXiv:1402.1694v3 (2014, preprint). Accessed Nov 2014
Dashti, M., Stuart, A.: Uncertainty quantification and weak approximation of an elliptic inverse problem. SIAM J. Numer. Anal. 49(6), 2524–2542 (2011)
Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195, 216–222 (1987)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Geyer, C.J.: Practical Markov chain Monte Carlo. Stat. Sci. 7, 473–483 (1992)
Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods (with discussion). J. R. Stat. Soc. B 73, 123–214 (2011)
Greville, T.: Some applications of the pseudoinverse of a matrix. SIAM Rev. 2(1), 15–22 (1960)
Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: Neural Information Processing Systems (NIPS), pp. 856–864 (2010)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006a)
Huang, G.B., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17(4), 879–892 (2006b)
Hyvärinen, A.: Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6, 695–709 (2005)
Kohonen, T.: Self-Organization and Associative Memory. Springer, Berlin (1988)
Kovanic, P.: On the pseudo inverse of a sum of symmetric matrices with applications to estimation. Kybernetika 15(5), 341–348 (1979)
Lan, S., Streets, J., Shahbaba, B.: Wormhole Hamiltonian Monte Carlo. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014a)
Lan, S., Zhou, B., Shahbaba, B.: Spherical Hamiltonian Monte Carlo for constrained target distributions. In: Proceedings of the 31st International Conference on Machine Learning (ICML, 2014b)
Lin, C.J., Weng, R.C., Keerthi, S.S.: Trust region Newton method for large-scale logistic regression. J. Mach. Learn. Res. 9, 627–650 (2008)
Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2001)
Meeds, E., Welling, M.: GPS-ABC: Gaussian process surrogate approximate Bayesian computation. In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI) (2014)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
Neal, R.M.: MCMC using Hamiltonian dynamics. In: Brooks, S., Gelman, A., Jones, G., Meng, X.L. (eds.) Handbook of Markov Chain Monte Carlo, pp. 113–162. Chapman and Hall/CRC, Boca Raton (2011)
Neal, R.: Regression and classification using Gaussian process priors. Bayesian Stat. 6, 475–501 (1999)
Neal, R.: Bayesian Learning in Neural Networks. Springer, New York (1996)
Quinonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)
Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Proceedings of the 21st Conference on Advances in Neural Information Processing Systems, NIPS (2007)
Rahimi, A., Recht, B.: Uniform approximation of functions with random bases. In: Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing (2008)
Rasmussen, C.E.: Evaluation of Gaussian processes and other methods for non-linear regression. PhD Thesis, University of Toronto (1996)
Rasmussen, C.E.: Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals. Bayesian Stat. 7, 651–659 (2003)
Rasmussen, C.E., Nickisch, H.: Gaussian processes for machine learning (GPML) toolbox. J. Mach. Learn. Res. 11, 3011–3015 (2010)
Roberts, G.O., Rosenthal, J.S.: Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab. 44(2), 458–475 (2007)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. I, pp. 318–362. Bradford Books, Cambridge (1986)
Shahbaba, B., Lan, S., Johnson, W.O., Neal, R.M.: Split Hamiltonian Monte Carlo. Stat. Comput. 24, 339–349 (2014)
Snelson, E., Ghahramani, Z.: Sparse Gaussian processes using pseudo-inputs. In: Advances in Neural Information Processing Systems (NIPS), vol. 18. MIT Press, Cambridge (2006)
Strathmann, H., Sejdinovic, D., Livingstone, S., Szabo, Z., Gretton, A.: Gradient-free Hamiltonian Monte Carlo with efficient kernel exponential families. In: Advances in Neural Information Processing Systems (2015)
van Schaik, A., Tapson, J.: Online and adaptive pseudoinverse solutions for ELM weights. Neurocomputing 149, 233–238 (2015)
Verlet, L.: Computer ‘experiments’ on classical fluids, I: thermodynamical properties of Lennard–Jones molecules. Phys. Rev. 159, 98–103 (1967)
Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 681–688 (2011)
Zhang, C., Shahbaba, B., Zhao, H.: Precomputing strategy for Hamiltonian Monte Carlo method based on regularity in parameter space. arXiv:1504.01418v1 (2015, preprint). Accessed Apr 2015
Acknowledgments
This work is supported by NSF grants DMS-1418422 and DMS-1622490.
Appendices
Appendix 1: Convergence to the correct target distribution
In order to prove that the equilibrium distribution remains the same, it suffices to show that the detailed balance condition still holds. Denote \(\theta = (p,\,q),\; \theta ^{\prime }={\tilde{\phi }}_t(\theta ).\) In the Metropolis–Hastings step, we use the original Hamiltonian to compute the acceptance probability
$$\alpha (\theta ,\,\theta ^{\prime }) = \min \left\{ 1,\, \exp \left( -H(\theta ^{\prime }) + H(\theta )\right) \right\} ;$$
therefore,
$$\exp (-H(\theta ))\, \alpha (\theta ,\,\theta ^{\prime })\, \mathrm{d}\theta = \min \left\{ \exp (-H(\theta )),\, \exp (-H(\theta ^{\prime }))\right\} \mathrm{d}\theta = \exp (-H(\theta ^{\prime }))\, \alpha (\theta ^{\prime },\,\theta )\, \mathrm{d}\theta ^{\prime },$$
since \( \left| \frac{\mathrm{d}\theta }{\mathrm{d}\theta ^{\prime }}\right| = 1\) due to the volume conservation property of the surrogate-induced Hamiltonian flow \({\tilde{\phi }}_t.\) Since the detailed balance condition is satisfied and the surrogate-induced Hamiltonian flow is reversible, the modified Markov chain converges to the correct target distribution.
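As a concrete illustration of this argument, the following sketch shows one such transition: the leapfrog trajectory is driven by the surrogate gradient, while the accept/reject test uses the original Hamiltonian. This is a minimal sketch, assuming user-supplied callables U (exact potential) and grad_V (surrogate gradient); all names are ours, for illustration only.

```python
import numpy as np

def surrogate_hmc_step(q, U, grad_V, eps=0.1, n_leapfrog=20, rng=None):
    """One surrogate-induced HMC transition.

    The trajectory follows the *surrogate* gradient grad_V, but the
    accept/reject test uses the *original* Hamiltonian
    H(q, p) = U(q) + p'p / 2, which is exactly what makes the detailed
    balance argument above go through.
    """
    rng = rng or np.random.default_rng()
    p0 = rng.standard_normal(q.shape)            # resample momentum
    q_new, p_new = q.copy(), p0.copy()
    # Leapfrog under the surrogate potential (volume preserving, reversible)
    p_new = p_new - 0.5 * eps * grad_V(q_new)
    for _ in range(n_leapfrog - 1):
        q_new = q_new + eps * p_new
        p_new = p_new - eps * grad_V(q_new)
    q_new = q_new + eps * p_new
    p_new = p_new - 0.5 * eps * grad_V(q_new)
    # Accept with probability min{1, exp(H - H')} under the ORIGINAL H
    H0 = U(q) + 0.5 * p0 @ p0
    H1 = U(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < H0 - H1 else q
```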
Appendix 2: Potential matching
In the paper, training data collected from the history of the Markov chain are used without a detailed explanation. Here, we analyze the asymptotic behavior of the surrogate-induced distribution and explain why the history of the Markov chain is a reasonable choice of training data. Recall that we find our neural network surrogate function by minimizing the mean square error (11). Similarly to Hyvärinen (2005), it turns out that minimizing (11) is asymptotically equivalent to minimizing a new distance between the surrogate-induced distribution and the underlying target distribution, independent of their corresponding normalizing constants.
Suppose we know the density of the underlying intractable target distribution up to a constant,
$$P(q\,|\,Y) = \frac{1}{Z} \exp \left( -U(q)\right) ,$$
where Z is the unknown normalizing constant. Our goal is to approximate this distribution using a parametrized density model, also known up to a constant,
$$Q(q,\,\theta ) = \frac{1}{Z(\theta )} \exp \left( -V(q,\,\theta )\right) .$$
Ignoring the multiplicative constants, the corresponding potential energy functions are \(U(q)\) and \(V(q,\,\theta ),\) respectively. The straightforward squared distance between the two potentials is not a well-defined measure between the two distributions because of the unknown normalizing constants. Therefore, we use the following distance instead:
$$K(\theta ) = \frac{1}{2}\, \mathbb {E}_{P}\left[ \left( V(q,\,\theta ) - U(q) - \mathbb {E}_{P}\left[ V(q,\,\theta ) - U(q)\right] \right) ^2\right] . \qquad (14)$$
Because of its similarity to score matching (Hyvärinen 2005), we refer to the approximation method based on this new distance as potential matching; the corresponding potential matching estimator of \(\theta \) is given by
$$\hat{\theta }_{PM} = \mathop {\mathrm{arg\,min}}\limits _{\theta }\, K(\theta ).$$
It is easy to verify that \(K(\theta ) = 0 \Rightarrow V(\theta ) = U + \mathrm{constant} \Rightarrow Q(q,\,\theta ) = P(q\,|\,Y),\) so \(K(\theta )\) is a well-defined squared distance. Exact evaluation of (14) is analytically intractable. In practice, given N samples \(q_1,\,q_2,\ldots ,q_N\) from the target distribution, we minimize the empirical version of (14),
$$\tilde{K}(\theta ) = \frac{1}{2}\left[ \frac{1}{N}\sum _{i=1}^{N} \left( V(q_i,\,\theta ) - U(q_i)\right) ^2 - \left( \frac{1}{N}\sum _{i=1}^{N} \left( V(q_i,\,\theta ) - U(q_i)\right) \right) ^2\right] , \qquad (15)$$
which is asymptotically equivalent to K by the law of large numbers. (15) becomes more concise if we allow a shift term in the parametrized model (\(V(q,\,\theta ) = V(q,\,\theta _{-d}) + \theta _d\)): minimizing the mean squared error over \(\theta _d\) yields exactly the empirical variance in (15), so in that case the empirical potential matching estimator is
$$\hat{\theta } = \mathop {\mathrm{arg\,min}}\limits _{\theta }\, \frac{1}{2N}\sum _{i=1}^{N} \left( V(q_i,\,\theta ) - U(q_i)\right) ^2.$$
In particular, suppose we adopt an additive model for \(V(q,\,\theta )\) with a shift term,
$$V(q,\,\theta ) = \sum _{i=1}^{d-1} \theta _i\, \sigma \left( \varvec{w_i}^{\mathrm T} q + d_i\right) + \theta _d,$$
where \(\varvec{w_i},\,d_i\), and the activation function \(\sigma \) are chosen according to Algorithm 2, and collect early evaluations from the history of the Markov chain,
$$\left\{ \left( q_i,\, U(q_i)\right) \right\} _{i=1}^{N},$$
as training data. Then the estimated parameters (i.e., the weights of the output neuron) are asymptotically the potential matching estimates, since the Markov chain will eventually converge to the target distribution. When truncated at a finite N, the estimated parameters are almost the empirical potential matching estimates, except that the samples from the history of the Markov chain are not exactly (though quite close to) draws from the target distribution.
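For illustration, here is a minimal sketch of this construction in code: random input weights and biases stand in for the choices made by Algorithm 2 (which we do not reproduce), and the output weights, i.e., the empirical potential matching estimate with a shift term, are obtained by ordinary least squares on states recorded from the chain's history. All names are ours.

```python
import numpy as np

def fit_random_basis_surrogate(Q, U_vals, s=100, seed=0):
    """Fit V(q, theta) = sum_i theta_i * sigma(w_i'q + d_i) + theta_d
    to recorded potential values by least squares.

    Q      : (N, dim) array of early states from the chain's history
    U_vals : (N,) exact potential U evaluated at those states
    """
    rng = np.random.default_rng(seed)
    N, dim = Q.shape
    W = rng.standard_normal((dim, s))      # random input weights (kept fixed)
    d = rng.standard_normal(s)             # random biases (kept fixed)
    sigma = np.tanh                        # a bounded activation, for example
    H = np.hstack([sigma(Q @ W + d), np.ones((N, 1))])  # basis matrix + shift
    theta, *_ = np.linalg.lstsq(H, U_vals, rcond=None)  # output weights
    V = lambda q: np.hstack([sigma(q @ W + d), 1.0]) @ theta
    return theta, V
```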
Appendix 3: Adaptive learning
Theorem 1
(Greville) If a matrix A with k columns is denoted by \(A_k\) and partitioned as \(A_k=[A_{k-1}\;\, a_k],\) with \(A_{k-1}\) a matrix having \(k-1\) columns, then the Moore–Penrose generalized inverse of \(A_k\) is
$$A_k^{\dagger } = \begin{bmatrix} A_{k-1}^{\dagger } - d_k b_k^{\mathrm T}\\ b_k^{\mathrm T} \end{bmatrix},$$
where
$$d_k = A_{k-1}^{\dagger } a_k, \quad c_k = a_k - A_{k-1} d_k, \quad b_k = {\left\{ \begin{array}{ll} c_k/\Vert c_k\Vert ^2, &{} c_k \ne 0,\\ \left( 1 + d_k^{\mathrm T} d_k\right) ^{-1} \left( A_{k-1}^{\dagger }\right) ^{\mathrm T} d_k, &{} c_k = 0. \end{array}\right. }$$
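The recursion is easy to check numerically. Below is an illustrative sketch (ours, not from the paper) that implements the column update above and compares it against a directly computed pseudoinverse.

```python
import numpy as np

def greville_append_column(A_pinv, A, a):
    """Pseudoinverse of [A a] from the pseudoinverse of A (Theorem 1)."""
    d = A_pinv @ a
    c = a - A @ d
    if np.linalg.norm(c) > 1e-12:       # a adds a new direction
        b = c / (c @ c)
    else:                               # a lies in the column space of A
        b = (A_pinv.T @ d) / (1.0 + d @ d)
    return np.vstack([A_pinv - np.outer(d, b), b])

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
a = rng.standard_normal(6)
A_k = np.hstack([A, a[:, None]])
assert np.allclose(greville_append_column(np.linalg.pinv(A), A, a),
                   np.linalg.pinv(A_k))
```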
Proof of Proposition 1
To save computational cost, we only update the estimator. Suppose the current output matrix is \(H_{k+1}=\begin{bmatrix}H_k\\h_{k+1}^\mathrm{T}\end{bmatrix}\) and the target vector is \(T_{k+1}=\begin{bmatrix}T_k\\t_{k+1}\end{bmatrix}\); then, applying Theorem 1 to \(H_{k+1}^{\mathrm T} = [H_k^{\mathrm T}\;\, h_{k+1}],\)
$$\beta _{k+1} = H_{k+1}^{\dagger } T_{k+1} = \beta _k + b_{k+1}\left( t_{k+1} - h_{k+1}^{\mathrm T}\beta _k\right) ,$$
where \(b_{k+1}\) is obtained according to Theorem 1. Note that the computation of \(b_{k+1}\) and \(c_{k+1}\) still involves \(H_k\) and \(H_k^{\dagger },\) whose sizes increase as the data size grows. Following Kohonen (1988), Kovanic (1979), and van Schaik and Tapson (2015), we introduce two auxiliary matrices,
$$\varPhi _k = I - H_k^{\dagger } H_k, \quad \varTheta _k = H_k^{\dagger }\left( H_k^{\dagger }\right) ^{\mathrm T},$$
and rewrite \(b_{k+1}\) and \(c_{k+1}\) as
$$c_{k+1} = \varPhi _k h_{k+1}, \quad b_{k+1} = {\left\{ \begin{array}{ll} c_{k+1}/\Vert c_{k+1}\Vert ^2, &{} c_{k+1} \ne 0,\\ \varTheta _k h_{k+1}\big /\left( 1 + h_{k+1}^{\mathrm T}\varTheta _k h_{k+1}\right) , &{} c_{k+1} = 0. \end{array}\right. }$$
The two auxiliary matrices can also be updated adaptively,
$$\varPhi _{k+1} = \varPhi _k - b_{k+1} c_{k+1}^{\mathrm T}, \quad \varTheta _{k+1} = \varTheta _k - b_{k+1} h_{k+1}^{\mathrm T}\varTheta _k - \varTheta _k h_{k+1} b_{k+1}^{\mathrm T} + \left( 1 + h_{k+1}^{\mathrm T}\varTheta _k h_{k+1}\right) b_{k+1} b_{k+1}^{\mathrm T},$$
and if \(c_{k+1} = 0,\) these formulas simplify to
$$\varPhi _{k+1} = \varPhi _k, \quad \varTheta _{k+1} = \varTheta _k - \frac{\varTheta _k h_{k+1} h_{k+1}^{\mathrm T}\varTheta _k}{1 + h_{k+1}^{\mathrm T}\varTheta _k h_{k+1}}.$$
\(\square \)
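To make the updates above concrete, here is a minimal online estimator in code, a sketch under the \(c_{k+1}=0\) regime (which holds once k exceeds the number of hidden nodes, so every new feature vector lies in the row space of \(H_k\)); only the small \(s\times s\) matrix \(\varTheta _k\) is kept, never \(H_k\) itself. The \(\delta I\) initialization is a standard recursive-least-squares convention, not part of the paper.

```python
import numpy as np

class OnlineSurrogateWeights:
    """Rank-one updates of the output weights beta_k = H_k^+ T_k."""

    def __init__(self, s, delta=1e3):
        self.beta = np.zeros(s)
        self.Theta = delta * np.eye(s)   # large initial proxy for Theta_0

    def update(self, h, t):
        """Absorb one new (feature vector h, target t) pair."""
        g = self.Theta @ h
        b = g / (1.0 + h @ g)                 # b_{k+1} in the c_{k+1} = 0 case
        self.beta += b * (t - h @ self.beta)  # recursive estimator update
        self.Theta -= np.outer(b, g)          # simplified Theta update
        return self.beta
```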