Abstract
The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that the performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into the chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians and the complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and further benefits from (i) efficient algorithms for computing modified Hamiltonians; (ii) an implicit momentum update procedure; and (iii) multistage splitting integrators specially derived for methods that sample with modified Hamiltonians. MMHMC has been implemented, tested on popular statistical models and compared in sampling efficiency with HMC, Riemann Manifold Hamiltonian Monte Carlo, Generalized Hybrid Monte Carlo, Generalized Shadow Hybrid Monte Carlo, the Metropolis Adjusted Langevin Algorithm and Random Walk Metropolis–Hastings. To make a fair comparison, we propose a metric that accounts for correlations among samples and weights and can be readily used for any method that generates such samples. The experiments reveal the superiority of MMHMC over popular sampling techniques, especially in solving high-dimensional problems.
Acknowledgements
The authors gratefully acknowledge financial support from MTM2016-76329-R (AEI/FEDER, EU), funded by the Spanish Ministry of Economy and Competitiveness (MINECO), and from the Basque Government ELKARTEK Program, Grant KK-2018/00054. This work has been possible thanks to the support of the computing infrastructure of the i2BASQUE academic network and the technical and human support provided by IZO-SGI SGIker of UPV/EHU and European funding (ERDF and ESF). This research is also supported by the Basque Government through the BERC 2018-2021 program and by MINECO: BCAM Severo Ochoa accreditation SEV-2017-0718. This work was also part of the Agile BioFoundry (http://agilebiofoundry.org) supported by the US Department of Energy, Energy Efficiency and Renewable Energy, Bioenergy Technologies Office, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use would not infringe privately owned rights. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Appendices
Appendix A: Invariance of the PMMC step
The Partial Momentum Monte Carlo (PMMC) step of the MMHMC method leaves the importance target distribution \(\tilde{\pi }\) (Eq. 12) invariant if the transition kernel \(T(\cdot |\cdot )\) satisfies the following condition
for all \(n=1,\dots ,N\).
The PMMC step samples on a space augmented with a noise vector \(\mathbf {u} \sim \mathcal {N}(0,M)\), using the extended density \(\hat{\pi }\) (defined in Eq. 17), for which
Therefore, we want to show that
for the transition kernel defined as
where \(\mathcal {P}=\min \left\{ 1,\hat{\pi }(\mathcal {R}(\varvec{\theta },\mathbf {p},\mathbf {u}))/\hat{\pi }(\varvec{\theta },\mathbf {p},\mathbf {u})\right\} \) is the Metropolis probability, \(\delta \) is the Dirac delta function, \(\mathcal {R}\) is the proposal function (Eq. 16) and \(\hat{\mathcal {F}}(\varvec{\theta },\mathbf {p},\mathbf {u}) =(\varvec{\theta },\mathbf {p},-\mathbf {u})\) is the flipping function. Note that the map \(\mathcal {R}\) is volume preserving; hence, the Metropolis probability \(\mathcal {P}\) does not include a Jacobian factor. For the sake of clarity, we denote \(\mathbf {x}=(\varvec{\theta },\mathbf {p},\mathbf {u})\) and write the right-hand side of the expression (28) as
Applying the change of variables \(\mathbf {x}=\hat{\mathcal {F}} \circ \mathcal {R}(\bar{\mathbf {x}})\), which is volume preserving, to the first term in the sum, omitting the bars and using the fact that \(\hat{\mathcal {F}} =\mathcal {R}\circ \hat{\mathcal {F}}\circ \mathcal {R}\), one obtains
Since \(\hat{\pi }\circ \hat{\mathcal {F}}=\hat{\pi }\), the first and third terms cancel out. Employing the change of variables \(\mathbf {x}=\hat{\mathcal {F}}(\bar{\mathbf {x}})\) in the second term and again omitting the bars leads to
which proves the equality (28).
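In the compact notation \(\mathbf {x}=(\varvec{\theta },\mathbf {p},\mathbf {u})\), the transition kernel and the invariance condition used in this proof can be written out explicitly; the following is a sketch reconstructed from the definitions of \(\mathcal {P}\), \(\mathcal {R}\) and \(\hat{\mathcal {F}}\) above:
\[
T(\mathbf {x}'\,|\,\mathbf {x})=\mathcal {P}(\mathbf {x})\,\delta \big (\mathbf {x}'-\mathcal {R}(\mathbf {x})\big )+\big (1-\mathcal {P}(\mathbf {x})\big )\,\delta \big (\mathbf {x}'-\hat{\mathcal {F}}(\mathbf {x})\big ),\qquad \int T(\mathbf {x}'\,|\,\mathbf {x})\,\hat{\pi }(\mathbf {x})\,\mathrm {d}\mathbf {x}=\hat{\pi }(\mathbf {x}').
\]
The latter is condition (28): a kernel that either moves to \(\mathcal {R}(\mathbf {x})\) with the Metropolis probability \(\mathcal {P}\) or falls back to the noise flip \(\hat{\mathcal {F}}(\mathbf {x})\) leaves \(\hat{\pi }\) invariant.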
Appendix B: Modified Hamiltonians for splitting integrators
The coefficients for the two-stage integrator family (14) and modified Hamiltonians (22)–(24) are the following:
Using (29), one can also obtain the modified Hamiltonian for the Verlet integrator, since two steps of Verlet integration are equivalent to one step of the two-stage integrator with \(b=1/4\) (see the identity below). The coefficients are therefore
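To see the equivalence, write one Verlet step of length \(h\) in kick–drift–kick form as \(K_{h/2}\,D_{h}\,K_{h/2}\), where \(K\) and \(D\) denote the momentum and position updates; then
\[
\big (K_{h/2}\,D_{h}\,K_{h/2}\big )\big (K_{h/2}\,D_{h}\,K_{h/2}\big )=K_{h/2}\,D_{h}\,K_{h}\,D_{h}\,K_{h/2},
\]
which is exactly one step of the two-stage map with step size \(2h\) and \(b=1/4\) (assuming the kick-first ordering; the drift-first case is analogous).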
For three-stage integrators (15) (a two-parameter family), the coefficients are
The coefficients for the modified Hamiltonians (25)–(26) are calculated as
For the fourth-order modified Hamiltonian (25), we use the second-order centered finite difference approximation of the time derivative of the gradient of the potential function,
\[
\frac{\mathrm {d}\mathbf {U}}{\mathrm {d}t}(t_n) \approx \frac{\mathbf {U}(t_{n+1})-\mathbf {U}(t_{n-1})}{2\varepsilon },
\]
with \(\varepsilon =h\) for the Verlet, \(\varepsilon =h/2\) for two-stage and \(\varepsilon =ah\) for three-stage integrators, \(a\) being the integrator’s coefficient advancing position variables. The sixth-order modified Hamiltonian (26), here considered only for the Verlet and two-stage integrators, is calculated using a fourth-order approximation for the first derivative and second-order approximations for the second and third derivatives,
\[
\frac{\mathrm {d}\mathbf {U}}{\mathrm {d}t}(t_n) \approx \frac{-\mathbf {U}(t_{n+2})+8\mathbf {U}(t_{n+1})-8\mathbf {U}(t_{n-1})+\mathbf {U}(t_{n-2})}{12\varepsilon },
\]
\[
\frac{\mathrm {d}^2\mathbf {U}}{\mathrm {d}t^2}(t_n) \approx \frac{\mathbf {U}(t_{n+1})-2\mathbf {U}(t_n)+\mathbf {U}(t_{n-1})}{\varepsilon ^2},\qquad \frac{\mathrm {d}^3\mathbf {U}}{\mathrm {d}t^3}(t_n) \approx \frac{\mathbf {U}(t_{n+2})-2\mathbf {U}(t_{n+1})+2\mathbf {U}(t_{n-1})-\mathbf {U}(t_{n-2})}{2\varepsilon ^3},
\]
where \(\varepsilon \) depends on the integrator as before. The interpolating polynomial in terms of the gradient of the potential function \(\mathbf {U}(t_i)=U_{\varvec{\theta }}(\varvec{\theta }^i)\), \(i=n-k,\dots ,n,\dots ,n+k\), \(n \in \{0, L\}\), is constructed from the numerical trajectory \(\{U_{\varvec{\theta }}(\varvec{\theta }^i)\}^{L+k}_{i=-k}\), where \(k=1\) and \(k=2\) for the fourth- and sixth-order modified Hamiltonians, respectively.
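As an illustration, these finite difference approximations can be assembled from the gradients stored along the numerical trajectory. The following is a minimal Python sketch under our own naming (grads, eps and the function itself are not part of the published algorithms):

import numpy as np

def gradient_time_derivatives(grads, eps, order=4):
    # grads: array of shape (2k+1, D) holding the gradients
    #   U_theta(theta^{n-k}), ..., U_theta(theta^{n+k}) along the trajectory,
    #   with k=1 for the fourth- and k=2 for the sixth-order modified Hamiltonian.
    # eps: integrator-dependent step (h for Verlet, h/2 for two-stage, a*h for three-stage).
    k = grads.shape[0] // 2  # index of the central point t_n
    if order == 4:
        # second-order centered approximation of the first time derivative
        return (grads[k + 1] - grads[k - 1]) / (2.0 * eps)
    # order 6: fourth-order first derivative, second-order second and third derivatives
    d1 = (-grads[k + 2] + 8.0 * grads[k + 1]
          - 8.0 * grads[k - 1] + grads[k - 2]) / (12.0 * eps)
    d2 = (grads[k + 1] - 2.0 * grads[k] + grads[k - 1]) / eps**2
    d3 = (grads[k + 2] - 2.0 * grads[k + 1]
          + 2.0 * grads[k - 1] - grads[k - 2]) / (2.0 * eps**3)
    return d1, d2, d3

At the trajectory endpoints \(n\in \{0,L\}\), this requires \(k\) extra gradient evaluations on each side, which is why the trajectory \(\{U_{\varvec{\theta }}(\varvec{\theta }^i)\}^{L+k}_{i=-k}\) is extended beyond \([0,L]\).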
Appendix C: Modified PMMC step
In the modified PMMC step proposed for MMHMC, the partial momentum update is integrated into the modified Metropolis test; i.e., it is implicitly present in the algorithm. This reduces the frequency of derivative calculations in the Metropolis test. To implement this idea, we recall that the momentum update probability
depends on the error in the extended Hamiltonian (18). Let us first consider the fourth-order modified Hamiltonian (22) with analytical derivatives of the potential function. It is easy to show that the difference in the extended Hamiltonian (18) between a current state and a state with partially updated momentum is
with
For the sixth-order modified Hamiltonian (24) for Gaussian problems, the error in the extended Hamiltonian (18) can be calculated in a similar manner
with
Therefore, if the modified Hamiltonians (22)–(24) with analytical derivatives are used, a new momentum can be determined as
where \(\mathbf {u} \sim \mathcal {N}(0,M)\) is the noise vector, \(\varphi \in \left( 0,1\right] \) and \(\varDelta \hat{H}\) is defined as in (32) or (34).
Consequently, for models with no hierarchical structure, there is no need to calculate gradients within the PMMC step; second derivatives can be reused from the previous Metropolis test within the HDMC step; and there is no need to generate \(\mathbf {u}^{*}\).
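To make the structure of this implicit update concrete, the following Python sketch implements the accept/reject logic, assuming \(M=I\) and, as in Appendix A, that the proposal (Eq. 16) is the rotation \((\mathbf {p},\mathbf {u})\mapsto (\sqrt{1-\varphi }\,\mathbf {p}+\sqrt{\varphi }\,\mathbf {u},\;\sqrt{1-\varphi }\,\mathbf {u}-\sqrt{\varphi }\,\mathbf {p})\); the function names are ours:

import numpy as np

def pmmc_update(theta, p, phi, H_mod, rng):
    # Partial momentum refreshment with a Metropolis test on the extended
    # Hamiltonian H_hat(theta, p, u) = H_mod(theta, p) + u'u/2 (M = I assumed).
    u = rng.standard_normal(p.shape)                     # noise vector u ~ N(0, I)
    p_star = np.sqrt(1.0 - phi) * p + np.sqrt(phi) * u   # partially refreshed momentum
    u_star = np.sqrt(1.0 - phi) * u - np.sqrt(phi) * p   # rotated noise component
    dH_hat = (H_mod(theta, p_star) + 0.5 * u_star @ u_star
              - H_mod(theta, p) - 0.5 * u @ u)
    if rng.uniform() < min(1.0, np.exp(-dH_hat)):        # momentum update probability
        return p_star
    return p                                             # on rejection the momentum is kept

In the method itself, \(\varDelta \hat{H}\) is evaluated through the closed-form expressions (32) or (34) rather than through two full evaluations of the modified Hamiltonian; the sketch only shows the accept/reject structure.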
If the modified Hamiltonians are calculated using numerical time derivatives of the gradient of the potential function, for the Verlet, two- and three-stage integrators as in (25)–(26), the difference in the fourth-order extended Hamiltonian becomes
whereas for the sixth-order extended Hamiltonian it is
Here, \(P^{*}_1, P^{*}_2, P^{*}_3\) are the first-, second- and third-order scaled time derivatives of the gradient, respectively (see Sect. 2.3.3), calculated from the trajectory with updated momentum \(\mathbf {p}^{*}\). The computational gain of the new PMMC step in this case results from skipping the calculation of the terms multiplying \(k_{22}\) in (25) and \(k_{44}\) in (26). Admittedly, the term multiplying \(k_{22}\) in (25) is of negligible cost, so the gain from using the new momentum update is not as significant as in the case of modified Hamiltonians with analytical derivatives. In contrast, the saving that arises from the absence of the term multiplying \(k_{44}\) in the sixth-order modified Hamiltonian (26) is essential.
In summary, in the case of the sixth-order modified Hamiltonian, with derivatives calculated either analytically or numerically, the proposed momentum refreshment enhances the computational performance of MMHMC. The same holds when the fourth-order modified Hamiltonian with analytical derivatives is used. In the latter case, however, if the Hessian matrix of the potential function is dense, we recommend the modified Hamiltonian with numerical rather than analytical derivatives, even though the saving from the new momentum update is then negligible. If, on the other hand, the Hessian matrix is cheap to compute (e.g., block diagonal, sparse, or close to constant), it may be more efficient to use analytical derivatives, for which the new formulation of the Metropolis test leads to real computational savings.
Appendix D: Algorithmic summary
We provide the algorithmic summary for HMC (Algorithm 1) and two alternative algorithms for the MMHMC method. One (Algorithm 2) uses the modified Hamiltonians defined through analytical derivatives of the potential function and is recommended for problems with sparse Hessian matrices. The other (Algorithm 3) relies on the modified Hamiltonians expressed through numerical time derivatives of the gradient of the potential function. This algorithm, although it includes additional integration steps, is beneficial for cases where higher-order derivatives are computationally demanding.
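For orientation, the overall flow common to Algorithms 2 and 3 can be sketched in a few lines of Python, reusing the pmmc_update sketch above. This is our own schematic (with \(M=I\), a generic integrate routine standing in for the splitting integrators, and a momentum flip on rejection as in GHMC-type methods), not a substitute for the full algorithms:

import numpy as np

def mmhmc_iteration(theta, p, phi, L, h, H, H_mod, integrate, rng):
    # One MMHMC iteration: PMMC momentum refreshment followed by an HDMC step
    # with a Metropolis test on the modified Hamiltonian H_mod.
    p = pmmc_update(theta, p, phi, H_mod, rng)            # PMMC step (Appendix C)
    theta_new, p_new = integrate(theta, p, L, h)          # L integration steps of size h
    if rng.uniform() < min(1.0, np.exp(H_mod(theta, p) - H_mod(theta_new, p_new))):
        theta, p = theta_new, p_new                       # accept the proposal
    else:
        p = -p                                            # flip momentum on rejection
    w = np.exp(H_mod(theta, p) - H(theta, p))             # importance weight w.r.t. the true target
    return theta, p, w

Estimators with respect to the true target are then computed as weighted averages, \(\hat{E}[f]=\sum _n w_n f(\varvec{\theta }_n)/\sum _n w_n\).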
Appendix E: Experimental setup