
Pathwise conditioning of Gaussian processes

Published: 01 January 2021

Abstract

As Gaussian processes are used to answer increasingly complex questions, analytic solutions become increasingly scarce. Monte Carlo methods act as a convenient bridge for connecting intractable mathematical expressions with actionable estimates via sampling. Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations. This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector. These methods are prohibitively expensive in cases where we would ideally like to draw high-dimensional vectors or even continuous sample paths. In this work, we investigate a different line of reasoning: rather than focusing on distributions, we articulate Gaussian conditionals at the level of random variables. We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors. Starting from first principles, we derive these methods and analyze the approximation errors they introduce. We then ground these results by exploring the practical implications of pathwise conditioning in various applied settings, such as global optimization and reinforcement learning.
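To make the pathwise view concrete, the sketch below draws joint posterior samples via the pathwise update (Matheron's rule), f_post(·) = f_prior(·) + k(·, X) [K(X, X) + σ²I]⁻¹ (y − f_prior(X) − ε): a prior sample is corrected by a data-dependent update rather than being drawn from the marginal posterior. This is a minimal NumPy illustration under assumed choices (the squared-exponential kernel and the function names rbf_kernel and sample_posterior_pathwise are not from the paper), and it still draws the joint prior exactly via a Cholesky factorization, which is precisely the cubic step the paper's approximations replace to achieve efficiency.

    import numpy as np

    def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
        # Squared-exponential kernel for 1-D inputs (illustrative choice).
        diff = a[:, None] - b[None, :]
        return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

    def sample_posterior_pathwise(x_test, x_train, y_train, noise_var=1e-2,
                                  n_samples=5, seed=0):
        # Pathwise (Matheron) update: condition prior draws on the data
        # instead of sampling from the marginal posterior at x_test.
        rng = np.random.default_rng(seed)
        x_all = np.concatenate([x_test, x_train])
        K_all = rbf_kernel(x_all, x_all) + 1e-10 * np.eye(len(x_all))  # jitter
        L = np.linalg.cholesky(K_all)          # exact joint prior (cubic cost)
        n_test, n_train = len(x_test), len(x_train)
        K_train = rbf_kernel(x_train, x_train) + noise_var * np.eye(n_train)
        K_cross = rbf_kernel(x_test, x_train)
        samples = []
        for _ in range(n_samples):
            f_prior = L @ rng.standard_normal(len(x_all))     # joint prior draw
            f_test, f_train = f_prior[:n_test], f_prior[n_test:]
            eps = np.sqrt(noise_var) * rng.standard_normal(n_train)  # noise draw
            update = K_cross @ np.linalg.solve(K_train, y_train - f_train - eps)
            samples.append(f_test + update)     # pathwise-conditioned sample
        return np.array(samples)

For instance, sample_posterior_pathwise(np.linspace(0, 1, 200), np.array([0.2, 0.5, 0.8]), np.sin(2 * np.pi * np.array([0.2, 0.5, 0.8]))) returns five sample paths evaluated on the test grid, each consistent with the noisy observations; swapping the exact prior draw for a Fourier-feature approximation is what yields the efficient samplers the abstract describes.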


Cited By

  • (2023) The behavior and convergence of local Bayesian optimization. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 73497-73523. DOI: 10.5555/3666122.3669337. Online publication date: 10-Dec-2023.
  • (2023) Bayesian nonparametric (non-)renewal processes for analyzing neural spike train variability. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 68013-68027. DOI: 10.5555/3666122.3669098. Online publication date: 10-Dec-2023.

Index Terms

  1. Pathwise conditioning of Gaussian processes
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

Published In

    The Journal of Machine Learning Research, Volume 22, Issue 1
    January 2021
    13310 pages
    ISSN: 1532-4435
    EISSN: 1533-7928
    License: CC-BY 4.0

    Publisher

    JMLR.org

    Publication History

    Accepted: 01 May 2021
    Published: 01 January 2021
    Received: 01 November 2020
    Published in JMLR Volume 22, Issue 1

    Author Tags

    1. Gaussian processes
    2. approximate posteriors
    3. efficient sampling

    Qualifiers

    • Research-article
