Bayesian Update with Importance Sampling: Required Sample Size
Figure 1. Noise scaling with d = k = 5. Each histogram represents the empirical distribution of the largest autonormalized weight of importance sampling with a given choice of sample size N and noise level γ². The empirical distribution is obtained using 400 sets of random weights, and the histograms are arranged so that in (a) N = γ⁻⁴ along the bottom-left to top-right diagonal, while in (b) N = γ⁻⁶ along the same diagonal. With scaling γ⁻⁴ the distribution of the maximum weight concentrates around 1 along this diagonal, suggesting weight collapse. In contrast, with scaling γ⁻⁶ weight collapse is avoided with high probability.
Figure 2. Prior scaling with d = k = 5. The setting is similar to the one considered in Figure 1.
Figure 3. Dimensional scaling with λ = 1.3. The experimental setting is similar to those in Figures 1 and 2.
Figure A1. Noise scaling with d = k = 4.
Figure A2. Prior scaling with d = k = 4.
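As a companion to the captions above, the following is a minimal sketch of a Figure 1-style experiment. It is not the authors' code: the forward model (identity map), the Gaussian prior, the noise levels, and the γ⁻⁴ versus γ⁻⁶ grid below are illustrative assumptions, but the quantity tracked is the one the caption describes, namely the largest autonormalized importance weight over repeated trials.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_autonormalized_weight(gamma, N, d=5, n_trials=400):
    """Toy linear-Gaussian inverse problem (assumed here for illustration):
    N(0, I_d) prior, identity forward map, observation noise N(0, gamma^2 I_d).
    Importance sampling uses the prior as proposal, so the weights are
    proportional to the likelihood. Returns, for each of n_trials independent
    weight sets of size N, the largest autonormalized weight."""
    maxima = np.empty(n_trials)
    for t in range(n_trials):
        x_true = rng.standard_normal(d)               # draw a "truth" from the prior
        y = x_true + gamma * rng.standard_normal(d)   # noisy observation
        x = rng.standard_normal((N, d))               # N proposal (prior) samples
        log_w = -np.sum((y - x) ** 2, axis=1) / (2 * gamma**2)  # log-likelihood weights
        w = np.exp(log_w - log_w.max())               # stabilize before exponentiating
        maxima[t] = w.max() / w.sum()                 # largest autonormalized weight
    return maxima

# Coarse grid for speed: compare the scalings N = gamma^(-4) and N = gamma^(-6).
for gamma in [0.4, 0.3, 0.2]:
    for power in [4, 6]:
        N = int(gamma ** (-power))
        m = max_autonormalized_weight(gamma, N)
        print(f"gamma={gamma:.1f}  N=gamma^(-{power})={N:6d}  "
              f"median max weight = {np.median(m):.3f}")
```

The histograms in Figure 1 correspond to the distribution of these 400 maxima for each (γ, N) pair; here only the median is printed. Because the constants of this toy model differ from the paper's experiment, the sketch illustrates the diagnostic being tracked rather than reproducing the figure itself.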
Abstract
1. Introduction
Main Goals, Specific Contributions and Outline
- Section 2 provides a unified perspective on the sufficiency and necessity of having a sample size of the order of the χ²-divergence between target and proposal to guarantee accurate importance sampling with bounded test functions. Our analysis and presentation are informed by the specific features that shape the use of importance sampling to approximate Bayes’ rule. The key role of the second moment of the importance weights (equivalently, of the χ²-divergence) has long been acknowledged [24,25], and it is intimately related to an effective sample size used by practitioners to monitor the performance of importance sampling [26,27]; a numerical sketch of these quantities follows this list. A topic of recent interest is the development of adaptive importance sampling schemes where the proposal is chosen by minimizing the χ²-divergence with respect to the target over some admissible family of distributions [28,29]. The main original contributions of Section 2 are Proposition 2 and Theorem 1, which demonstrate the necessity of suitably increasing the sample size with the χ²-divergence along singular limit regimes. The idea of Proposition 2 is inspired by [7], but adapted here from relative entropy to χ²-divergence. Our results complement sufficient conditions on the sample size derived in [1] and necessary conditions for un-normalized (as opposed to autonormalized) importance sampling in [8].
- In Section 3, Proposition 4 gives a closed formula for the χ²-divergence between posterior and prior in a linear-Gaussian Bayesian inverse problem setting. This formula allows us to investigate the scaling of the χ²-divergence (and thereby the rate at which the sample size needs to grow) in several singular limit regimes, including small observation noise, large prior covariance and large dimension. Numerical examples motivate and complement the theoretical results. Large dimension and small noise singular limits were studied in [1] in a diagonal setting. The results here are generalized to a nondiagonal setting, and the presentation is simplified by using the closed formula in Proposition 4. Moreover, we include singular limits arising from large prior covariance. In an infinite dimensional setting, Corollary 1 establishes an equivalence between absolute continuity, finite χ²-divergence and finite intrinsic dimension. A similar result was proved in more generality in [1] using the advanced theory of Gaussian measures in Hilbert space [30]; our presentation and proof here are elementary, while still giving the same degree of understanding.
- In Section 4 we follow [1,13,14,15,31] and investigate the use of importance sampling to approximate Bayes’ rule within one filtering step in a linear-Gaussian setting. We build on the examples and results in Section 3 to identify model regimes where the performance of standard and optimal proposals can be dramatically different; a small Monte Carlo illustration of this contrast is sketched after this list, following the χ²-divergence example. We refer to [2,32] for an introduction to standard and optimal proposals for particle filtering and to [33] for a more advanced presentation. The main original contribution of this section is Theorem 2, which gives a direct comparison of the χ²-divergence between the target and the standard/optimal proposals. This result improves on [1], where only a comparison between intrinsic dimensions was established.
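To make the quantities in these contributions concrete, here is a hedged numerical sketch. It is not a transcription of the paper's Proposition 4 or its proofs; it implements the standard closed-form χ²-divergence between two Gaussians (in the spirit of Nielsen and Nock [35]) together with the rule of thumb that autonormalized importance sampling needs roughly N ≳ 1 + χ²(target‖proposal) samples, which also drives the effective sample size ESS ≈ N/(1 + χ²). The toy inverse problem at the end (identity forward map, unit prior, parameter values) is an illustrative assumption.

```python
import numpy as np

def chi2_gaussians(m1, S1, m0, S0):
    """chi^2-divergence of the target N(m1, S1) with respect to the proposal
    N(m0, S0), i.e. E_proposal[(d target / d proposal)^2] - 1.
    Returns np.inf when 2*S1^{-1} - S0^{-1} is not positive definite,
    in which case the divergence is infinite."""
    S1inv = np.linalg.inv(S1)
    S0inv = np.linalg.inv(S0)
    P = 2 * S1inv - S0inv
    if np.any(np.linalg.eigvalsh(P) <= 0):
        return np.inf
    b = 2 * S1inv @ m1 - S0inv @ m0
    # log of the second moment E_proposal[(d target / d proposal)^2],
    # obtained by completing the square in the Gaussian integral
    log_m2 = (0.5 * np.linalg.slogdet(S0)[1]
              - np.linalg.slogdet(S1)[1]
              - 0.5 * np.linalg.slogdet(P)[1]
              + 0.5 * b @ np.linalg.solve(P, b)
              - m1 @ S1inv @ m1
              + 0.5 * m0 @ S0inv @ m0)
    return np.exp(log_m2) - 1.0

# Toy linear-Gaussian inverse problem (illustrative): prior N(0, I),
# data y = A x + noise with noise N(0, gamma^2 I); the posterior is Gaussian.
rng = np.random.default_rng(1)
d = 5
A = np.eye(d)
prior_m, prior_S = np.zeros(d), np.eye(d)

for gamma in [1.0, 0.5, 0.1]:
    noise_S = gamma**2 * np.eye(d)
    y = A @ rng.standard_normal(d) + gamma * rng.standard_normal(d)
    K = prior_S @ A.T @ np.linalg.inv(A @ prior_S @ A.T + noise_S)  # Kalman gain
    post_m = prior_m + K @ (y - A @ prior_m)
    post_S = (np.eye(d) - K @ A) @ prior_S
    chi2 = chi2_gaussians(post_m, post_S, prior_m, prior_S)
    print(f"gamma={gamma}: chi2(posterior || prior) = {chi2:.3e}; "
          f"rule of thumb N >~ 1 + chi2 = {1 + chi2:.3e} "
          f"(equivalently ESS/N ~ {1 / (1 + chi2):.3e})")
```

For posterior-versus-prior pairs in this conjugate linear-Gaussian setting the positive-definiteness condition always holds in finite dimension, so the interesting question is how fast χ² grows in the singular limits; the small demo above shows the small-noise growth for one illustrative model.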
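The contrast between standard and optimal proposals mentioned in the last item can also be seen numerically. Below is a hedged Monte Carlo sketch in a scalar linear-Gaussian one-step filtering model; the parameters a, q, r, the sample size, and the use of the empirical effective sample size as a diagnostic are illustrative choices, not the construction behind Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def compare_proposals(a=0.9, q=1.0, r=0.1, N=100_000):
    """One step of a scalar linear-Gaussian filter (illustrative parameters):
    u1 = a*u0 + N(0, q^2), y = u1 + N(0, r^2), with u0 ~ N(0, 1).
    Standard proposal: sample u1 from the dynamics and weight by p(y | u1).
    Optimal proposal: sample u1 from p(u1 | u0, y) and weight by p(y | u0)."""
    u0 = rng.standard_normal(N)
    # Draw one observation from the model's marginal distribution of y.
    y = a * rng.standard_normal() + q * rng.standard_normal() + r * rng.standard_normal()

    u1_std = a * u0 + q * rng.standard_normal(N)
    w_std = normal_pdf(y, u1_std, r**2)              # standard-proposal weights
    w_opt = normal_pdf(y, a * u0, q**2 + r**2)       # optimal-proposal weights
    # (drawing u1 from p(u1 | u0, y) would not change the optimal weights)

    for name, w in [("standard", w_std), ("optimal ", w_opt)]:
        wn = w / w.sum()
        ess = 1.0 / np.sum(wn**2)                    # effective sample size
        chi2_hat = N * np.sum(wn**2) - 1.0           # Monte Carlo estimate of chi^2
        print(f"  {name} proposal: ESS/N = {ess / N:.3f}, estimated chi2 = {chi2_hat:.1f}")

for r in [1.0, 0.1, 0.01]:
    print(f"observation noise r = {r}:")
    compare_proposals(r=r)
```

As the observation noise r shrinks, the standard proposal's estimated χ² grows rapidly while the optimal proposal's stays moderate, which is the kind of regime separation the section identifies; the precise statements and rates are in Theorem 2, not in this sketch.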
2. Importance Sampling and χ²-Divergence
2.1. Sufficient and Necessary Sample Size
- (i) If …, then …
- (ii) If …, then there exists a fixed … such that …
2.2. χ²-Divergence and Effective Sample Size
2.3. χ²-Divergence between Gaussians
3. Importance Sampling for Inverse Problems
3.1. Inverse Problem Setting and χ²-Divergence between Posterior and Prior
3.2. Importance Sampling in Small Noise Regime
3.3. Importance Sampling and Prior Scaling
- …;
- … has a well-defined and invertible limit;
- ….
3.4. Importance Sampling in High Dimension
3.4.1. One Dimensional Setting
3.4.2. Large Dimensional Limit
3.4.3. Infinite Dimensional Singularity
- (i) …;
- (ii) … for almost every y;
- (iii) … for almost every y.
4. Importance Sampling for Data Assimilation
4.1. One-Step Filtering Setting
4.2. χ²-Divergence Comparison between Standard and Optimal Proposal
4.3. Standard and Optimal Proposal in Small Noise Regime
4.4. Standard and Optimal Proposal in High Dimension
- 1. … and … if and only if …,
- 2. … and … if and only if … and ….
Author Contributions
Funding
Conflicts of Interest
Appendix A. χ²-Divergence between Gaussians
Appendix B. Proof of Lemma 2
Appendix C. Additional Figures
References
1. Agapiou, S.; Papaspiliopoulos, O.; Sanz-Alonso, D.; Stuart, A.M. Importance sampling: Intrinsic dimension and computational cost. Stat. Sci. 2017, 32, 405–431.
2. Sanz-Alonso, D.; Stuart, A.M.; Taeb, A. Inverse Problems and Data Assimilation. arXiv 2018, arXiv:1810.06191.
3. Barber, D. Bayesian Reasoning and Machine Learning; Cambridge University Press: Cambridge, UK, 2012.
4. Garcia Trillos, N.; Kaplan, Z.; Samakhoana, T.; Sanz-Alonso, D. On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms. J. Mach. Learn. Res. 2020, 21, 1–47.
5. Garcia Trillos, N.; Sanz-Alonso, D. The Bayesian update: Variational formulations and gradient flows. Bayesian Anal. 2018, 15, 29–56.
6. Dick, J.; Kuo, F.Y.; Sloan, I.H. High-dimensional integration: The quasi-Monte Carlo way. Acta Numer. 2013, 22, 133.
7. Chatterjee, S.; Diaconis, P. The sample size required in importance sampling. arXiv 2015, arXiv:1511.01437.
8. Sanz-Alonso, D. Importance sampling and necessary sample size: An information theory approach. SIAM/ASA J. Uncertain. Quantif. 2018, 6, 867–879.
9. Rubino, G.; Tuffin, B. Rare Event Simulation Using Monte Carlo Methods; John Wiley & Sons: Hoboken, NJ, USA, 2009.
10. Kahn, H.; Marshall, A.W. Methods of reducing sample size in Monte Carlo computations. J. Oper. Res. Soc. Am. 1953, 1, 263–278.
11. Kahn, H. Use of Different Monte Carlo Sampling Techniques; Rand Corporation: Santa Monica, CA, USA, 1955.
12. Bugallo, M.F.; Elvira, V.; Martino, L.; Luengo, D.; Miguez, J.; Djuric, P.M. Adaptive importance sampling: The past, the present, and the future. IEEE Signal Process. Mag. 2017, 34, 60–79.
13. Bengtsson, T.; Bickel, P.; Li, B. Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. In Probability and Statistics: Essays in Honor of David A. Freedman; Institute of Mathematical Statistics: Beachwood, OH, USA, 2008; pp. 316–334.
14. Bickel, P.; Li, B.; Bengtsson, T. Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh; Institute of Mathematical Statistics: Beachwood, OH, USA, 2008; pp. 318–329.
15. Snyder, C.; Bengtsson, T.; Bickel, P.; Anderson, J. Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 2008, 136, 4629–4640.
16. Rebeschini, P.; Van Handel, R. Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 2015, 25, 2809–2866.
17. Chorin, A.J.; Morzfeld, M. Conditions for successful data assimilation. J. Geophys. Res. Atmos. 2013, 118, 11–522.
18. Houtekamer, P.L.; Mitchell, H.L. Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. 1998, 126, 796–811.
19. Hamill, T.M.; Whitaker, J.S.; Anderson, J.L.; Snyder, C. Comments on “Sigma-point Kalman filter data assimilation methods for strongly nonlinear systems”. J. Atmos. Sci. 2009, 66, 3498–3500.
20. Morzfeld, M.; Hodyss, D.; Snyder, C. What the collapse of the ensemble Kalman filter tells us about particle filters. Tellus A Dyn. Meteorol. Oceanogr. 2017, 69, 1283809.
21. Farchi, A.; Bocquet, M. Comparison of local particle filters and new implementations. Nonlinear Process. Geophys. 2018, 25.
22. Morzfeld, M.; Tong, X.T.; Marzouk, Y.M. Localization for MCMC: Sampling high-dimensional posterior distributions with local structure. J. Comput. Phys. 2019, 380, 1–28.
23. Tong, X.T.; Morzfeld, M.; Marzouk, Y.M. MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure. SIAM J. Sci. Comput. 2020, 42, A1765–A1788.
24. Liu, J.S. Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Stat. Comput. 1996, 6, 113–119.
25. Pitt, M.K.; Shephard, N. Filtering via simulation: Auxiliary particle filters. J. Am. Stat. Assoc. 1999, 94, 590–599.
26. Kong, A. A Note on Importance Sampling Using Standardized Weights; Technical Report 348; Department of Statistics, University of Chicago: Chicago, IL, USA, 1992.
27. Kong, A.; Liu, J.S.; Wong, W.H. Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 1994, 89, 278–288.
28. Ryu, E.K.; Boyd, S.P. Adaptive importance sampling via stochastic convex programming. arXiv 2014, arXiv:1412.4845.
29. Akyildiz, Ö.D.; Míguez, J. Convergence rates for optimised adaptive importance samplers. arXiv 2019, arXiv:1903.12044.
30. Bogachev, V.I. Gaussian Measures; Number 62; American Mathematical Society: Providence, RI, USA, 1998.
31. Snyder, C.; Bengtsson, T.; Morzfeld, M. Performance bounds for particle filters using the optimal proposal. Mon. Weather Rev. 2015, 143, 4750–4761.
32. Doucet, A.; De Freitas, N.; Gordon, N. An Introduction to Sequential Monte Carlo Methods. In Sequential Monte Carlo Methods in Practice; Springer: Berlin/Heidelberg, Germany, 2001; pp. 3–14.
33. Del Moral, P. Feynman-Kac Formulae; Springer: Berlin/Heidelberg, Germany, 2004.
34. Doucet, A.; Godsill, S.; Andrieu, C. On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 2000, 10, 197–208.
35. Nielsen, F.; Nock, R. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Process. Lett. 2013, 21, 10–13.
Share and Cite
Sanz-Alonso, D.; Wang, Z. Bayesian Update with Importance Sampling: Required Sample Size. Entropy 2021, 23, 22. https://doi.org/10.3390/e23010022