Bayesian Inference and Optimal Design for the Sparse Linear Model

Published: 01 June 2008, Journal of Machine Learning Research, Volume 9 (JMLR.org)

Abstract

The linear model with a sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas for scaling the method up to very large underdetermined problems.
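For concreteness, the model in question is the standard sparse linear model with a Laplace (double-exponential) prior; the sketch below uses our own notation and is the textbook formulation, not necessarily the paper's exact parameterisation:

```latex
% Sparse linear model: observations y in R^m, design matrix X in R^{m x n},
% coefficients w in R^n, and a Laplace prior that favours sparse w.
y = Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad
p(w_j \mid \tau) = \frac{\tau}{2}\, e^{-\tau |w_j|},
\qquad\Longrightarrow\qquad
p(w \mid y) \;\propto\; \mathcal{N}(y \mid Xw,\, \sigma^2 I)
\prod_{j=1}^{n} \frac{\tau}{2}\, e^{-\tau |w_j|}.
```

The mode of this posterior is the Lasso estimate (Tibshirani, 1996), which is what MAP-based sparse methods return; expectation propagation instead fits a full Gaussian approximation to the posterior, retaining the uncertainty information that design decisions require.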
We demonstrate the versatility of our framework by identifying gene regulatory networks from micro-array expression data, where both the Laplace prior and the active experimental design approach yield significant improvements. We also address sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.
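As a rough illustration of how a Gaussian posterior approximation (of the kind EP supplies) drives sequential experimental design, here is a minimal sketch. The greedy scoring rule is the standard information gain for a linear-Gaussian measurement; the function names, toy data, and loop structure are our illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def information_gain(Sigma, x, sigma2):
    """Expected entropy reduction of a Gaussian posterior N(mu, Sigma)
    from observing y = x^T w + noise with noise variance sigma2."""
    return 0.5 * np.log1p(x @ Sigma @ x / sigma2)

def rank1_update(mu, Sigma, x, y, sigma2):
    """Exact posterior update for one linear-Gaussian observation
    (Sherman-Morrison). Under a Laplace prior the true posterior is
    non-Gaussian, so EP would instead refit its site approximations here."""
    v = Sigma @ x
    denom = sigma2 + x @ v
    mu = mu + v * (y - x @ mu) / denom
    Sigma = Sigma - np.outer(v, v) / denom
    return mu, Sigma

def design_step(mu, Sigma, candidates, sigma2, run_experiment):
    """Greedy design: score every candidate measurement, run the best one."""
    scores = [information_gain(Sigma, x, sigma2) for x in candidates]
    x_star = candidates[int(np.argmax(scores))]
    y_star = run_experiment(x_star)   # perform the chosen measurement
    return rank1_update(mu, Sigma, x_star, y_star, sigma2)

# Toy usage: a broad Gaussian stands in for the approximate posterior.
rng = np.random.default_rng(0)
n = 8
w_true = rng.standard_normal(n)
mu, Sigma = np.zeros(n), np.eye(n)
candidates = list(rng.standard_normal((20, n)))
mu, Sigma = design_step(mu, Sigma, candidates, sigma2=0.1,
                        run_experiment=lambda x: x @ w_true)
```

The design loop repeats this step, so each chosen measurement is the one the current posterior is most uncertain about; with an EP posterior for the Laplace prior, only the update routine changes.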
Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).

References

[1]
H. Attias. A variational Bayesian framework for graphical models. In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 209-215. MIT Press, 2000.
[2]
P. Berkes, R. Turner, and M. Sahani. On sparsity and overcompleteness in image models. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, 2008.
[3]
C. Bishop and J. Winn. Structured variational distributions in VIBES. In C. Bishop and B. Frey, editors, Workshop on Artificial Intelligence and Statistics 9, pages 244-251, 2003. Electronic Proceedings (ISBN 0-9727358-0-1).
[4]
V. Bogachev. Gaussian Measures. Mathematical Surveys and Monographs. American Mathematical Society, 1998.
[5]
L. Bottou. Online learning and stochastic approximations. In D. Saad, editor, On-Line Learning in Neural Networks. Cambridge University Press, 1998.
[6]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2002.
[7]
E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489-509, 2006.
[8]
K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statistical Science, 10(3):273-304, 1995.
[9]
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1999.
[10]
R. Chhikara and L. Folks. The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker Inc., 1989.
[11]
S. Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90(432):1313-1321, 1995.
[12]
J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 282:699-705, 1997.
[13]
J. Dongarra, C. Moler, J. Bunch, and G. Stewart. LINPACK User's Guide. Society for Industrial and Applied Mathematics, 1979.
[14]
D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289-1306, 2006.
[15]
D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. USA, 100:2197-2202, 2003.
[16]
A. Faul and M. Tipping. Analysis of sparse Bayesian learning. In T. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 383-389. MIT Press, 2002.
[17]
V. Fedorov. Theory of Optimal Experiments. Academic Press, 1972.
[18]
M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1050-1059, 2003.
[19]
T. Gardner, C. Cantor, and J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767):339-342, 2000.
[20]
S. Gerwinn, J. Macke, M. Seeger, and M. Bethge. Bayesian inference for spiking neuron models with a sparsity prior. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, 2008.
[21]
Z. Ghahramani and M. Beal. Propagation algorithms for variational Bayesian learning. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 507-513. MIT Press, 2001.
[22]
W. Gilks, S. Richardson, and D. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1st edition, 1996.
[23]
W. R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2):337-348, 1992.
[24]
M. Girolami. A variational method for learning sparse and overcomplete representations. Neural Computation, 13:2517-2532, 2001.
[25]
T. Gneiting. Normal scale mixtures and dual probability densities. J. Statist. Comput. Simul., 59:375-384, 1997.
[26]
T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391-1415, 2004.
[27]
H. Henderson and S. Searle. On deriving the inverse of a sum of matrices. SIAM Review, 23:53-60, 1981.
[28]
P. Højen-Sørensen, O. Winther, and L. Hansen. Mean field approaches to independent component analysis. Neural Computation, 14:889-918, 2002.
[29]
R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1st edition, 1985.
[30]
H. Ishwaran and J. Rao. Spike and slab gene selection for multigroup microarray data. Journal of the American Statistical Association, 100(471):764-780, 2005.
[31]
T. Jaakkola. Variational Methods for Inference and Estimation in Graphical Models. PhD thesis, Massachusetts Institute of Technology, 1997.
[32]
S. Ji and L. Carin. Bayesian compressive sensing and projection optimization. In Z. Ghahramani, editor, International Conference on Machine Learning 24. Omni Press, 2007.
[33]
M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer, 1997.
[34]
B. Kholodenko, A. Kiyatkin, F. Bruggeman, E. Sontag, H. Westerhoff, and J. Hoek. Untangling the wires: A strategy to trace functional interactions in signaling and gene networks. PNAS, 99(20):12841-12846, 2002.
[35]
H. Kushner and A. Budhiraja. A nonlinear filtering algorithm based on an approximation of the conditional distribution. IEEE Transactions on Automatic Control, 45:580-585, 2000.
[36]
M. Kuss and C. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679-1704, 2005.
[37]
N. D. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 609-616. MIT Press, 2003.
[38]
J. Lewi, R. Butera, and L. Paninski. Real-time adaptive information-theoretic optimization of neurophysiological experiments. In B. Schölkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[39]
M. Lewicki and B. Olshausen. Probabilistic framework for the adaptation and comparison of image codes. J. Opt. Soc. Amer. A, 16(7):1587-1601, 1999.
[40]
L. Lovász and S. Vempala. Hit and run is fast and fun. Technical Report MSR-TR-2003-05, Microsoft Research, Redmond, January 2003.
[41]
D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):589-603, 1991.
[42]
D. MacKay. Bayesian interpolation. Neural Computation, 4(3):415-447, 1992.
[43]
T. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, January 2001a.
[44]
T. Minka. Expectation propagation for approximate Bayesian inference. In J. Breese and D. Koller, editors, Uncertainty in Artificial Intelligence 17. Morgan Kaufmann, 2001b.
[45]
T. Minka. Power EP. Technical report, Microsoft Research, Cambridge, 2004.
[46]
R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993. See www.cs.toronto.edu/~radford.
[47]
R. M. Neal. Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. Springer, 1996.
[48]
A. O'Hagan. Bayesian Inference, volume 2B of Kendall's Advanced Theory of Statistics. Arnold, London, 1994.
[49]
B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997.
[50]
M. Opper and O. Winther. Expectation consistent approximate inference. Journal of Machine Learning Research, 6:2177-2204, 2005.
[51]
M. Opper and O. Winther. Gaussian processes for classification: Mean field algorithms. Neural Computation, 12(11):2655-2684, 2000.
[52]
A. Palmer, D. Wipf, K. Kreutz-Delgado, and B. Rao. Variational EM algorithms for non-Gaussian latent variable models. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, 2006.
[53]
L. Paninski. Log-concavity results on Gaussian process methods for supervised and unsupervised learning. In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17. MIT Press, 2005.
[54]
T. Park and G. Casella. The Bayesian Lasso. Technical report, University of Florida, 2005.
[55]
R. Peeters and R. Westra. On the identification of sparse gene regulatory networks. In Proc. 16th Int. Symp. on Math. Theory of Networks, 2004.
[56]
J. Pratt. Concavity of the log likelihood. Journal of the American Statistical Association, 76(373):103-106, 1981.
[57]
Y. Qi, T. Minka, R. Picard, and Z. Ghahramani. Predictive automatic relevance determination by expectation propagation. In C. Brodley, editor, International Conference on Machine Learning 21. Morgan Kaufmann, 2004.
[58]
L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1st edition, 1993.
[59]
S. Rogers and M. Girolami. A Bayesian regression approach to the inference of regulatory networks from gene expression data. Bioinformatics, 21(14):3131-3137, 2005.
[60]
Y. Saad. Iterative Methods for Sparse Linear Systems. International Thomson Publishing, 1st edition, 1996.
[61]
M. Seeger. Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, University of Edinburgh, July 2003. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[62]
M. Seeger. Low rank updates for the Cholesky decomposition. Technical report, University of California at Berkeley, 2004. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[63]
M. Seeger. Expectation propagation for exponential families. Technical report, University of California at Berkeley, 2005. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[64]
M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. To appear at ICML, 2008.
[65]
M. Seeger, S. Gerwinn, and M. Bethge. Bayesian inference for sparse generalized linear models. In J. Kok, J. Koronacki, R. Lopez, S. Matwin, D. Mladenic, and A. Skowron, editors, European Conference on Machine Learning 18. Springer, 2007a.
[66]
M. Seeger, F. Steinke, and K. Tsuda. Bayesian inference and optimal design in the sparse linear model. In M. Meila and X. Shen, editors, Workshop on Artificial Intelligence and Statistics 11, 2007b.
[67]
H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Conference on Computational Learning Theory 5, pages 287-294. Morgan Kaufmann, 1992.
[68]
D. Spiegelhalter, A. Thomas, N. Best, and W. Gilks. BUGS: Bayesian inference using Gibbs sampling. Technical report, MRC Biostatistics Unit, Cambridge University, 1995.
[69]
F. Steinke, M. Seeger, and K. Tsuda. Experimental design for efficient identification of gene regulatory networks using sparse Bayesian models. BMC Systems Biology, 1(51), 2007.
[70]
J. Tegnér, M. Yeung, J. Hasty, and J. Collins. Reverse engineering gene networks: Integrating genetic perturbations with dynamical modeling. PNAS, 100(10):5944-5949, 2003.
[71]
R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of Roy. Stat. Soc. B, 58:267-288, 1996.
[72]
M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[73]
M. J. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. Technical Report 709, UC Berkeley, Dept. of Statistics, 2006.
[74]
M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley, Dept. of Statistics, 2003.
[75]
D. Wipf, J. Palmer, and B. Rao. Perspectives on sparse Bayesian learning. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[76]
D. Wipf, J. Palmer, B. Rao, and K. Kreutz-Delgado. Performance analysis of latent variable models with sparse priors. In Proceedings of ICASSP 2007, 2007.
[77]
O. Zoeter and T. Heskes. Gaussian quadrature based expectation propagation. In Z. Ghahramani and R. Cowell, editors, Workshop on Artificial Intelligence and Statistics 10, 2005.
