Bayesian Inference and Optimal Design for the Sparse Linear Model

Published: 01 June 2008, Journal of Machine Learning Research, Volume 9 (JMLR.org)

Abstract

The linear model with a sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas for scaling the method up to very large underdetermined problems.
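For concreteness, the model in question is the standard sparse linear model with a Laplace (double-exponential) prior; the sketch below uses our own notation and is the textbook formulation, not necessarily the paper's exact parameterisation:

```latex
% Sparse linear model: observations y in R^m, design matrix X in R^{m x n},
% coefficients w in R^n, and a Laplace prior that favours sparse w.
y = Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad
p(w_j \mid \tau) = \frac{\tau}{2}\, e^{-\tau |w_j|},
\qquad\Longrightarrow\qquad
p(w \mid y) \;\propto\; \mathcal{N}(y \mid Xw,\, \sigma^2 I)
\prod_{j=1}^{n} \frac{\tau}{2}\, e^{-\tau |w_j|}.
```

The mode of this posterior is the Lasso estimate (Tibshirani, 1996), which is what MAP-based sparse methods return; expectation propagation instead fits a full Gaussian approximation to the posterior, retaining the uncertainty information that design decisions require.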
We demonstrate the versatility of our framework by identifying gene regulatory networks from micro-array expression data, where both the Laplace prior and the active experimental design approach yield significant improvements. We also address sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.
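As a rough illustration of how a Gaussian posterior approximation (of the kind EP supplies) drives sequential experimental design, here is a minimal sketch. The greedy scoring rule is the standard information gain for a linear-Gaussian measurement; the function names, toy data, and loop structure are our illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def information_gain(Sigma, x, sigma2):
    """Expected entropy reduction of a Gaussian posterior N(mu, Sigma)
    from observing y = x^T w + noise with noise variance sigma2."""
    return 0.5 * np.log1p(x @ Sigma @ x / sigma2)

def rank1_update(mu, Sigma, x, y, sigma2):
    """Exact posterior update for one linear-Gaussian observation
    (Sherman-Morrison). Under a Laplace prior the true posterior is
    non-Gaussian, so EP would instead refit its site approximations here."""
    v = Sigma @ x
    denom = sigma2 + x @ v
    mu = mu + v * (y - x @ mu) / denom
    Sigma = Sigma - np.outer(v, v) / denom
    return mu, Sigma

def design_step(mu, Sigma, candidates, sigma2, run_experiment):
    """Greedy design: score every candidate measurement, run the best one."""
    scores = [information_gain(Sigma, x, sigma2) for x in candidates]
    x_star = candidates[int(np.argmax(scores))]
    y_star = run_experiment(x_star)   # perform the chosen measurement
    return rank1_update(mu, Sigma, x_star, y_star, sigma2)

# Toy usage: a broad Gaussian stands in for the approximate posterior.
rng = np.random.default_rng(0)
n = 8
w_true = rng.standard_normal(n)
mu, Sigma = np.zeros(n), np.eye(n)
candidates = list(rng.standard_normal((20, n)))
mu, Sigma = design_step(mu, Sigma, candidates, sigma2=0.1,
                        run_experiment=lambda x: x @ w_true)
```

The design loop repeats this step, so each chosen measurement is the one the current posterior is most uncertain about; with an EP posterior for the Laplace prior, only the update routine changes.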
Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).

References

[1]
H. Attias. A variational Bayesian framework for graphical models. In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 209-215. MIT Press, 2000.
[2]
P. Berkes, R. Turner, and M. Sahani. On sparsity and overcompleteness in image models. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, 2008.
[3]
C. Bishop and J. Winn. Structured variational distributions in VIBES. In C. Bishop and B. Frey, editors, Workshop on Artificial Intelligence and Statistics 9, pages 244-251, 2003. Electronic Proceedings (ISBN 0-9727358-0-1).
[4]
V. Bogachev. Gaussian Measures. Mathematical Surveys and Monographs. American Mathematical Society, 1998.
[5]
L. Bottou. Online learning and stochastic approximations. In D. Saad, editor, On-Line Learning in Neural Networks. Cambridge University Press, 1998.
[6]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2002.
[7]
E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489-509, 2006.
[8]
K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statistical Science, 10(3):273-304, 1995.
[9]
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1999.
[10]
R. Chhikara and L. Folks. The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker Inc., 1989.
[11]
S. Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90(432):1313-1321, 1995.
[12]
J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 282:699-705, 1997.
[13]
J. Dongarra, C. Moler, J. Bunch, and G. Stewart. LINPACK User's Guide. Society for Industrial and Applied Mathematics, 1979.
[14]
D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289-1306, 2006.
[15]
D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. USA, 100:2197-2202, 2003.
[16]
A. Faul and M. Tipping. Analysis of sparse Bayesian learning. In T. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 383-389. MIT Press, 2002.
[17]
V. Fedorov. Theory of Optimal Experiments. Academic Press, 1972.
[18]
M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1050-1059, 2003.
[19]
T. Gardner, C. Cantor, and J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767):339-342, 2000.
[20]
S. Gerwinn, J. Macke, M. Seeger, and M. Bethge. Bayesian inference for spiking neuron models with a sparsity prior. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, 2008.
[21]
Z. Ghahramani and M. Beal. Propagation algorithms for variational Bayesian learning. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 507-513. MIT Press, 2001.
[22]
W. Gilks, S. Richardson, and D. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1st edition, 1996.
[23]
W. R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2):337-348, 1992.
[24]
M. Girolami. A variational method for learning sparse and overcomplete representations. Neural Computation, 13:2517-2532, 2001.
[25]
T. Gneiting. Normal scale mixtures and dual probability densities. J. Statist. Comput. Simul., 59:375-384, 1997.
[26]
T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391-1415, 2004.
[27]
H. Henderson and S. Searle. On deriving the inverse of a sum of matrices. SIAM Review, 23:53-60, 1981.
[28]
P. Højen-Sørensen, O. Winther, and L. Hansen. Mean field approaches to independent component analysis. Neural Computation, 14:889-918, 2002.
[29]
R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1st edition, 1985.
[30]
H. Ishwaran and J. Rao. Spike and slab gene selection for multigroup microarray data. Journal of the American Statistical Association, 100(471):764-780, 2005.
[31]
T. Jaakkola. Variational Methods for Inference and Estimation in Graphical Models. PhD thesis, Massachusetts Institute of Technology, 1997.
[32]
S. Ji and L. Carin. Bayesian compressive sensing and projection optimization. In Z. Ghahramani, editor, International Conference on Machine Learning 24. Omni Press, 2007.
[33]
M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer, 1997.
[34]
B. Kholodenko, A. Kiyatkin, F. Bruggeman, E. Sontag, H. Westerhoff, and J. Hoek. Untangling the wires: A strategy to trace functional interactions in signaling and gene networks. PNAS, 99(20):12841-12846, 2002.
[35]
H. Kushner and A. Budhiraja. A nonlinear filtering algorithm based on an approximation of the conditional distribution. IEEE Transactions on Automatic Control, 45:580-585, 2000.
[36]
M. Kuss and C. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679-1704, 2005.
[37]
N. D. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 609-616. MIT Press, 2003.
[38]
J. Lewi, R. Butera, and L. Paninski. Real-time adaptive information-theoretic optimization of neurophysiological experiments. In B. Schölkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[39]
M. Lewicki and B. Olshausen. Probabilistic framework for the adaptation and comparison of image codes. J. Opt. Soc. Amer. A, 16(7):1587-1601, 1999.
[40]
L. Lovász and S. Vempala. Hit and run is fast and fun. Technical Report MSR-TR-2003-05, Microsoft Research, Redmond, January 2003.
[41]
D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):589-603, 1991.
[42]
D. MacKay. Bayesian interpolation. Neural Computation, 4(3):415-447, 1992.
[43]
T. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, January 2001a.
[44]
T. Minka. Expectation propagation for approximate Bayesian inference. In J. Breese and D. Koller, editors, Uncertainty in Artificial Intelligence 17. Morgan Kaufmann, 2001b.
[45]
T. Minka. Power EP. Technical report, Microsoft Research, Cambridge, 2004.
[46]
R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993. See www.cs.toronto.edu/~radford.
[47]
R. M. Neal. Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. Springer, 1996.
[48]
A. O'Hagan. Bayesian Inference, volume 2B of Kendall's Advanced Theory of Statistics. Arnold, London, 1994.
[49]
B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997.
[50]
M. Opper and O. Winther. Expectation consistent approximate inference. Journal of Machine Learning Research, 6:2177-2204, 2005.
[51]
M. Opper and O. Winther. Gaussian processes for classification: Mean field algorithms. Neural Computation, 12(11):2655-2684, 2000.
[52]
A. Palmer, D. Wipf, K. Kreutz-Delgado, and B. Rao. Variational EM algorithms for non-Gaussian latent variable models. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, 2006.
[53]
L. Paninski. Log-concavity results on Gaussian process methods for supervised and unsupervised learning. In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17. MIT Press, 2005.
[54]
T. Park and G. Casella. The Bayesian Lasso. Technical report, University of Florida, 2005.
[55]
R. Peeters and R. Westra. On the identification of sparse gene regulatory networks. In Proc. 16th Int. Symp. on Math. Theory of Networks, 2004.
[56]
J. Pratt. Concavity of the log likelihood. Journal of the American Statistical Association, 76(373):103-106, 1981.
[57]
Y. Qi, T. Minka, R. Picard, and Z. Ghahramani. Predictive automatic relevance determination by expectation propagation. In C. Brodley, editor, International Conference on Machine Learning 21. Morgan Kaufmann, 2004.
[58]
L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1st edition, 1993.
[59]
S. Rogers and M. Girolami. A Bayesian regression approach to the inference of regulatory networks from gene expression data. Bioinformatics, 21(14):3131-3137, 2005.
[60]
Y. Saad. Iterative Methods for Sparse Linear Systems. International Thomson Publishing, 1st edition, 1996.
[61]
M. Seeger. Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, University of Edinburgh, July 2003. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[62]
M. Seeger. Low rank updates for the Cholesky decomposition. Technical report, University of California at Berkeley, 2004. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[63]
M. Seeger. Expectation propagation for exponential families. Technical report, University of California at Berkeley, 2005. See www.kyb.tuebingen.mpg.de/bs/people/seeger.
[64]
M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. To appear at ICML, 2008.
[65]
M. Seeger, S. Gerwinn, and M. Bethge. Bayesian inference for sparse generalized linear models. In J. Kok, J. Koronacki, R. Lopez, S. Matwin, D. Mladenic, and A. Skowron, editors, European Conference on Machine Learning 18. Springer, 2007a.
[66]
M. Seeger, F. Steinke, and K. Tsuda. Bayesian inference and optimal design in the sparse linear model. In M. Meila and X. Shen, editors, Workshop on Artificial Intelligence and Statistics 11, 2007b.
[67]
H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Conference on Computational Learning Theory 5, pages 287-294. Morgan Kaufmann, 1992.
[68]
D. Spiegelhalter, A. Thomas, N. Best, and W. Gilks. BUGS: Bayesian inference using Gibbs sampling. Technical report, MRC Biostatistics Unit, Cambridge University, 1995.
[69]
F. Steinke, M. Seeger, and K. Tsuda. Experimental design for efficient identification of gene regulatory networks using sparse Bayesian models. BMC Systems Biology, 1(51), 2007.
[70]
J. Tegnér, M. Yeung, J. Hasty, and J. Collins. Reverse engineering gene networks: Integrating genetic perturbations with dynamical modeling. PNAS, 100(10):5944-5949, 2003.
[71]
R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of Roy. Stat. Soc. B, 58:267-288, 1996.
[72]
M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.
[73]
M. J. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. Technical Report 709, UC Berkeley, Dept. of Statistics, 2006.
[74]
M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley, Dept. of Statistics, 2003.
[75]
D. Wipf, J. Palmer, and B. Rao. Perspectives on sparse Bayesian learning. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[76]
D. Wipf, J. Palmer, B. Rao, and K. Kreutz-Delgado. Performance analysis of latent variable models with sparse priors. In Proceedings of ICASSP 2007, 2007.
[77]
O. Zoeter and T. Heskes. Gaussian quadrature based expectation propagation. In Z. Ghahramani and R. Cowell, editors, Workshop on Artificial Intelligence and Statistics 10, 2005.
