Nonparametric graphical model for counts

Published: 01 January 2020

Abstract

Although multivariate count data are routinely collected in many application areas, there is surprisingly little work developing flexible models for characterizing their dependence structure. This is particularly true when interest focuses on inferring the conditional independence graph. In this article, we propose a new class of pairwise Markov random field-type models for the joint distribution of a multivariate count vector. By employing a novel type of transformation, we avoid restricting to non-negative dependence structures or inducing other restrictions through truncations. Taking a Bayesian approach to inference, we choose a Dirichlet process prior for the distribution of a random effect to induce great flexibility in the specification. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We prove various theoretical properties, including posterior consistency, and show that our COunt Nonparametric Graphical Analysis (CONGA) approach has good performance relative to competitors in simulation studies. The methods are motivated by an application to neuron spike count data in mice.

    Published In

    The Journal of Machine Learning Research  Volume 21, Issue 1
    January 2020
    10260 pages
    ISSN:1532-4435
    EISSN:1533-7928
    CC-BY 4.0

    Publisher

    JMLR.org

    Publication History

    Accepted: 01 December 2020
    Published: 01 January 2020
    Received: 01 May 2019
    Published in JMLR Volume 21, Issue 1

    Author Tags

    1. conditional independence
    2. Dirichlet process
    3. graphical model
    4. Markov random field
    5. multivariate count data

    Qualifiers

    • Research-article
