Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3412815.3416887acmotherconferencesArticle/Chapter ViewAbstractPublication PagesfodsConference Proceedingsconference-collections
research-article

Non-Uniform Sampling of Fixed Margin Binary Matrices

Published: 18 October 2020 Publication History

Abstract

Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained under a null model. Here we consider sampling from the space of binary matrices which satisfy a set of marginal row and column sums. Whereas existing sampling methods have focused on uniform sampling from this space, we introduce modified versions of two elementwise swapping algorithms which sample according to a non-uniform probability distribution defined by a weight matrix, which gives the relative probability of a one for each entry. We demonstrate that values of zero in the weight matrix, i.e. structural zeros, are generally problematic for swapping algorithms, except when they have special monotonic structure. We explore the properties of our algorithms through simulation studies, and illustrate the potential impact of employing a non-uniform null model using a classic bird habitation dataset.

References

[1]
Julian Besag and Peter Clifford. 1989. Generalized Monte Carlo significance tests. Biometrika, Vol. 76, 4 (Dec. 1989), 633--642. https://doi.org/10.1093/biomet/76.4.633
[2]
Richard A. Brualdi. 1980. Matrices of zeros and ones with fixed row and column sum vectors. Linear algebra and its applications, Vol. 33 (1980), 159--231.
[3]
C. J. Carstens. 2015. Proof of uniform sampling of binary matrices with fixed row sums and column sums for the fast Curveball algorithm. Physical Review E, Vol. 91, 4 (April 2015), 042812. https://doi.org/10.1103/PhysRevE.91.042812
[4]
Yuguo Chen, Persi Diaconis, Susan P. Holmes, and Jun S. Liu. 2005. Sequential Monte Carlo methods for statistical analysis of tables. J. Amer. Statist. Assoc., Vol. 100, 469 (2005), 109--120.
[5]
Edward F. Connor, Michael D. Collins, and Daniel Simberloff. 2013. The checkered history of checkerboard distributions. Ecology, Vol. 94, 11 (Nov. 2013), 2403--2414. https://doi.org/10.1890/12--1471.1
[6]
Edward F. Connor and Daniel Simberloff. 1979. The Assembly of Species Communities: Chance or Competition? Ecology, Vol. 60, 6 (Dec. 1979), 1132. https://doi.org/10.2307/1936961
[7]
Jared M. Diamond. 1975. Assembly of Species Comunities . In Ecology and Evolution of Communities. Harvard University Press, 342--443.
[8]
Jared M. Diamond and Michael E. Gilpin. 1982. Examination of the 'null' model of Connor and Simberloff for species co-occurrences on islands. Oecologia, Vol. 52, 1 (1982), 64--74.
[9]
James M. Flegal and Galin L. Jones. 2010. Batch means and spectral variance estimators in Markov chain Monte Carlo . The Annals of Statistics, Vol. 38, 2 (April 2010), 1034--1070. https://doi.org/10.1214/09-AOS735
[10]
Bailey K. Fosdick, Daniel B. Larremore, Joel Nishimura, and Johan Ugander. 2018. Configuring Random Graph Models with Fixed Degree Sequences . SIAM Rev., Vol. 60, 2 (Jan. 2018), 315--355. https://doi.org/10.1137/16M1087175 Publisher: Society for Industrial and Applied Mathematics.
[11]
Nicholas J. Gotelli. 2000. Null Model Analysis of Species Co-Occurrence Patterns . Ecology, Vol. 81, 9 (2000), 2606--2621. https://doi.org/10.1890/0012--9658(2000)081[2606:NMAOSC]2.0.CO;2
[12]
Nicholas J. Gotelli and Gary L. Entsminger. 2001. Swap and fill algorithms in null model analysis: rethinking the knight's tour. Oecologia, Vol. 129, 2 (Oct. 2001), 281--291. https://doi.org/10.1007/s004420100717
[13]
Nicholas J. Gotelli and Gary R. Graves. 1996. Null models in ecology .Smithsonian.
[14]
Matthew T. Harrison and Jeffrey W. Miller. 2013. Importance sampling for weighted binary random matrices with specified margins. arXiv:1301.3928 [math, stat] (Jan. 2013). http://arxiv.org/abs/1301.3928 arXiv: 1301.3928.
[15]
P H Harvey, R K Colwell, J W Silvertown, and R M May. 1983. Null Models in Ecology . Annual Review of Ecology and Systematics, Vol. 14, 1 (1983), 189--211. https://doi.org/10.1146/annurev.es.14.110183.001201
[16]
Peter D. Hoff. 2009. A first course in Bayesian statistical methods .Springer, London ; New York. OCLC: ocn310401109.
[17]
Mark Jerrum and Alistair Sinclair. 1996. The Markov chain Monte Carlo method: an approach to approximate counting and integration. In Approximation algorithms for NP-hard problems. PWS Publishing Co., Boston. Publisher: PWS Boston.
[18]
Ravi Kannan, Prasad Tetali, and Santosh Vempala. 1999. Simple Markovchain algorithms for generating bipartite graphs and tournaments. Random Structures & Algorithms, Vol. 13, 4 (1999), 16.
[19]
Bryan F. J. Manly. 1995. A Note on the Analysis of Species Co-Occurrences . Ecology, Vol. 76, 4 (1995), 1109--1115. https://doi.org/10.2307/1940919
[20]
Jeffrey W. Miller and Matthew T. Harrison. 2013. Exact sampling and counting for fixed-margin matrices. The Annals of Statistics, Vol. 41, 3 (June 2013), 1569--1592. https://doi.org/10.1214/13-AOS1131
[21]
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. CODA: Convergence Diagnosis and Output Analysis for MCMC . R News, Vol. 6, 1 (2006), 7--11. https://journal.r-project.org/archive/
[22]
A. Ramachandra Rao, Rabindranath Jana, and Suraj Bandyopadhyay. 1996. A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals. Sankhy?: The Indian Journal of Statistics, Series A (1996), 225--242.
[23]
Alan Roberts and Lewis Stone. 1990. Island-sharing by archipelago species. Oecologia, Vol. 83, 4 (1990), 560--567.
[24]
James G. Sanderson, Michael P. Moulton, and Ralph G. Selfridge. 1998. Null matrices and the analysis of species co-occurrences. Oecologia, Vol. 116, 1--2 (1998), 275--283.
[25]
Lewi Stone and Alan Roberts. 1990. The checkerboard score and species distributions. Oecologia, Vol. 85, 1 (1990), 74--79.
[26]
Giovanni Strona, Domenico Nappo, Francesco Boccacci, Simone Fattorini, and Jesus San-Miguel-Ayanz. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications, Vol. 5, 1 (June 2014), 1--9. https://doi.org/10.1038/ncomms5114
[27]
J. Bastow Wilson. 1987. Methods for detecting non-randomness in species co-occurrences: a contribution. Oecologia, Vol. 73, 4 (1987), 579--582.
[28]
Arif Zaman and Daniel Simberloff. 2002. Random binary matrices in biogeographical ecology-instituting a good neighbor policy. Environmental and Ecological Statistics, Vol. 9, 4 (2002), 405--421.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
FODS '20: Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference
October 2020
196 pages
ISBN:9781450381031
DOI:10.1145/3412815
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 October 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. binary matrix
  2. checkerboard swap
  3. curveball
  4. mcmc sampling

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

FODS '20
FODS '20: ACM-IMS Foundations of Data Science Conference
October 19 - 20, 2020
Virtual Event, USA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 65
    Total Downloads
  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)2
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media