Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Free access

Edge sampling using local network information

Published: 01 January 2021 Publication History

Abstract

Edge sampling is an important topic in network analysis. It provides a natural way to reduce network size while retaining desired features of the original network. Sampling methods that only use local information are common in practice as they do not require access to the entire network and can be easily parallelized. Despite promising empirical performances, most of these methods are derived from heuristic considerations and lack theoretical justification. In this paper, we study a simple and efficient edge sampling method that uses local network information. We show that when the local connectivity is sufficiently strong, the sampled network satisfies a strong spectral property. We measure the strength of local connectivity by a global parameter and relate it to more common network statistics such as the clustering coefficient and network curvature. Based on this result, we also provide sufficient conditions under which random networks and hypergraphs can be efficiently sampled.

References

[1]
L. A. Adamic and N. Glance. The political blogosphere and the 2004 US election. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem, 2005.
[2]
Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. ICML '06, pages 17-24, 2006.
[3]
Noga Alon, Itai Benjamini, and Alan Stacey. Percolation on finite graphs and isoperimetric inequalities. Ann. Probab., 32(3):1727-1745, 2004.
[4]
Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe. Fast parallel algorithms for counting and listing triangles in big graphs. ACM Transactions on Knowledge Discovery from Data, 14(1):5:1-5:34, 2019.
[5]
Frank Bauer, J Urgen, and Shiping Liu. Ollivier-Ricci curvature and the spectrum of the normalized graph Laplace operator. Math. Res. Lett., 19(6):1185-1205, 2012.
[6]
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 16-24, 2008.
[7]
M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. COLT, 3120:624-638, 2004.
[8]
András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n2) time. Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing (STOC '96), pages 47-55, 1996.
[9]
Bhaswar B. Bhattacharya and Sumit Mukherjee. Exact and asymptotic results on coarse Ricci curvature of graphs. Discrete Mathematics, 338(6):23-42, 2015.
[10]
B. Bollobas, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures and Algorithms, 31:3-122, 2007.
[11]
Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. Percolation on dense graph sequences. Ann. Probab., 38(1):150-183, 2010.
[12]
Karl Bringmann, Ralph Keusch, and Johannes Lengler. Sampling Geometric Inhomogeneous Random Graphs in Linear Time. In 25th Annual European Symposium on Algorithms (ESA 2017), volume 87 of Leibniz International Proceedings in Informatics (LIPIcs), pages 20:1-20:15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
[13]
Andrei Z. Broder. On the resemblance and containment of documents. Proceeding of Compression and Complexity of Sequences, pages 21-29, 1997.
[14]
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29:1157-1166, 1997.
[15]
S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174, 2010.
[16]
Arpita Ghosh, Stephen Boyd, and Amin Saberi. Minimizing effective resistance of a graph. SIAM Review, 1(50):37-66, 2008.
[17]
Ashish Goel, Michael Kapralov, and Sanjeev Khanna. Graph sparsification via refinement sampling. arXiv:1004.4915, 2010.
[18]
A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2:129-233, 2010.
[19]
Rishi Gupta, Tim Roughgarden, and C. Seshadhri. Decompositions of triangle-dense graphs. Proceedings of the 5th conference on Innovations in theoretical computer science, pages 471-482, 2014.
[20]
Michael Hamann, Gerd Lindner, Henning Meyerhenke, Christian L. Staudt, and Dorothea Wagner. Structure-preserving sparsification methods for social networks. Social Network Analysis and Mining, 6(22):1-22, 2016.
[21]
Jian Huang, Shuangge Ma, Hongzhe Li, and Cun Hui Zhang. The sparse laplacian shrinkage estimator for high-dimensional regression. Annals of Statistics, 39(4):2021-2046, 2011.
[22]
Jürgen Jost and Shiping Liu. Ollivier's Ricci curvature, local clustering and curvature-dimension inequalities on graphs. Discrete and Computational Geometry, 51(2):300-322, 2014.
[23]
Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. SIAM J. Comput., 46(1):456-477, 2014.
[24]
Jonathan Kelner and Alex Levin. Spectral sparsification in the semi-streaming setting. Theory of Computing Systems, 53(2):243-262, 2013.
[25]
Alisa Kirichenko and Harry van Zanten. Estimating a smooth function on a large graph by bayesian laplacian regularisation. Electron. J. Statist., 11(1):891-915, 2017.
[26]
Bryan Klimt and Yiming Yang. The enron corpus: A new dataset for email classification research. In Boulicaut JF., Esposito F., Giannotti F., Pedreschi D. (eds) Machine Learning: ECML 2004, volume 3201, pages 217-226, 2004.
[27]
Can M. Le and Tianxi Li. Linear regression and its inference on noisy network-linked data. arXiv:2007.00803, 2020.
[28]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1):2007, 2007.
[29]
David A. Levin and Yuval Peres. Markov Chains and Mixing Times. American Mathematical Society, 2017.
[30]
Tianxi Li, Elizaveta Levina, and Ji Zhu. Network cross-validation by edge sampling. Biometrika, 107(2):257-276, 2020.
[31]
Yong Lin, Linyuan Lu, and S. T. Yau. Ricci-flat graphs with girth at least five. Communications in Analysis and Geometry, 22(4):671-687, 2014.
[32]
D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of doubtful sound features a large proportion of longlasting associations. can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology, 54:396-405, 2003.
[33]
Julian McAuley and Jure Leskovec. Learning to discover social circles in ego networks. Neural Information Processing Systems (NIPS), 2012.
[34]
Asaf Nachmias. Percolation on dense graph sequences. Geometric and Functional Analysis, 19(4):1171-1194, 2010.
[35]
M. E. J. Newman. Networks: An introduction. Oxford University Press, 2010.
[36]
Tin Lok James Ng, Thomas Brendan Murphy, Ted Westling, Tyler H. McCormick, and Bailey K. Fosdick. Modeling the social media relationships of Irish politicians using a generalized latent space stochastic blockmodel. arXiv:1807.06063, 2018.
[37]
Arlind Nocaj, Mark Ortmann, and Ulrik Brandes. Untangling hairballs - From 3 to 14 degrees of separation. Duncan CA, Symvonis A (eds) Graph Drawing-22nd international symposium, GD 2014, Würzburg, Germany, September 24-26, 2014, pages 101-112, 2014.
[38]
Roberto Oliveira. Concentration of the adjacency matrix and of the laplacian in random graphs with independent edges. arXiv:0911.0600, 2010.
[39]
Yann Ollivier. Ricci curvature of markov chains on metric spaces. Journal of Functional Analysis, 256(3):810-864, 2009.
[40]
Fragkiskos Papadopoulos, Rodrigo Aldecoa, and Dmitri Krioukov. Network geometry inference using common neighbors. Physical Review E, 92(2):022807, 2015.
[41]
J. A. Rodríguez. On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50(1):1-14, 2002.
[42]
Karl Rohe and Tai Qin. The blessing of transitivity in sparse and stochastic networks. arXiv:1307.2302, 2013.
[43]
Veeranjaneyulu Sadhanala, YuXiang Wang, and Ryan J. Tibshirani. Graph sparsification approaches for laplacian smoothing. AISTATS, pages 1250-1259, 2016.
[44]
Venu Satuluri, Srinivasan Parthasarathy, and Yiye Ruan. Local graph sparsification for scalable clustering. Proceedings of the 2011 international conference on Management of data (SIGMOD'11), pages 721-732, 2011.
[45]
Anshumali Shrivastava and Ping Li. Densifying one permutation hashing via rotation for fast near neighbor search. Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 557-565, 2014.
[46]
Anshumali Shrivastava and Ping Li. Asymmetric minwise hashing for indexing binary inner products and set containment. Proceedings of the 24th International Conference on World Wide Web (WWW'15), pages 981-991, 2015.
[47]
A.J. Smola and R. Kondor. Kernels and regularization on graphs. Learning Theory and Kernel Machines, 2777:144-158, 2003.
[48]
D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. Proceedings of the thirty-sixth annual ACM Symposium on Theory of Computing (STOC-04), pages 81-90, 2004.
[49]
Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM J. Comput., 40(6):1913-1926, 2011.
[50]
Johan Ugander, Brian Karrer, Lars Backstrom, and Cameron Marlow. The anatomy of the facebook social graph. arXiv:1111.4503, 2011.
[51]
R. Vershynin. A note on sums of independent random matrices after Ahlswede-Winter. Available at http://www-personal.umich.edu/∼romanv/teaching/reading-group/ahlswede-winter.pdf, 2009.
[52]
Lutz Warnke. Upper tails for arithmetic progressions in random subsets. Israel Journal of Mathematics, 1(221):317-365, 2017.
[53]
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393: 440, 1998.
[54]
J. Yang and J. Leskovec. Defining and evaluating network communities based on groundtruth. In 2012 IEEE 12th International Conference on Data Mining, pages 745-754, 2012.
[55]
Wayne W. Zachary. An information flow model for con ict and fission in small groups. Journal of Anthropological Research, 33(4):452-473, 1977.

Index Terms

  1. Edge sampling using local network information
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Please enable JavaScript to view thecomments powered by Disqus.

          Information & Contributors

          Information

          Published In

          cover image The Journal of Machine Learning Research
          The Journal of Machine Learning Research  Volume 22, Issue 1
          January 2021
          13310 pages
          ISSN:1532-4435
          EISSN:1533-7928
          Issue’s Table of Contents
          CC-BY 4.0

          Publisher

          JMLR.org

          Publication History

          Accepted: 01 May 2021
          Published: 01 January 2021
          Revised: 01 August 2020
          Received: 01 April 2018
          Published in JMLR Volume 22, Issue 1

          Author Tags

          1. network analysis
          2. edge sampling
          3. local information
          4. network curvature
          5. clustering coefficient

          Qualifiers

          • Research-article

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • 0
            Total Citations
          • 57
            Total Downloads
          • Downloads (Last 12 months)29
          • Downloads (Last 6 weeks)7
          Reflects downloads up to 23 Nov 2024

          Other Metrics

          Citations

          View Options

          View options

          PDF

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader

          Login options

          Full Access

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media