Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

An in-depth analysis of stochastic Kronecker graphs

Published: 03 May 2013 Publication History

Abstract

Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this article, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer benchmark due to its favorable properties and easy parallelization. Our goal is to provide a deeper understanding of the parameters and properties of this model so that its functionality as a benchmark is increased. We develop a rigorous mathematical analysis that shows this model cannot generate a power-law distribution or even a lognormal distribution. However, we formalize an enhanced version of the SKG model that uses random noise for smoothing. We prove both in theory and in practice that this enhancement leads to a lognormal distribution. Additionally, we provide a precise analysis of isolated vertices, showing that the graphs that are produced by SKG might be quite different than intended. For example, between 50% and 75% of the vertices in the Graph500 benchmarks will be isolated. Finally, we show that this model tends to produce extremely small core numbers (compared to most social networks and other real graphs) for common parameter choices.

References

[1]
Alvarez-Hamelin, J. I., Dall'Asta, L., Barrat, A., and Vespignani, A. 2008. K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Netw. Hetero. Media 3, 2, 371--393.
[2]
Andersen, R. and Chellapilla, K. 2009. Finding dense subgraphs with size bounds. In Algorithms and Models for the Web-Graph, Springer, 25--37.
[3]
Berry, A. 1941. The accuracy of the gaussian approximation to the sum of independent variates. Trans. AMS 49, 1, 122--136.
[4]
Bi, Z., Faloutsos, C., and Korn, F. 2001. The “DGX” distribution for mining massive, skewed data. In Proceedings of KDD'01. ACM, 17--26.
[5]
Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y., and Shir, E. 2007. A model of internet topology using k-shell decomposition. Proc. Nat. Acad. Sci. 104, 27, 11150--11154.
[6]
Chakrabarti, D. and Faloutsos, C. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 1.
[7]
Chakrabarti, D., Zhan, Y., and Faloutsos, C. 2004. R-MAT: A recursive model for graph mining. In Proceedings of SDM'04. 442--446.
[8]
Clauset, A., Shalizi, C. R., and Newman, M. E. J. 2009. Power-law distributions in empirical data. SIAM Rev. 51, 4, 661--703.
[9]
Esseen, C.-G. 1942. A moment inequality with an application to the central limit theorem. Skand. Aktuarietidskr 39, 160--170.
[10]
Feller, W. 1968. An Introduction to Probability Theory and Applications: Vol I. 3rd Ed. Wiley.
[11]
Gibson, D., Kleinberg, J., and Raghavan, P. 1998. Inferring web communities from link topology. In Proceedings of HYPERTEXT'98. ACM, 225--234.
[12]
Gkantsidis, C., Mihail, M., and Zegura, E. W. 2003. Spectral analysis of internet topologies. In Proceedings of INFOCOM 2003. IEEE, 364--374.
[13]
Goltsev, A. V., Dorogovtsev, S. N., and Mendes, J. F. F. 2006. k-core (bootstrap) percolation on complex networks: Critical phenomena and nonlocal effects. Phys. Rev. E 73, 5, 056101.
[14]
Graph500 Steering Committee. 2010. Graph 500 benchmark. http://www.graph500.org/specifications.
[15]
Groër, C., Sullivan, B. D., and Poole, S. 2011. A mathematical analysis of the R-MAT random graph generator. Networks 58, 3, 159--170.
[16]
Ibragimov, I. A. 1956. On the composition of unimodal distributions. Theory Probability its App. 1, 2, 255--260.
[17]
Kim, M. and Leskovec, J. 2010. Multiplicative attribute graph model of real-world networks. arXiv:1009.3499v2.
[18]
Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632.
[19]
Kumar, R., Novak, J., and Tomkins, A. 2010. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications, Springer, 337--357.
[20]
Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. 2005. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Proceedings of PKDD 2005. Springer, 133--145.
[21]
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., and Ghahramani, Z. 2010. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11, 985--1042.
[22]
Leskovec, J. and Faloutsos, C. 2007. Scalable modeling of real graphs using kronecker multiplication. In Proceedings of ICML'07. ACM, 497--504.
[23]
Mahdian, M. and Xu, Y. 2007. Stochastic kronecker graphs. In Algorithms and Models for the Web-Graph, Springer, 179--186.
[24]
Mahdian, M. and Xu, Y. 2011. Stochastic Kronecker graphs. Rand. Struct. Algor. 38, 4, 453--466.
[25]
McDiarmid, C. 1989. On the method of bounded differences. Surv. Combinat. 141, 148--188.
[26]
Miller, B., Bliss, N., and Wolfe, P. 2010. Subgraph detection using eigenvector L1 norms. In Proceedings of NIPS 2010. 1633--1641.
[27]
Mitzenmacher, M. 2003. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 2, 226--251.
[28]
Mitzenmacher, M. 2006. The future of power law research. Internet Math. 2, 4, 525--534.
[29]
Moreno, S., Kirshner, S., Neville, J., and Vishwanathan, S. V. N. 2010. Tied Kronecker product graph models to capture variance in network populations. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing. 1137--1144.
[30]
Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press.
[31]
Pennock, D., Flake, G., Lawrence, S., Glover, E., and Giles, C. L. 2002. Winners don't take all: Characterizing the competition for links on the web. Proc. Nat. Acad.Sci. 99, 8, 5207--5211.
[32]
Sala, A., Cao, L., Wilson, C., Zablit, R., Zheng, H., and Zhao, B. Y. 2010. Measurement-calibrated graph models for social network experiments. In Proceedings of WWW'10. ACM, 861--870.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 60, Issue 2
April 2013
237 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/2450142
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 May 2013
Accepted: 01 December 2012
Revised: 01 August 2012
Received: 01 September 2011
Published in JACM Volume 60, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph500
  2. R-MAT
  3. Stochastic Kronecker Graphs (SKG)
  4. graph models

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)The Infinity Mirror Test for Graph ModelsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.314025235:4(4281-4292)Online publication date: 1-Apr-2023
  • (2023)A Framework for Analyzing the Robustness of Graph Models2023 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC58863.2023.10363600(1-6)Online publication date: 25-Sep-2023
  • (2023)Epidemic spreading on coupling network with higher-order information layerNew Journal of Physics10.1088/1367-2630/ad092025:11(113043)Online publication date: 28-Nov-2023
  • (2022)BioCode: A Data-Driven Procedure to Learn the Growth of Biological NetworksIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2022.316509219:6(3103-3113)Online publication date: 5-Apr-2022
  • (2022)Functional Ball Dropping: A superfast hypergraph generation scheme2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020506(4606-4615)Online publication date: 17-Dec-2022
  • (2022)Community detection in complex networksWIREs Computational Statistics10.1002/wics.156614:2Online publication date: 17-Mar-2022
  • (2021)FastSGG: Efficient Social Graph Generation Using a Degree Distribution Generation Model2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00055(564-575)Online publication date: Apr-2021
  • (2021)The Kronecker-clique model for higher-order clustering coefficientsPhysica A: Statistical Mechanics and its Applications10.1016/j.physa.2021.126269582(126269)Online publication date: Nov-2021
  • (2021)iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUsHigh Performance Computing10.1007/978-3-030-78713-4_16(291-309)Online publication date: 24-Jun-2021
  • (2020)A scalable graph generation algorithm to sample over a given shell distribution2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW50202.2020.00051(227-236)Online publication date: May-2020
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media