research-article

A Sampling-Based Tool for Scaling Graph Datasets

Authors:

Ahmed Musaafir,

Ana-Lucia VarbanescuAuthors Info & Claims

ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering

Pages 289 - 300

https://doi.org/10.1145/3358960.3379144

Published: 20 April 2020 Publication History

Abstract

Graph processing has become a topic of interest in many domains. However, we still observe a lack of representative datasets for in-depth performance and scalability analysis. Neither data collections, nor graph generators provide enough diversity and control for thorough analysis. To address this problem, we proposea heuristic method for scaling existing graphs. Our approach, based onsampling andinterconnection, can provide a scaled "version" of a given graph. Moreover, we provide analytical models to predict the topological properties of the scaled graphs (such as the diameter, degree distribution, density, or the clustering coefficient), and further enable the user to tweak these properties. Property control is achieved through a portfolio of graph interconnection methods (e.g., star, ring, chain, fully connected) applied for combining the graph samples. We further implement our method as an open-source tool which can be used to quickly provide families of datasets for in-depth benchmarking of graph processing algorithms. Our empirical evaluation demonstrates our tool provides scaled graphs of a wide range of sizes, whose properties match well with model predictions and/or user requirements. Finally, we also illustrate, through a case-study, how scaled graphs can be used for in-depth performance analysis of graph processing algorithms.

References

[1]

Nesreen Ahmed, Jennifer Neville, and Ramana Rao Kompella. 2011. Network sampling via edge-based node selection with graph induction. (2011).

[2]

Nesreen K Ahmed, Jennifer Neville, and Ramana Kompella. 2012. Network sampling designs for relational classification. In Sixth International AAAI Conference on Weblogs and Social Media.

[3]

Mohammad Al Hasan. 2016. Methods and applications of network sampling. In Optimization Challenges in Complex, Networked and Risky Systems. INFORMS, 115--139.

[4]

Neli Blagus, Lovro vS ubelj, and Marko Bajec. 2014. Assessing the effectiveness of real-world network simplification. Physica A: Statistical Mechanics and its Applications, Vol. 413 (2014), 134--146.

[5]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.

[6]

The Graph 500 Steering Committee. 2010--2016. The Graph 500 List. http://www.graph500.org.

[7]

Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and Peter Boncz. 2015. The LDBC social network benchmark: Interactive workload. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 619--630.

Digital Library

[8]

Graphalytics.org. 2019. Graphalytics Global Competition. https://graphalytics.org/competition.

[9]

Yong Guo, Ana Lucia Varbanescu, Alexandru Iosup, Claudio Martella, and Theodore L Willke. 2014. Benchmarking graph-processing platforms: a vision. In Proceedings of the 5th ACM/SPEC international conference on Performance engineering. ACM, 289--292.

Digital Library

[10]

Alexander Gutfraind, Ilya Safro, and Lauren Ancel Meyers. 2015. Multiscale network generation. In 2015 18th International Conference on Information Fusion (Fusion). IEEE, 158--165.

[11]

Pili Hu and Wing Cheong Lau. 2013. A survey and taxonomy of graph sampling. arXiv preprint arXiv:1308.5865 (2013).

[12]

Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotua, Narayanan Sundaram, Michael Anderson, et almbox. 2016. LDBC Graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proceedings of the VLDB Endowment, Vol. 9, 13 (2016), 1317--1328.

Digital Library

[13]

Farzad Khorasani, Rajiv Gupta, and Laxmi N Bhuyan. 2015. Scalable simd-efficient graph processing on gpus. In 2015 International Conference on Parallel Architecture and Compilation (PACT). IEEE, 39--50.

Digital Library

[14]

Jérôme Kunegis. 2013. KONECT: The Koblenz Network Collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350.

Digital Library

[15]

Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. 2006. Statistical properties of sampled networks. Physical review E, Vol. 73, 1 (2006), 016102.

[16]

Jure Leskovec. 2006. Stanford Network Analysis Platform (SNAP). Stanford University (2006).

[17]

Jure Leskovec. 2008. Kronecker Graphs (presentation). (2008).

[18]

Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, Vol. 11, Feb (2010), 985--1042.

Digital Library

[19]

Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 631--636.

Digital Library

[20]

Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.

[21]

Joshua Lothian, Sarah Powers, Blair D Sullivan, Matthew Baker, Jonathan Schrock, and Stephen W Poole. 2013. Synthetic graph generation for data-intensive HPC benchmarking: Background and framework. Oak Ridge National Laboratory, Tech. Rep. ORNL/TM-2013/339 (2013).

[22]

Ahmed Musaafir. 2019. Graph scaling tool (repository). https://github.com/amusaafir/graph-scaling.

[23]

Ahmed Musaafir and Ana Varbanescu. 2018. Graph scaling models. https://github.com/amusaafir/graph-scaling/blob/master/scaling-guidelines.pdf.

[24]

Ryan Rossi and Nesreen Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

[25]

Mirko Spasic, Milos Jovanovik, and Arnau Prat-Pérez. 2016. An RDF Dataset Generator for the Social Network Benchmark with Real-World Coherence. In BLINK@ ISWC.

[26]

Christian L Staudt, Michael Hamann, Ilya Safro, Alexander Gutfraind, and Henning Meyerhenke. 2016. Generating scaled replicas of real-world complex networks. In International Workshop on Complex Networks and their Applications. Springer, 17--28.

[27]

Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. Graphmat: High performance graph analytics made productive. Proceedings of the VLDB Endowment, Vol. 8, 11 (2015), 1214--1225.

Digital Library

[28]

Alexandru Uta, Ana Lucia Varbanescu, Ahmed Musaafir, Chris Lemaire, and Alexandru Iosup. 2018. Exploring hpc and big data convergence: A graph processing study on intel knights landing. In 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 66--77.

[29]

Merijn Verstraaten, Ana Lucia Varbanescu, and Cees de Laat. 2018. Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs. In 2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3). IEEE, 53--60.

[30]

Marija Vivs tica, Ani Grubivs ic, and Branko vZ itko. 2016. Applying graph sampling methods on student model initialization in intelligent tutoring systems. The International Journal of Information and Learning Technology, Vol. 33, 4 (2016), 202--218.

[31]

JW Zhang and YC Tay. 2016. GSCALER: Synthetically Scaling A Given Graph. In EDBT, Vol. 16. 53--64.

Index Terms

A Sampling-Based Tool for Scaling Graph Datasets
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Generalized sampling of graph signals with the prior information based on graph fractional Fourier transform
Abstract
The graph fractional Fourier transform (GFRFT) has been applied to graph signal processing and has become an important tool in graph signal processing. However, most of the graph signals are usually non-bandlimited in the GFRFT domain. How to ...
Highlights
- we develop a generalized sampling framework for graph signals based on the GFRFT.
- It allows arbitrary input of the graph signal with or without bandlimited in the GFRFT domain.
- When the space and smoothness are unknown, the ...
Tight Bounds on Vertex Connectivity Under Sampling
Special Issue on SODA'15 and Regular Papers

A fundamental result by Karger [10] states that for any λ-edge-connected graph with n nodes, independently sampling each edge with probability p = Ω(log (n)/λ) results in a graph that has edge connectivity Ω(λp), with high probability. This article ...
Sampling of graph signals with successive aggregations based on graph fractional Fourier transform
Abstract
The sampling defined on the graph nodes is a crucial method for graph signals sampling, especially considering graph structure. The sampling theory of the graph frequency domain for bandlimited graph signals has blossomed in recent ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering

April 2020

319 pages

ISBN:9781450369916

DOI:10.1145/3358960

General Chairs:
J. Nelson Amaral
University of Alberta, Canada
,
Anne Koziolek
Karlruhe Institute of Technology (KIT), Germany
,
Program Chairs:
Catia Trubiani
Gran Sasso Science Institute, GSSI, Italy
,
Alexandru Iosup
VU Amsterdam, Netherlands

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 April 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ICPE '20

Sponsor:

ICPE '20: ACM/SPEC International Conference on Performance Engineering

April 25 - 30, 2020

Edmonton AB, Canada

Acceptance Rates

ICPE '20 Paper Acceptance Rate 15 of 62 submissions, 24%;

Overall Acceptance Rate 252 of 851 submissions, 30%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
120
Total Downloads

Downloads (Last 12 months)5
Downloads (Last 6 weeks)0

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents