Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3358960.3379144acmconferencesArticle/Chapter ViewAbstractPublication PagesicpeConference Proceedingsconference-collections
research-article

A Sampling-Based Tool for Scaling Graph Datasets

Published: 20 April 2020 Publication History

Abstract

Graph processing has become a topic of interest in many domains. However, we still observe a lack of representative datasets for in-depth performance and scalability analysis. Neither data collections, nor graph generators provide enough diversity and control for thorough analysis. To address this problem, we proposea heuristic method for scaling existing graphs. Our approach, based onsampling andinterconnection, can provide a scaled "version" of a given graph. Moreover, we provide analytical models to predict the topological properties of the scaled graphs (such as the diameter, degree distribution, density, or the clustering coefficient), and further enable the user to tweak these properties. Property control is achieved through a portfolio of graph interconnection methods (e.g., star, ring, chain, fully connected) applied for combining the graph samples. We further implement our method as an open-source tool which can be used to quickly provide families of datasets for in-depth benchmarking of graph processing algorithms. Our empirical evaluation demonstrates our tool provides scaled graphs of a wide range of sizes, whose properties match well with model predictions and/or user requirements. Finally, we also illustrate, through a case-study, how scaled graphs can be used for in-depth performance analysis of graph processing algorithms.

References

[1]
Nesreen Ahmed, Jennifer Neville, and Ramana Rao Kompella. 2011. Network sampling via edge-based node selection with graph induction. (2011).
[2]
Nesreen K Ahmed, Jennifer Neville, and Ramana Kompella. 2012. Network sampling designs for relational classification. In Sixth International AAAI Conference on Weblogs and Social Media.
[3]
Mohammad Al Hasan. 2016. Methods and applications of network sampling. In Optimization Challenges in Complex, Networked and Risky Systems. INFORMS, 115--139.
[4]
Neli Blagus, Lovro vS ubelj, and Marko Bajec. 2014. Assessing the effectiveness of real-world network simplification. Physica A: Statistical Mechanics and its Applications, Vol. 413 (2014), 134--146.
[5]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.
[6]
The Graph 500 Steering Committee. 2010--2016. The Graph 500 List. http://www.graph500.org.
[7]
Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and Peter Boncz. 2015. The LDBC social network benchmark: Interactive workload. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 619--630.
[8]
Graphalytics.org. 2019. Graphalytics Global Competition. https://graphalytics.org/competition.
[9]
Yong Guo, Ana Lucia Varbanescu, Alexandru Iosup, Claudio Martella, and Theodore L Willke. 2014. Benchmarking graph-processing platforms: a vision. In Proceedings of the 5th ACM/SPEC international conference on Performance engineering. ACM, 289--292.
[10]
Alexander Gutfraind, Ilya Safro, and Lauren Ancel Meyers. 2015. Multiscale network generation. In 2015 18th International Conference on Information Fusion (Fusion). IEEE, 158--165.
[11]
Pili Hu and Wing Cheong Lau. 2013. A survey and taxonomy of graph sampling. arXiv preprint arXiv:1308.5865 (2013).
[12]
Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotua, Narayanan Sundaram, Michael Anderson, et almbox. 2016. LDBC Graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proceedings of the VLDB Endowment, Vol. 9, 13 (2016), 1317--1328.
[13]
Farzad Khorasani, Rajiv Gupta, and Laxmi N Bhuyan. 2015. Scalable simd-efficient graph processing on gpus. In 2015 International Conference on Parallel Architecture and Compilation (PACT). IEEE, 39--50.
[14]
Jérôme Kunegis. 2013. KONECT: The Koblenz Network Collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343--1350.
[15]
Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. 2006. Statistical properties of sampled networks. Physical review E, Vol. 73, 1 (2006), 016102.
[16]
Jure Leskovec. 2006. Stanford Network Analysis Platform (SNAP). Stanford University (2006).
[17]
Jure Leskovec. 2008. Kronecker Graphs (presentation). (2008).
[18]
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, Vol. 11, Feb (2010), 985--1042.
[19]
Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 631--636.
[20]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[21]
Joshua Lothian, Sarah Powers, Blair D Sullivan, Matthew Baker, Jonathan Schrock, and Stephen W Poole. 2013. Synthetic graph generation for data-intensive HPC benchmarking: Background and framework. Oak Ridge National Laboratory, Tech. Rep. ORNL/TM-2013/339 (2013).
[22]
Ahmed Musaafir. 2019. Graph scaling tool (repository). https://github.com/amusaafir/graph-scaling.
[23]
Ahmed Musaafir and Ana Varbanescu. 2018. Graph scaling models. https://github.com/amusaafir/graph-scaling/blob/master/scaling-guidelines.pdf.
[24]
Ryan Rossi and Nesreen Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
[25]
Mirko Spasic, Milos Jovanovik, and Arnau Prat-Pérez. 2016. An RDF Dataset Generator for the Social Network Benchmark with Real-World Coherence. In BLINK@ ISWC.
[26]
Christian L Staudt, Michael Hamann, Ilya Safro, Alexander Gutfraind, and Henning Meyerhenke. 2016. Generating scaled replicas of real-world complex networks. In International Workshop on Complex Networks and their Applications. Springer, 17--28.
[27]
Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. Graphmat: High performance graph analytics made productive. Proceedings of the VLDB Endowment, Vol. 8, 11 (2015), 1214--1225.
[28]
Alexandru Uta, Ana Lucia Varbanescu, Ahmed Musaafir, Chris Lemaire, and Alexandru Iosup. 2018. Exploring hpc and big data convergence: A graph processing study on intel knights landing. In 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 66--77.
[29]
Merijn Verstraaten, Ana Lucia Varbanescu, and Cees de Laat. 2018. Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs. In 2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3). IEEE, 53--60.
[30]
Marija Vivs tica, Ani Grubivs ic, and Branko vZ itko. 2016. Applying graph sampling methods on student model initialization in intelligent tutoring systems. The International Journal of Information and Learning Technology, Vol. 33, 4 (2016), 202--218.
[31]
JW Zhang and YC Tay. 2016. GSCALER: Synthetically Scaling A Given Graph. In EDBT, Vol. 16. 53--64.

Index Terms

  1. A Sampling-Based Tool for Scaling Graph Datasets

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering
    April 2020
    319 pages
    ISBN:9781450369916
    DOI:10.1145/3358960
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 April 2020

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. graph datasets scaling
    2. graph sampling
    3. graph scaling tool
    4. heuristic methods

    Qualifiers

    • Research-article

    Conference

    ICPE '20

    Acceptance Rates

    ICPE '20 Paper Acceptance Rate 15 of 62 submissions, 24%;
    Overall Acceptance Rate 252 of 851 submissions, 30%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 120
      Total Downloads
    • Downloads (Last 12 months)5
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media