Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Graph partitioning strategies: one size does not fit all

Published: 01 November 2022 Publication History

Abstract

As an important part of distributed graph computing, graph partitioning has been widely studied. However, the majority of the existing approaches to distributed graph partitioning barely take into consideration the relationship between the partition strategy and the graph algorithm’s characteristics. The current graph partitioning strategies are on the basis of the graph structure, and the performance is diverse as running different graph algorithms. In this paper, considering the characteristics of a graph algorithm, we propose a distributed graph partitioning framework that can decide which partitioning strategy is better based on the program analysis. We also design a triangle-based partitioning strategy which benefits several graph algorithms such as the triangle counting. In addition, to reduce the number of shuffling operations in PageRank algorithm, we redesign the PageRank algorithm which considers the corresponding hash partitioning. The experimental results show that our adaptive graph partitioning framework implemented on Graphx surpasses the original one, especially for the triangle counting and PageRank, no matter on real-world graphs or synthetic graphs.

References

[1]
Buluç A and Gilbert JR The combinatorial blas: design, implementation, and applications Int J High Perform Comput Appl 2011 25 4 496-509
[2]
Cheng R, Hong J, Kyrola A, Miao Y, Weng X, Wu M, Yang F, Zhou L, Zhao F, Chen E (2012) Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp 85–98
[3]
Stutz P, Bernstein A, Cohen W (2010) Signal/collect: graph algorithms for the (semantic) web. In: International Semantic Web Conference, pp 764–780. Springer
[4]
Çatalyürek ÜV, Aykanat C (1996) Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In: Ferreira A, Rolim J, Saad Y, Yang T (eds) Parallel algorithms for irregularly structured problems, pp 75–86. Springer, Berlin
[5]
Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 135–146
[6]
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning in the cloud. arXiv preprint arXiv:1204.6078
[7]
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: distributed graph-parallel computation on natural graphs. In: 10th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 12), pp 17–30
[8]
Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) Graphx: graph processing in a distributed dataflow framework. In: 11th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 14), pp 599–613
[9]
Chen R, Shi J, Chen Y, Zang B, Guan H, and Chen H Powerlyra: differentiated graph computation and partitioning on skewed graphs ACM Trans Parall Comput TOPC 2019 5 3 1-39
[10]
Dean J, Ghemawat S (2004) Mapreduce: simplified data processing on large clusters
[11]
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: 9th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 12), pp 15–28
[12]
Page Lawrence, Brin Sergey, Motwani Rajeev, and Winograd Terry The pagerank citation ranking: bringing order to the web 1999 Stanford InfoLab Technical report
[13]
Karypis George and Kumar Vipin Parallel multilevel series k-way partitioning scheme for irregular graphs SIAM Rev 1999 41 2 278-300
[14]
Schloegel K, Karypis G, Kumar V (2000) Parallel multilevel algorithms for multi-constraint graph partitioning. In: Bode A, Ludwig T, Karl W, Wismüller R (eds) Euro-Par 2000 parallel processing, pp 296–310. Springer, Heidelberg
[15]
Faloutsos M, Faloutsos P, Faloutsos C (2011) On power-law relationships of the internet topology. In: The Structure and Dynamics of Networks, pp 195–206. Princeton University Press, Princeton
[16]
Newman MEJ Power laws, pareto distributions and Zipf’s law Contemp Phys 2005 46 5 323-351
[17]
Feige U, Hajiaghayi MT, and Lee JR Improved approximation algorithms for minimum weight vertex separators SIAM J Comput 2008 38 2 629-657
[18]
Zhang Y, Li D, Zhang C, Wang J, and Liu L Grapha: efficient partitioning and storage for distributed graph computation IEEE Trans Serv Comput 2017 14 1 155-166
[19]
Karypis G and Kumar V Multilevelk-way partitioning scheme for irregular graphs J Parall Distrib Comput 1998 48 1 96-129
[20]
Stanton I, Kliot G (2012) Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1222–1230
[21]
Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M (2014) Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp 333–342
[22]
Liu X, Zhou Y, Guan X, and Shen C A feasible graph partition framework for parallel computing of big graph Knowl Based Syst 2017 134 228-239
[23]
Zhu X, Chen W, Zheng W, Ma X (2016) Gemini: a computation-centric distributed graph processing system. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp 301–316, Savannah, GA, November. USENIX Association
[24]
Xie C, Yan L, Li WJ, and Zhang Z Distributed power-law graph computing: theoretical and empirical analysis Nips 2014 27 1673-1681
[25]
Jain N, Liao G, Willke TL (2013) Graphbuilder: scalable graph etl framework. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, New York, NY, USA. Association for Computing Machinery
[26]
Roshan D, Gurbinder G, Loc H, Hoang-Vu D, Alex B, Nikoli D, Marc S, Keshav P (2018) Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp 752–768
[27]
Slota GM, Root C, Devine K, Madduri K, and Rajamanickam S Scalable, multi-constraint, complex-objective graph partitioning IEEE Trans Parall Distrib Syst 2020 31 12 2789-2801
[28]
Hoang L, Dathathri R, Gill G, Pingali K (2019) Cusp: a customizable streaming edge partitioner for distributed graph analytics. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 439–450. IEEE
[29]
Gill G, Dathathri R, Hoang L, and Pingali K A study of partitioning policies for graph analytics on large-scale distributed platforms Proceedings of the VLDB Endowment 2018 12 4 321-334
[30]
Boman EG, Devine KD, Rajamanickam S (2013) Scalable matrix computations on large scale-free graphs using 2d graph partitioning. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp 1–12
[31]
Leskovec J, Krevl A(2014) Snap datasets: Stanford large network dataset collection
[32]
DIMACS (2006) The 9th dimacs implementation challenge—shortest paths
[33]
Dharavath R, Singh AN (2019) Spark’s graphx-based link prediction for social communication using triangle counting. Social Netw Anal Min 9(1):1–12
[35]
Tang J, Zhang J, Yao L, Li J (2008) Extraction and mining of an academic social network. In: Proceedings of the 17th International Conference on World Wide Web, pp 1193–1194
[36]
Ding Y, Yan S, Zhang Y, Dai W, and Dong L Predicting the attributes of social network users using a graph-based machine learning method Comput Commun 2016 73 3-11
[37]
Brin S and Page L The anatomy of a large-scale hypertextual web search engine Comput Netw ISDN Systems 1998 30 1–7 107-117
[38]
Alazawi Z, Abdljabar MB, Altowaijri S, Vegni AM, Mehmood R (2012) Icdms: an intelligent cloud based disaster management system for vehicular networks. In: International Workshop on Communication Technologies for Vehicles, pp 40–56. Springer
[39]
Tian Y, Mceachin RC, Santos C, States DJ, and Patel JM Saga: a subgraph matching tool for biological graphs Bioinformatics 2007 23 2 232-239
[40]
Somyung O, Ha J, Lee K, and Sejong O Degoviz: an interactive visualization tool for a differentially expressed genes heatmap and gene ontology graph App Sci 2017 7 6 543
[41]
Ying D Scientific collaboration and endorsement: network analysis of coauthorship and citation networks J Inform 2011 5 1 187-203
[42]
Dave A, Jindal A, Li LE, Xin R, Gonzalez J, Zaharia M (2016) Graphframes: an integrated api for mixing graph and relational queries. In: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pp 1–8

Cited By

View all
  • (2025)Performance optimization of heterogeneous computing for large-scale dynamic graph dataThe Journal of Supercomputing10.1007/s11227-024-06562-381:1Online publication date: 1-Jan-2025

Index Terms

  1. Graph partitioning strategies: one size does not fit all
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The Journal of Supercomputing
    The Journal of Supercomputing  Volume 78, Issue 17
    Nov 2022
    996 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 November 2022
    Accepted: 19 May 2022

    Author Tags

    1. Graph partitioning
    2. Triangle counting
    3. PageRank
    4. Graph computation

    Qualifiers

    • Research-article

    Funding Sources

    • Science and Technology Research Project of Hebei Higher Education Institutions
    • Natural Science Foundation of Hebei Province of China

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Performance optimization of heterogeneous computing for large-scale dynamic graph dataThe Journal of Supercomputing10.1007/s11227-024-06562-381:1Online publication date: 1-Jan-2025

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media