research-article

PowerLyra: differentiated graph computation and partitioning on skewed graphs

Authors:

Haibo ChenAuthors Info & Claims

EuroSys '15: Proceedings of the Tenth European Conference on Computer Systems

Article No.: 1, Pages 1 - 15

https://doi.org/10.1145/2741948.2741970

Published: 17 April 2015 Publication History

Abstract

Natural graphs with skewed distribution raise unique challenges to graph computation and partitioning. Existing graph-parallel systems usually use a "one size fits all" design that uniformly processes all vertices, which either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost and memory consumption even for low-degree vertices (e.g., PowerGraph and GraphX).

In this paper, we argue that skewed distribution in natural graphs also calls for differentiated processing on high-degree and low-degree vertices. We then introduce PowerLyra, a new graph computation engine that embraces the best of both worlds of existing graph-parallel systems, by dynamically applying different computation and partitioning strategies for different vertices. PowerLyra further provides an efficient hybrid graph partitioning algorithm (hybrid-cut) that combines edge-cut and vertex-cut with heuristics. Based on PowerLyra, we design locality-conscious data layout optimization to improve cache locality of graph accesses during communication. PowerLyra is implemented as a separate computation engine of PowerGraph, and can seamlessly support various graph algorithms. A detailed evaluation on two clusters using graph-analytics and MLDM (machine learning and data mining) applications show that PowerLyra outperforms PowerGraph by up to 5.53X (from 1.24X) and 3.26X (from 1.49X) for real-world and synthetic graphs accordingly, and is much faster than other systems like GraphX and Giraph, yet with much less memory consumption. A porting of hybrid-cut to GraphX further confirms the efficiency and generality of PowerLyra.

Supplementary Material

MP4 File (a1-sidebyside.mp4)

Download
917.00 MB

References

[1]

The 9th dimacs implementation challenge - shortest paths. http://www.dis.uniroma1.it/challenge9/.

[2]

The Apache Giraph Project. http://giraph.apache.org/.

[3]

The Apache Hama Project. http://hama.apache.org/.

[4]

Laboratory for web algorithmics. http://law.di.unimi.it/.

[5]

Large-scale parallel collaborative filtering for the netflix prize. In AAIM, pages 337--348, 2008.

Digital Library

[6]

A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In IPDPS, 2006.

Digital Library

[7]

L. A. Adamic and B. A. Huberman. Zipfs law and the internet. Glottometrics, 3(1): 143--150, 2002.

[8]

P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice and Experience, 34(8): 711--726, 2004.

Digital Library

[9]

S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW, pages 107--117, 1998.

Digital Library

[10]

A. Buluç and J. R. Gilbert. The combinatorial blas: Design, implementation, and applications. IJHPCA, 2011.

Digital Library

[11]

Ü. i. t. V. Çatalyürek, C. Aykanat, and B. Uçar. On two-dimensional sparse matrix partitioning: Models, methods, and a recipe. SISC, 32(2): 656--683, 2010.

Digital Library

[12]

Ü. V. Çatalyürek and C. Aykanat. Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In IRREGULAR, pages 75--86. 1996.

Digital Library

[13]

R. Chen, M. Yang, X. Weng, B. Choi, B. He, and X. Li. Improving large graph processing on partitioned graphs in the cloud. In SoCC, 2012.

Digital Library

[14]

R. Chen, X. Ding, P. Wang, H. Chen, B. Zang, and H. Guan. Computation and communication efficient graph processing with distributed immutable view. In HPDC, 2014.

Digital Library

[15]

R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012.

Digital Library

[16]

F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In KDD, pages 219--228, 2009.

Digital Library

[17]

M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251--262, 1999.

Digital Library

[18]

J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.

Digital Library

[19]

J. E. Gonzalez, Y. Low, C. Guestrin, and D. O'Hallaron. Distributed parallel inference on large factor graphs. In UAI, pages 203--212, 2009.

Digital Library

[20]

J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In OSDI, 2014.

Digital Library

[21]

W. Hant, Y. Miao, K. Li, M. Wu, F. Yang, L. Zhou, V. Prabhakaran, W. Chen, and E. Chen. Chronos: a graph engine for temporal graph analysis. In EuroSys, 2014.

Digital Library

[22]

H. Haselgrove. Wikipedia page-to-page link database. http://haselgrove.id.au/wikipedia.htm, 2010.

[23]

I. Hoque and I. Gupta. Lfgraph: Simple and fast distributed graph analytics. In TRIOS, 2013.

Digital Library

[24]

N. Jain, G. Liao, and T. L. Willke. Graphbuilder: scalable graph etl framework. In GRADES, 2013.

Digital Library

[25]

U. Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec. Hadi: Mining radii of large graphs. TKDD, 5(2): 8, 2011.

Digital Library

[26]

G. Karypis and V. Kumar. Parallel multilevel series k-way partitioning scheme for irregular graphs. Siam Review, 41(2): 278--300, 1999.

Digital Library

[27]

Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In EuroSys, 2013.

Digital Library

[28]

H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, 2010.

Digital Library

[29]

A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, 2012.

Digital Library

[30]

K. Lang. Finding good nearly balanced cuts in power law graphs. Technical report, Yahoo Research Lab, 2004.

[31]

J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 1(1): 2, 2007.

Digital Library

[32]

J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1): 29--123, 2009.

[33]

Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. VLDB, 5(8): 716--727, 2012.

Digital Library

[34]

A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in parallel graph processing. PPL, 17(01): 5--20, 2007.

[35]

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.

Digital Library

[36]

D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, 2013.

Digital Library

[37]

D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In SOSP, pages 456--471, 2013.

Digital Library

[38]

B. Panda, J. Herbach, S. Basu, and R. Bayardo. PLANET: massively parallel learning of tree ensembles with MapReduce. VLDB, 2(2): 1426--1437, 2009.

Digital Library

[39]

S. N. A. Project. Stanford large network dataset collection. http://snap.stanford.edu/data/.

[40]

A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In SOSP, 2013.

Digital Library

[41]

A. Sala, L. Cao, C. Wilson, R. Zablit, H. Zheng, and B. Y. Zhao. Measurement-calibrated graph models for social network experiments. In WWW, pages 861--870, 2010.

Digital Library

[42]

A. Sala, X. Zhao, C. Wilson, H. Zheng, and B. Y. Zhao. Sharing graphs using differentially private graph models. In IMC, pages 81--98, 2011.

Digital Library

[43]

S. Salihoglu and J. Widom. Gps: A graph processing system. In SSDBM, page 22. ACM, 2013.

Digital Library

[44]

K. Schloegel, G. Karypis, and V. Kumar. Parallel multilevel algorithms for multi-constraint graph partitioning (distinguished paper). In Euro-Par, pages 296--310, 2000.

Digital Library

[45]

J. Seo, J. Park, J. Shin, and M. S. Lam. Distributed socialite: a datalog-based language for large-scale graph analysis. VLDB, 6(14): 1906--1917, 2013.

Digital Library

[46]

B. Shao, H. Wang, and Y. Li. Trinity: A distributed graph engine on a memory cloud. In SIGMOD, 2013.

Digital Library

[47]

J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In PPoPP, 2013.

Digital Library

[48]

A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB, 3 (1--2): 703--710, 2010.

Digital Library

[49]

I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In SIGKDD, pages 1222--1230, 2012.

Digital Library

[50]

G. Takács, I. Pilászy, B. Németh, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. JMLR, 10: 623--656, 2009.

Digital Library

[51]

Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From think like a vertex to think like a graph. VLDB, 7(3), 2013.

Digital Library

[52]

C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In WSDM, pages 333--342, 2014.

Digital Library

[53]

L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8): 103--111, 1990.

Digital Library

[54]

P. Wang, K. Zhang, R. Chen, H. Chen, and H. Guan. Replication-based fault-tolerance for large-scale graph processing. In DSN, 2014.

Digital Library

[55]

C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys, pages 205--218, 2009.

Digital Library

[56]

C. Xie, L. Yan, W.-J. Li, and Z. Zhang. Distributed power-law graph computing: Theoretical and empirical analysis. In NIPS, pages 1673--1681, 2014.

[57]

C. Xie, R. Chen, H. Guan, B. Zang, and H. Chen. Sync or async: Time to fuse for distributed graph-parallel computation. In PPoPP, 2015.

Digital Library

[58]

J. Ye, J. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In CIKM, 2009.

Digital Library

[59]

A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and U. Catalyurek. A scalable distributed parallel breadth-first search algorithm on bluegene/l. In SC, 2005.

Digital Library

[60]

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.

Digital Library

[61]

K. Zhang, R. Chen, and H. Chen. Numa-aware graph-structured analytics. In PPoPP, 2015.

Digital Library

[62]

X. Zhao, A. Chang, A. D. Sarma, H. Zheng, and B. Y. Zhao. On the embeddability of random walk distances. VLDB, 6 (14), 2013.

Digital Library

[63]

Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the netflix prize. In AAIM, pages 337--348, 2008.

Digital Library

Cited By

Yang TChen YLiang YYang MMa XWon Y(2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
https://dl.acm.org/doi/10.5555/3650697.3650719
Liu CPeng ZZheng WZou L(2024)FSM: A Fine-Grained Splitting and Merging Framework for Dual-Balanced Graph PartitionProceedings of the VLDB Endowment10.14778/3665844.366586417:9(2378-2391)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.14778/3665844.3665864
Yao FTao QYu WZhang YGong SWang QYu GZhou J(2024)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 20-Jan-2024
https://doi.org/10.14778/3632093.3632094
Show More Cited By

Index Terms

PowerLyra: differentiated graph computation and partitioning on skewed graphs

Recommendations

PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs

Natural graphs with skewed distributions raise unique challenges to distributed graph computation and partitioning. Existing graph-parallel systems usually use a “one-size-fits-all” design that uniformly processes all vertices, which either suffer from ...
On the Multichromatic Number of s-Stable Kneser Graphs

For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes

An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

EuroSys '15: Proceedings of the Tenth European Conference on Computer Systems

April 2015

503 pages

ISBN:9781450332385

DOI:10.1145/2741948

General Chair:
Laurent Réveillère
LaBRI, University of Bordeaux, France
,
Program Chairs:
Tim Harris
Oracle Labs, UK
,
Maurice Herlihy
Brown University

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2015

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Research-article

Conference

EuroSys '15

Sponsor:

SIGOPS

EuroSys '15: Tenth EuroSys Conference 2015

April 21 - 24, 2015

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25

Sponsor:
sigops

Twentieth European Conference on Computer Systems

March 30 - April 3, 2025

Rotterdam , Netherlands

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

202
Total Citations
View Citations
2,129
Total Downloads

Downloads (Last 12 months)30
Downloads (Last 6 weeks)7

Reflects downloads up to 22 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yang TChen YLiang YYang MMa XWon Y(2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
https://dl.acm.org/doi/10.5555/3650697.3650719
Liu CPeng ZZheng WZou L(2024)FSM: A Fine-Grained Splitting and Merging Framework for Dual-Balanced Graph PartitionProceedings of the VLDB Endowment10.14778/3665844.366586417:9(2378-2391)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.14778/3665844.3665864
Yao FTao QYu WZhang YGong SWang QYu GZhou J(2024)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 20-Jan-2024
https://doi.org/10.14778/3632093.3632094
Akbudak K(2024)Hypergraph-based locality-enhancing methods for graph operations in Big Data applicationsInternational Journal of High Performance Computing Applications10.1177/1094342023121453238:3(210-224)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.1177/10943420231214532
Huang KZhai JZheng LWang HJin YZhang QZhang RZheng ZYi YShen X(2024)WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and OperationsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650063(1-17)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650063
Chen RWei XXie XChen H(2024)Locality-Preserving Graph Traversal With Split Live MigrationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343682835:10(1810-1825)Online publication date: Oct-2024
https://doi.org/10.1109/TPDS.2024.3436828
Xia YYang DZhou XCheng D(2024)Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge GraphsProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00056(1-15)Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00056
Xu HSong SMao Z(2024)Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00098(468-477)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00098
Chen YYu J(2024)Accelerating SpMV for Scale-Free Graphs with Optimized Bins2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00190(2407-2420)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00190
Sun XWang WHuang T(2024)How to Fit the SCC Algorithm Efficiently into Distributed Graph Iterative Computation2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC61105.2024.00043(254-263)Online publication date: 2-Jul-2024
https://doi.org/10.1109/COMPSAC61105.2024.00043
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents