research-article

Public Access

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Authors:

Laxman Dhulipala,

Guy E. Blelloch,

Julian ShunAuthors Info & Claims

SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures

Pages 393 - 404

https://doi.org/10.1145/3210377.3210414

Published: 11 July 2018 Publication History

Abstract

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 13 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We provide a publicly-available benchmark suite containing our implementations.

References

[1]

C. R. Aberger, A. Lamb, S. Tu, A. Nötzli, K. Olukotun, and C. Ré. Emptyheaded: A relational engine for graph processing. ACM Trans. Database Syst., 2017.

Digital Library

[2]

A. Aggarwal, R. J. Anderson, and M.-Y. Kao. Parallel depth-first search in general directed graphs. In STOC, 1989.

Digital Library

[3]

N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 1986.

Digital Library

[4]

R. Anderson and E. W. Mayr. A $mathsfP$-complete problem and approximations to it. Technical report, 1984.

Digital Library

[5]

D. A. Bader and G. Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. JPDC, 2006.

Digital Library

[6]

S. Beamer, K. Asanovic, and D. A. Patterson. The GAP benchmark suite. CoRR, abs/1508.03619, 2015.

[7]

N. Ben-David, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, C. McGuffey, and J. Shun. Implicit decomposition for write-efficient connectivity algorithms. In IPDPS, 2018.

[8]

M. Birn, V. Osipov, P. Sanders, C. Schulz, and N. Sitchinava. Efficient parallel and external matching. In Euro-Par, 2013.

Digital Library

[9]

G. E. Blelloch. Prefix sums and their applications. Synthesis of Parallel Algorithms, 1993.

[10]

G. E. Blelloch and L. Dhulipala. Introduction to parallel algorithms 15--853: Algorithms in the real world. 2018.

[11]

G. E. Blelloch, J. T. Fineman, and J. Shun. Greedy sequential maximal independent set and matching are parallel on average. In SPAA, 2012.

Digital Library

[12]

G. E. Blelloch, Y. Gu, J. Shun, and Y. Sun. Parallelism in randomized incremental algorithms. In SPAA, 2016.

Digital Library

[13]

G. E. Blelloch, Y. Gu, and Y. Sun. A new efficient construction on probabilistic tree embeddings. In ICALP, 2017.

[14]

G. E. Blelloch, R. Peng, and K. Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, 2011.

Digital Library

[15]

G. E. Blelloch, H. V. Simhadri, and K. Tangwongsan. Parallel and I/O efficient set covering algorithms. In SPAA, 2012.

Digital Library

[16]

R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46(5), Sept. 1999.

Digital Library

[17]

P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.

Digital Library

[18]

O. Borruvka. O jistém problému minimálním. Práce Mor. Pvrírodovved. Spol. v Brnve III, 3, 1926.

[19]

U. Brandes. A faster algorithm for betweenness centrality. Journal of mathematical sociology, 25(2), 2001.

[20]

A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer networks, 33(1--6), 2000.

Digital Library

[21]

R. Cole, P. N. Klein, and R. E. Tarjan. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In SPAA, 1996.

Digital Library

[22]

G. Cong and D. A. Bader. An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs). In IPDPS, 2005.

Digital Library

[23]

G. Cong and I. G. Tanase. Composable locality optimizations for accelerating parallel forest computations. In HPCC, 2016.

[24]

D. Coppersmith, L. Fleischer, B. Hendrickson, and A. Pinar. A divide-and-conquer algorithm for identifying strongly connected components. 2003.

[25]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms (3. ed.). MIT Press, 2009.

Digital Library

[26]

D. M. Da Zheng, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay. Flashgraph: Processing billion-node graphs on an array of commodity SSDs. In FAST, 2015.

Digital Library

[27]

N. S. Dasari, R. Desh, and M. Zubair. ParK: An efficient algorithm for k -core decomposition on multicore processors. In Big Data, 2014.

[28]

L. Dhulipala, G. Blelloch, and J. Shun. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In SPAA, 2017.

Digital Library

[29]

L. Dhulipala, G. E. Blelloch, and J. Shun. Theoretically efficient parallel graph algorithms can be fast and scalable. CoRR, abs/1805.05208, 2018.

Digital Library

[30]

L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In KDD, 2016.

Digital Library

[31]

J. A. Edwards and U. Vishkin. Better speedups using simpler parallel programming for graph connectivity and biconnectivity. In PMAM, 2012.

Digital Library

[32]

J. T. Fineman. Nearly work-efficient parallel algorithm for digraph reachability. In STOC, 2018.

Digital Library

[33]

M. Fischer and A. Noever. Tight analysis of parallel randomized greedy MIS. In SODA, 2018.

Digital Library

[34]

L. K. Fleischer, B. Hendrickson, and A. Pinar. On identifying strongly connected components in parallel. In IPDPS, 2000.

Digital Library

[35]

H. Gazit and G. L. Miller. An improved parallel algorithm that computes the BFS numbering of a directed graph. Information Processing Letters, 28(2), 1988.

Digital Library

[36]

J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.

Digital Library

[37]

O. Green, L. M. Munguia, and D. A. Bader. Load balanced clustering coefficients. In PPAA, 2014.

Digital Library

[38]

R. Greenlaw, H. J. Hoover, and W. L. Ruzzo. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc., 1995.

Digital Library

[39]

Y. Gu, J. Shun, Y. Sun, and G. E. Blelloch. A top-down parallel semisort. In SPAA, 2015.

Digital Library

[40]

W. Hasenplaugh, T. Kaler, T. B. Schardl, and C. E. Leiserson. Ordering heuristics for parallel graph coloring. In SPAA, 2014.

Digital Library

[41]

S. Hong, N. C. Rodia, and K. Olukotun. On fast parallel detection of strongly connected components (SCC) in small-world graphs. In SC, 2013.

Digital Library

[42]

J. Hopcroft and R. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Communications of the ACM, 1973.

Digital Library

[43]

A. Israeli and Y. Shiloach. An improved parallel algorithm for maximal matching. Inf. Process. Lett., 1986.

Digital Library

[44]

J. Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992.

Digital Library

[45]

S. W. Jun, A. Wright, S. Zhang, S. Xu, and Arvind. BigSparse: High-performance external graph analytics. CoRR, abs/1710.07736, 2017.

[46]

H. Kabir and K. Madduri. Parallel k-core decomposition on multicore platforms. In IPDPSW, May 2017.

[47]

D. R. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. J. ACM, 42(2), Mar. 1995.

Digital Library

[48]

R. M. Karp and V. Ramachandran. Handbook of theoretical computer science (vol. a). chapter Parallel Algorithms for Shared-memory Machines. MIT Press, Cambridge, MA, USA, 1990.

Digital Library

[49]

R. M. Karp and A. Wigderson. A fast parallel algorithm for the maximal independent set problem. In STOC, 1984.

Digital Library

[50]

J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, 2014.

Digital Library

[51]

H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, 2010.

Digital Library

[52]

Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In UAI, 2010.

Digital Library

[53]

M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput., 1986.

Digital Library

[54]

S. Maass, C. Min, S. Kashyap, W. Kang, M. Kumar, and T. Kim. Mosaic: Processing a trillion-edge graph on a single machine. In EuroSys, 2017.

Digital Library

[55]

S. Maleki, D. Nguyen, A. Lenharth, M. Garzarán, D. Padua, and K. Pingali. DSMR: A parallel algorithm for single-source shortest path problem. In ICS, 2016.

Digital Library

[56]

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010.

Digital Library

[57]

Y. Maon, B. Schieber, and U. Vishkin. Parallel ear decomposition search (EDS) and st-numbering in graphs. Theoretical Computer Science, 47, 1986.

Digital Library

[58]

D. W. Matula and L. L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM, 30(3), July 1983.

Digital Library

[59]

R. R. McCune, T. Weninger, and G. Madey. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv., 48(2), Oct. 2015.

Digital Library

[60]

W. Mclendon Iii, B. Hendrickson, S. J. Plimpton, and L. Rauchwerger. Finding strongly connected components in distributed graphs. Journal of Parallel and Distributed Computing, 65(8), 2005.

Digital Library

[61]

R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer. The graph structure in the web--analyzed on different aggregation levels. The Journal of Web Science, 1(1), 2015.

[62]

U. Meyer and P. Sanders. Δ-stepping: a parallelizable shortest path algorithm. J. Algorithms, 49(1), 2003.

Digital Library

[63]

G. L. Miller, R. Peng, and S. C. Xu. Parallel graph decompositions using random shifts. In SPAA, 2013.

Digital Library

[64]

G. L. Miller and V. Ramachandran. A new graph triconnectivity algorithm and its parallelization. Combinatorica, 12(1), Mar 1992.

[65]

D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In SOSP, 2013.

Digital Library

[66]

S. Nobari, T.-T. Cao, P. Karras, and S. Bressan. Scalable parallel minimum spanning forest computation. In PPoPP, 2012.

Digital Library

[67]

M. Patwary, P. Refsnes, and F. Manne. Multi-core spanning forest algorithms using the disjoint-set data structure. In IPDPS, 2012.

Digital Library

[68]

S. Pettie and V. Ramachandran. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. Comput., 31(6), 2002.

Digital Library

[69]

V. Ramachandran. A framework for parallel graph algorithm design. In Optimal Algorithms, 1989.

Digital Library

[70]

V. Ramachandran. Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. In Synthesis of Parallel Algorithms, 1993.

[71]

A. E. Sariyuce, C. Seshadhri, and A. Pinar. Parallel local algorithms for core, truss, and nucleus decompositions. arXiv preprint arXiv:1704.00386, 2017.

[72]

W. Schudy. Finding strongly connected components in parallel using $O(łog^2 N)$ reachability queries. In SPAA, 2008.

Digital Library

[73]

S. B. Seidman. Network structure and minimum degree. Soc. Networks, 5(3), 1983.

[74]

M. Sevenich, S. Hong, A. Welc, and H. Chafi. Fast in-memory triangle listing for large real-world graphs. In Workshop on Social Network Mining and Analysis, 2014.

Digital Library

[75]

Y. Shiloach and U. Vishkin. An $O(łog n)$ parallel connectivity algorithm. J. Algorithms, 1982.

[76]

J. Shun and G. E. Blelloch. Ligra: A lightweight graph processing framework for shared memory. In PPoPP, 2013.

Digital Library

[77]

J. Shun, G. E. Blelloch, J. T. Fineman, and P. B. Gibbons. Reducing contention through priority updates. In SPAA, 2013.

Digital Library

[78]

J. Shun, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, A. Kyrola, H. V. Simhadri, and K. Tangwongsan. Brief announcement: the Problem Based Benchmark Suite. In SPAA, 2012.

Digital Library

[79]

J. Shun, L. Dhulipala, and G. E. Blelloch. A simple and practical linear-work parallel algorithm for connectivity. In SPAA, 2014.

Digital Library

[80]

J. Shun, L. Dhulipala, and G. E. Blelloch. Smaller and faster: Parallel processing of compressed graphs with Ligra

[81]

. In DCC, 2015.

[82]

J. Shun and K. Tangwongsan. Multicore triangle computations without tuning. In ICDE, 2015.

[83]

G. M. Slota and K. Madduri. Simple parallel biconnectivity algorithms for multicore platforms. In HiPC, 2014.

[84]

G. M. Slota, S. Rajamanickam, and K. Madduri. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IPDPS, 2014.

Digital Library

[85]

G. M. Slota, S. Rajamanickam, and K. Madduri. Supercomputing for Web Graph Analytics. Apr 2015.

[86]

G. M. Slota, S. Rajamanickam, and K. Madduri. A case study of complex graph analysis in distributed memory: Implementation and optimization. In IPDPS, 2016.

[87]

S. Stergiou, D. Rughwani, and K. Tsioutsiouliklis. Shortcutting label propagation for distributed connected components. In WSDM, 2018.

Digital Library

[88]

R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 1985.

[89]

D. J. Welsh and M. B. Powell. An upper bound for the chromatic number of a graph and its application to timetabling problems. The Computer Journal, 1967.

[90]

D. Yan, Y. Bu, Y. Tian, and A. Deshpande. Big graph analytics platforms. Foundations and Trends in Databases, 7, 2017.

Digital Library

[91]

W. Zhou. A practical scalable shared-memory parallel algorithm for computing minimum spanning trees. Master's thesis, KIT, 2017.

Cited By

Baumann YKyng RAgrawal KPetrank E(2024)A Framework for Parallelizing Approximate Gaussian EliminationProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659987(195-206)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659987
Xie YDong YQiu JYu WFeng XTang J(2023)SketchNE: Embedding Billion-Scale Networks Accurately in One HourIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.325070335:10(10666-10680)Online publication date: 1-Oct-2023
https://doi.org/10.1109/TKDE.2023.3250703
Xie YQiu JDhulipala LYu WTang JPeng RWang C(2023)Towards Lightweight and Automated Representation Learning System for NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.324316935:9(9613-9627)Online publication date: 1-Sep-2023
https://doi.org/10.1109/TKDE.2023.3243169
Show More Cited By

Index Terms

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
1. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms
      1. Shared memory algorithms

Recommendations

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest ...
A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration
ASAP '12: Proceedings of the 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors

In many application domains, data are represented using large graphs involving millions of vertices and billions of edges. Graph exploration algorithms, such asbreadth-first search (BFS), are largely dominated by memory latency and are challenging to ...
A new augmentation based algorithm for extracting maximal chordal subgraphs

A graph is chordal if every cycle of length greater than three contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures

July 2018

437 pages

ISBN:9781450357999

DOI:10.1145/3210377

General Chair:
Christian Scheideler
University of Paderborn, Germany
,
Program Chair:
Jeremy Fineman
Georgetown University, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

NSF

Conference

SPAA '18

Sponsor:

SPAA '18: 30th ACM Symposium on Parallelism in Algorithms and Architectures

July 16 - 18, 2018

Vienna, Austria

Acceptance Rates

SPAA '18 Paper Acceptance Rate 36 of 120 submissions, 30%;

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

63
Total Citations
View Citations
987
Total Downloads

Downloads (Last 12 months)131
Downloads (Last 6 weeks)18

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Baumann YKyng RAgrawal KPetrank E(2024)A Framework for Parallelizing Approximate Gaussian EliminationProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659987(195-206)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659987
Xie YDong YQiu JYu WFeng XTang J(2023)SketchNE: Embedding Billion-Scale Networks Accurately in One HourIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.325070335:10(10666-10680)Online publication date: 1-Oct-2023
https://doi.org/10.1109/TKDE.2023.3250703
Xie YQiu JDhulipala LYu WTang JPeng RWang C(2023)Towards Lightweight and Automated Representation Learning System for NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.324316935:9(9613-9627)Online publication date: 1-Sep-2023
https://doi.org/10.1109/TKDE.2023.3243169
Dutta ALakshmanan KRamamoorthy AVoumik LHarshith JMotha J(2023)A Review on Optimality Investigation Strategies for the Balanced Assignment Problem2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES)10.1109/CISES58720.2023.10183493(254-259)Online publication date: 28-Apr-2023
https://doi.org/10.1109/CISES58720.2023.10183493
Xu XWang FJiang HCheng YHua YFeng DZhang Y(2023)LOSC: A locality-optimized subgraph construction scheme for out-of-core graph processingJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.10.005172(51-68)Online publication date: Feb-2023
https://doi.org/10.1016/j.jpdc.2022.10.005
Dhulipala LEisenstat DŁącki JMirrokni VShi JKoyejo SMohamed SAgarwal ABelgrave DCho KOh A(2022)Hierarchical agglomerative graph clustering in poly-logarithmic depthProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601936(22925-22940)Online publication date: 28-Nov-2022
https://dl.acm.org/doi/10.5555/3600270.3601936
Shi JDhulipala LShun J(2022)Theoretically and practically efficient parallel nucleus decompositionProceedings of the VLDB Endowment10.14778/3494124.349414015:3(583-596)Online publication date: 4-Feb-2022
https://dl.acm.org/doi/10.14778/3494124.3494140
Shi JHuang LShun J(2022)Parallel Five-cycle Counting AlgorithmsACM Journal of Experimental Algorithmics10.1145/355654127(1-23)Online publication date: 21-Oct-2022
https://dl.acm.org/doi/10.1145/3556541
Postnikova AKoval NNadiradze GAlistarh DLee JAgrawal KSpear M(2022)Multi-queues can be state-of-the-art priority schedulersProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508432(353-367)Online publication date: 2-Apr-2022
https://dl.acm.org/doi/10.1145/3503221.3508432
Yasar ARajamanickam SBerry JCatalyurek U(2022)A Block-Based Triangle Counting Algorithm on Heterogeneous EnvironmentsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309324033:2(444-458)Online publication date: 1-Feb-2022
https://doi.org/10.1109/TPDS.2021.3093240
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents