Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3210377.3210414acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article
Public Access

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Published: 11 July 2018 Publication History

Abstract

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 13 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We provide a publicly-available benchmark suite containing our implementations.

References

[1]
C. R. Aberger, A. Lamb, S. Tu, A. Nötzli, K. Olukotun, and C. Ré. Emptyheaded: A relational engine for graph processing. ACM Trans. Database Syst., 2017.
[2]
A. Aggarwal, R. J. Anderson, and M.-Y. Kao. Parallel depth-first search in general directed graphs. In STOC, 1989.
[3]
N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 1986.
[4]
R. Anderson and E. W. Mayr. A $mathsfP$-complete problem and approximations to it. Technical report, 1984.
[5]
D. A. Bader and G. Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. JPDC, 2006.
[6]
S. Beamer, K. Asanovic, and D. A. Patterson. The GAP benchmark suite. CoRR, abs/1508.03619, 2015.
[7]
N. Ben-David, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, C. McGuffey, and J. Shun. Implicit decomposition for write-efficient connectivity algorithms. In IPDPS, 2018.
[8]
M. Birn, V. Osipov, P. Sanders, C. Schulz, and N. Sitchinava. Efficient parallel and external matching. In Euro-Par, 2013.
[9]
G. E. Blelloch. Prefix sums and their applications. Synthesis of Parallel Algorithms, 1993.
[10]
G. E. Blelloch and L. Dhulipala. Introduction to parallel algorithms 15--853: Algorithms in the real world. 2018.
[11]
G. E. Blelloch, J. T. Fineman, and J. Shun. Greedy sequential maximal independent set and matching are parallel on average. In SPAA, 2012.
[12]
G. E. Blelloch, Y. Gu, J. Shun, and Y. Sun. Parallelism in randomized incremental algorithms. In SPAA, 2016.
[13]
G. E. Blelloch, Y. Gu, and Y. Sun. A new efficient construction on probabilistic tree embeddings. In ICALP, 2017.
[14]
G. E. Blelloch, R. Peng, and K. Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, 2011.
[15]
G. E. Blelloch, H. V. Simhadri, and K. Tangwongsan. Parallel and I/O efficient set covering algorithms. In SPAA, 2012.
[16]
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46(5), Sept. 1999.
[17]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.
[18]
O. Borruvka. O jistém problému minimálním. Práce Mor. Pvrírodovved. Spol. v Brnve III, 3, 1926.
[19]
U. Brandes. A faster algorithm for betweenness centrality. Journal of mathematical sociology, 25(2), 2001.
[20]
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer networks, 33(1--6), 2000.
[21]
R. Cole, P. N. Klein, and R. E. Tarjan. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In SPAA, 1996.
[22]
G. Cong and D. A. Bader. An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs). In IPDPS, 2005.
[23]
G. Cong and I. G. Tanase. Composable locality optimizations for accelerating parallel forest computations. In HPCC, 2016.
[24]
D. Coppersmith, L. Fleischer, B. Hendrickson, and A. Pinar. A divide-and-conquer algorithm for identifying strongly connected components. 2003.
[25]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms (3. ed.). MIT Press, 2009.
[26]
D. M. Da Zheng, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay. Flashgraph: Processing billion-node graphs on an array of commodity SSDs. In FAST, 2015.
[27]
N. S. Dasari, R. Desh, and M. Zubair. ParK: An efficient algorithm for k -core decomposition on multicore processors. In Big Data, 2014.
[28]
L. Dhulipala, G. Blelloch, and J. Shun. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In SPAA, 2017.
[29]
L. Dhulipala, G. E. Blelloch, and J. Shun. Theoretically efficient parallel graph algorithms can be fast and scalable. CoRR, abs/1805.05208, 2018.
[30]
L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In KDD, 2016.
[31]
J. A. Edwards and U. Vishkin. Better speedups using simpler parallel programming for graph connectivity and biconnectivity. In PMAM, 2012.
[32]
J. T. Fineman. Nearly work-efficient parallel algorithm for digraph reachability. In STOC, 2018.
[33]
M. Fischer and A. Noever. Tight analysis of parallel randomized greedy MIS. In SODA, 2018.
[34]
L. K. Fleischer, B. Hendrickson, and A. Pinar. On identifying strongly connected components in parallel. In IPDPS, 2000.
[35]
H. Gazit and G. L. Miller. An improved parallel algorithm that computes the BFS numbering of a directed graph. Information Processing Letters, 28(2), 1988.
[36]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.
[37]
O. Green, L. M. Munguia, and D. A. Bader. Load balanced clustering coefficients. In PPAA, 2014.
[38]
R. Greenlaw, H. J. Hoover, and W. L. Ruzzo. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc., 1995.
[39]
Y. Gu, J. Shun, Y. Sun, and G. E. Blelloch. A top-down parallel semisort. In SPAA, 2015.
[40]
W. Hasenplaugh, T. Kaler, T. B. Schardl, and C. E. Leiserson. Ordering heuristics for parallel graph coloring. In SPAA, 2014.
[41]
S. Hong, N. C. Rodia, and K. Olukotun. On fast parallel detection of strongly connected components (SCC) in small-world graphs. In SC, 2013.
[42]
J. Hopcroft and R. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Communications of the ACM, 1973.
[43]
A. Israeli and Y. Shiloach. An improved parallel algorithm for maximal matching. Inf. Process. Lett., 1986.
[44]
J. Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992.
[45]
S. W. Jun, A. Wright, S. Zhang, S. Xu, and Arvind. BigSparse: High-performance external graph analytics. CoRR, abs/1710.07736, 2017.
[46]
H. Kabir and K. Madduri. Parallel k-core decomposition on multicore platforms. In IPDPSW, May 2017.
[47]
D. R. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. J. ACM, 42(2), Mar. 1995.
[48]
R. M. Karp and V. Ramachandran. Handbook of theoretical computer science (vol. a). chapter Parallel Algorithms for Shared-memory Machines. MIT Press, Cambridge, MA, USA, 1990.
[49]
R. M. Karp and A. Wigderson. A fast parallel algorithm for the maximal independent set problem. In STOC, 1984.
[50]
J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, 2014.
[51]
H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, 2010.
[52]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In UAI, 2010.
[53]
M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput., 1986.
[54]
S. Maass, C. Min, S. Kashyap, W. Kang, M. Kumar, and T. Kim. Mosaic: Processing a trillion-edge graph on a single machine. In EuroSys, 2017.
[55]
S. Maleki, D. Nguyen, A. Lenharth, M. Garzarán, D. Padua, and K. Pingali. DSMR: A parallel algorithm for single-source shortest path problem. In ICS, 2016.
[56]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010.
[57]
Y. Maon, B. Schieber, and U. Vishkin. Parallel ear decomposition search (EDS) and st-numbering in graphs. Theoretical Computer Science, 47, 1986.
[58]
D. W. Matula and L. L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM, 30(3), July 1983.
[59]
R. R. McCune, T. Weninger, and G. Madey. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv., 48(2), Oct. 2015.
[60]
W. Mclendon Iii, B. Hendrickson, S. J. Plimpton, and L. Rauchwerger. Finding strongly connected components in distributed graphs. Journal of Parallel and Distributed Computing, 65(8), 2005.
[61]
R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer. The graph structure in the web--analyzed on different aggregation levels. The Journal of Web Science, 1(1), 2015.
[62]
U. Meyer and P. Sanders. Δ-stepping: a parallelizable shortest path algorithm. J. Algorithms, 49(1), 2003.
[63]
G. L. Miller, R. Peng, and S. C. Xu. Parallel graph decompositions using random shifts. In SPAA, 2013.
[64]
G. L. Miller and V. Ramachandran. A new graph triconnectivity algorithm and its parallelization. Combinatorica, 12(1), Mar 1992.
[65]
D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In SOSP, 2013.
[66]
S. Nobari, T.-T. Cao, P. Karras, and S. Bressan. Scalable parallel minimum spanning forest computation. In PPoPP, 2012.
[67]
M. Patwary, P. Refsnes, and F. Manne. Multi-core spanning forest algorithms using the disjoint-set data structure. In IPDPS, 2012.
[68]
S. Pettie and V. Ramachandran. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. Comput., 31(6), 2002.
[69]
V. Ramachandran. A framework for parallel graph algorithm design. In Optimal Algorithms, 1989.
[70]
V. Ramachandran. Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. In Synthesis of Parallel Algorithms, 1993.
[71]
A. E. Sariyuce, C. Seshadhri, and A. Pinar. Parallel local algorithms for core, truss, and nucleus decompositions. arXiv preprint arXiv:1704.00386, 2017.
[72]
W. Schudy. Finding strongly connected components in parallel using $O(łog^2 N)$ reachability queries. In SPAA, 2008.
[73]
S. B. Seidman. Network structure and minimum degree. Soc. Networks, 5(3), 1983.
[74]
M. Sevenich, S. Hong, A. Welc, and H. Chafi. Fast in-memory triangle listing for large real-world graphs. In Workshop on Social Network Mining and Analysis, 2014.
[75]
Y. Shiloach and U. Vishkin. An $O(łog n)$ parallel connectivity algorithm. J. Algorithms, 1982.
[76]
J. Shun and G. E. Blelloch. Ligra: A lightweight graph processing framework for shared memory. In PPoPP, 2013.
[77]
J. Shun, G. E. Blelloch, J. T. Fineman, and P. B. Gibbons. Reducing contention through priority updates. In SPAA, 2013.
[78]
J. Shun, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, A. Kyrola, H. V. Simhadri, and K. Tangwongsan. Brief announcement: the Problem Based Benchmark Suite. In SPAA, 2012.
[79]
J. Shun, L. Dhulipala, and G. E. Blelloch. A simple and practical linear-work parallel algorithm for connectivity. In SPAA, 2014.
[80]
J. Shun, L. Dhulipala, and G. E. Blelloch. Smaller and faster: Parallel processing of compressed graphs with Ligra
[81]
. In DCC, 2015.
[82]
J. Shun and K. Tangwongsan. Multicore triangle computations without tuning. In ICDE, 2015.
[83]
G. M. Slota and K. Madduri. Simple parallel biconnectivity algorithms for multicore platforms. In HiPC, 2014.
[84]
G. M. Slota, S. Rajamanickam, and K. Madduri. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IPDPS, 2014.
[85]
G. M. Slota, S. Rajamanickam, and K. Madduri. Supercomputing for Web Graph Analytics. Apr 2015.
[86]
G. M. Slota, S. Rajamanickam, and K. Madduri. A case study of complex graph analysis in distributed memory: Implementation and optimization. In IPDPS, 2016.
[87]
S. Stergiou, D. Rughwani, and K. Tsioutsiouliklis. Shortcutting label propagation for distributed connected components. In WSDM, 2018.
[88]
R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 1985.
[89]
D. J. Welsh and M. B. Powell. An upper bound for the chromatic number of a graph and its application to timetabling problems. The Computer Journal, 1967.
[90]
D. Yan, Y. Bu, Y. Tian, and A. Deshpande. Big graph analytics platforms. Foundations and Trends in Databases, 7, 2017.
[91]
W. Zhou. A practical scalable shared-memory parallel algorithm for computing minimum spanning trees. Master's thesis, KIT, 2017.

Cited By

View all
  • (2024)A Framework for Parallelizing Approximate Gaussian EliminationProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659987(195-206)Online publication date: 17-Jun-2024
  • (2023)SketchNE: Embedding Billion-Scale Networks Accurately in One HourIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.325070335:10(10666-10680)Online publication date: 1-Oct-2023
  • (2023)Towards Lightweight and Automated Representation Learning System for NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.324316935:9(9613-9627)Online publication date: 1-Sep-2023
  • Show More Cited By

Index Terms

  1. Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures
    July 2018
    437 pages
    ISBN:9781450357999
    DOI:10.1145/3210377
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 July 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. graph processing
    2. multicore
    3. parallel graph algorithms
    4. shared-memory
    5. work-efficiency

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SPAA '18
    Sponsor:

    Acceptance Rates

    SPAA '18 Paper Acceptance Rate 36 of 120 submissions, 30%;
    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)131
    • Downloads (Last 6 weeks)18
    Reflects downloads up to 25 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)A Framework for Parallelizing Approximate Gaussian EliminationProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659987(195-206)Online publication date: 17-Jun-2024
    • (2023)SketchNE: Embedding Billion-Scale Networks Accurately in One HourIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.325070335:10(10666-10680)Online publication date: 1-Oct-2023
    • (2023)Towards Lightweight and Automated Representation Learning System for NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.324316935:9(9613-9627)Online publication date: 1-Sep-2023
    • (2023)A Review on Optimality Investigation Strategies for the Balanced Assignment Problem2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES)10.1109/CISES58720.2023.10183493(254-259)Online publication date: 28-Apr-2023
    • (2023)LOSC: A locality-optimized subgraph construction scheme for out-of-core graph processingJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.10.005172(51-68)Online publication date: Feb-2023
    • (2022)Hierarchical agglomerative graph clustering in poly-logarithmic depthProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601936(22925-22940)Online publication date: 28-Nov-2022
    • (2022)Theoretically and practically efficient parallel nucleus decompositionProceedings of the VLDB Endowment10.14778/3494124.349414015:3(583-596)Online publication date: 4-Feb-2022
    • (2022)Parallel Five-cycle Counting AlgorithmsACM Journal of Experimental Algorithmics10.1145/355654127(1-23)Online publication date: 21-Oct-2022
    • (2022)Multi-queues can be state-of-the-art priority schedulersProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508432(353-367)Online publication date: 2-Apr-2022
    • (2022)A Block-Based Triangle Counting Algorithm on Heterogeneous EnvironmentsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309324033:2(444-458)Online publication date: 1-Feb-2022
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media