Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Published: 22 April 2021 Publication History

Abstract

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph Based Benchmark Suite (GBBS).

References

[1]
Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. EmptyHeaded: A relational engine for graph processing. ACM Trans. Database Syst. 42, 4 (2017), 20:1--20:44.
[2]
Umut A. Acar, Daniel Anderson, Guy E. Blelloch, and Laxman Dhulipala. 2019. Parallel batch-dynamic graph connectivity. In The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22--24, 2019. 381--392.
[3]
Alok Aggarwal, Richard J. Anderson, and M.-Y. Kao. 1989. Parallel depth-first search in general directed graphs. In ACM Symposium on Theory of Computing (STOC). 297--308.
[4]
Masab Ahmad, Farrukh Hijaz, Qingchuan Shi, and Omer Khan. 2015. Crono: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores. In IEEE International Symposium on Workload Characterization, IISWC. 44--55.
[5]
Noga Alon, László Babai, and Alon Itai. 1986. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms 7, 4 (1986), 567--583.
[6]
N. Alon, R. Yuster, and U. Zwick. 1997. Finding and counting given length cycles. Algorithmica 17, 3 (1997), 209--223.
[7]
Richard Anderson and Ernst W. Mayr. 1984. A P-complete Problem and Approximations to It. Technical Report.
[8]
Alexandr Andoni, Clifford Stein, and Peilin Zhong. 2020. Parallel approximate undirected shortest paths via low hop emulators. In ACM Symposium on Theory of Computing (STOC). ACM, 322--335.
[9]
N. S. Arora, R. D. Blumofe, and C. G. Plaxton. 2001. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems (TOCS) 34, 2 (01 Apr 2001).
[10]
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 2001. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems (TOCS) 34, 2 (2001), 115--144.
[11]
Baruch Awerbuch. 1985. Complexity of network synchronization. J. ACM 32, 4 (1985), 804--823.
[12]
Baruch Awerbuch, Bonnie Berger, Lenore Cowen, and David Peleg. 1992. Low-diameter graph decomposition is in NC. In Scandinavian Workshop on Algorithm Theory. 83--93.
[13]
Baruch Awerbuch and Y. Shiloach. 1983. New connectivity and MSF algorithms for ultracomputer and PRAM. In International Conference on Parallel Processing (ICPP). 175--179.
[14]
David A. Bader and Guojing Cong. 2006. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Parallel Distrib. Comput. 66, 11 (2006), 1366--1378.
[15]
David A. Bader and Kamesh Madduri. 2005. Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In IEEE International Conference on High-Performance Computing (HiPC). 465--476.
[16]
David A. Bader and Kamesh Madduri. 2006. Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2. In International Conference on Parallel Processing (ICPP). 523--530.
[17]
Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for MapReduce. In International Workshop on Algorithms and Models for the Web-Graph. 59--78.
[18]
Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proc. VLDB Endow. 5, 5 (2012), 454--465.
[19]
Georg Baier, Ekkehard Köhler, and Martin Skutella. 2005. The k-splittable flow problem. Algorithmica 42, 3--4 (2005), 231--248.
[20]
MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity clustering: Hierarchical clustering at scale. In Advances in Neural Information Processing Systems. 6864--6874.
[21]
Scott Beamer, Krste Asanović, and David Patterson. 2013. Direction-optimizing breadth-first search. Scientific Programming 21, 3--4 (2013), 137--148.
[22]
Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP benchmark suite. CoRR abs/1508.03619 (2015).
[23]
Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. 2018. Implicit decomposition for write-efficient connectivity algorithms. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 711--722.
[24]
Bonnie Berger, John Rompel, and Peter W. Shor. 1994. Efficient NC algorithms for set cover with applications to learning and geometry. J. Computer and System Sciences 49, 3 (Dec. 1994), 454--477.
[25]
Marcel Birn, Vitaly Osipov, Peter Sanders, Christian Schulz, and Nodari Sitchinava. 2013. Efficient parallel and external matching. In European Conference on Parallel Processing (Euro-Par). 659--670.
[26]
Guy E. Blelloch. 1993. Prefix sums and their applications. In Synthesis of Parallel Algorithms, John Reif (Ed.). Morgan Kaufmann.
[27]
Guy E. Blelloch, Daniel Anderson, and Laxman Dhulipala. 2020. ParlayLib - A toolkit for parallel algorithms on shared-memory multicore machines. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 507--509.
[28]
Guy E. Blelloch and Laxman Dhulipala. 2018. Introduction to Parallel Algorithms. http://www.cs.cmu.edu/realworld/slidesS18/parallelChap.pdf. Carnegie Mellon University.
[29]
Guy E. Blelloch, Laxman Dhulipala, Phillip B. Gibbons, Yan Gu, Charlie McGuffey, and Julian Shun. 2021. The read-only semi-external model. In SIAM/ACM Symposium on Algorithmic Principles of Computer Systems (APOCS). 70--84.
[30]
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. 2012. Internally deterministic algorithms can be fast. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 181--192.
[31]
Guy E. Blelloch, Jeremy T. Fineman, Yan Gu, and Yihan Sun. 2020. Optimal parallel algorithms in the binary-forking model. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 89--102.
[32]
Guy E. Blelloch, Jeremy T. Fineman, and Julian Shun. 2012. Greedy sequential maximal independent set and matching are parallel on average. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 308--317.
[33]
Guy E. Blelloch, Yan Gu, Julian Shun, and Yihan Sun. 2016. Parallelism in randomized incremental algorithms. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 467--478.
[34]
Guy E. Blelloch, Yan Gu, and Yihan Sun. 2017. Efficient construction on probabilistic tree embeddings. In Intl. Colloq. on Automata, Languages and Programming (ICALP). 26:1--26:14.
[35]
Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L Miller, Richard Peng, and Kanat Tangwongsan. 2014. Nearly-linear work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs. Theory of Computing Systems 55, 3 (2014), 521--554.
[36]
Guy E. Blelloch, Richard Peng, and Kanat Tangwongsan. 2011. Linear-work greedy parallel approximate set cover and variants. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA).
[37]
Guy E. Blelloch, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Parallel and I/O efficient set covering algorithms. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 82--90.
[38]
Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (1999), 720--748.
[39]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In International World Wide Web Conference (WWW). 595--601.
[40]
Otakar Borůvka. 1926. O jistém problému minimálním. Práce Mor. Přírodověd. Spol. v Brně III 3 (1926), 37--58.
[41]
Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 2 (2001), 163--177.
[42]
Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. In International World Wide Web Conference (WWW). 107--117.
[43]
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Computer Networks 33, 1--6 (2000), 309--320.
[44]
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. 84--95.
[45]
Richard Cole, Philip N. Klein, and Robert E. Tarjan. 1996. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 243--250.
[46]
Guojing Cong and David A. Bader. 2005. An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs). In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 9--18.
[47]
Guojing Cong and Ilie Gabriel Tanase. 2016. Composable locality optimizations for accelerating parallel forest computations. In IEEE International Conference on High Performance Computing and Communications (HPCC). 190--197.
[48]
Don Coppersmith, Lisa Fleischer, Bruce Hendrickson, and Ali Pinar. 2003. A Divide-and-conquer Algorithm for Identifying Strongly Connected Components. Technical Report RC23744. IBM Research.
[49]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3 ed.). MIT Press.
[50]
Naga Shailaja Dasari, Ranjan Desh, and Mohammad Zubair. 2014. ParK: An efficient algorithm for k-core decomposition on multicore processors. In IEEE International Conference on Big Data (BigData). 9--16.
[51]
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 752--768.
[52]
Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2017. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 293--304.
[53]
Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2018. Theoretically efficient parallel graph algorithms can be fast and scalable. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 293--304.
[54]
Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2019. Low-latency graph streaming using compressed purely-functional trees. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 918--934.
[55]
Laxman Dhulipala, David Durfee, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, and Xiaorui Sun. 2020. Parallel batch-dynamic graphs: Algorithms and lower bounds. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 1300--1319.
[56]
Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing graphs and indexes with recursive graph bisection. In ACM International Conference on Knowledge Discovery and Data Mining (KDD). 1535--1544.
[57]
Laxman Dhulipala, Quanquan C. Liu, Julian Shun, and Shangdi Yu. 2021. Parallel batch-dynamic k-clique counting. In SIAM/ACM Symposium on Algorithmic Principles of Computer Systems (APOCS). 129--143.
[58]
Laxman Dhulipala, Charlie McGuffey, Hongbo Kang, Yan Gu, Guy E. Blelloch, Phillip B. Gibbons, and Julian Shun. 2020. Sage: Parallel semi-asymmetric graph algorithms for NVRAMs. Proc. VLDB Endow. 13, 9 (2020), 1598--1613.
[59]
Laxman Dhulipala, Jessica Shi, Tom Tseng, Guy E. Blelloch, and Julian Shun. 2020. The graph based benchmark suite (GBBS). In International Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA). 11:1--11:8.
[60]
Ran Duan, Kaifeng Lyu, and Yuanhang Xie. 2018. Single-source bottleneck path algorithm faster than sorting for sparse graphs. In Intl. Colloq. on Automata, Languages and Programming (ICALP). 43:1--43:14.
[61]
James A. Edwards and Uzi Vishkin. 2012. Better speedups using simpler parallel programming for graph connectivity and biconnectivity. In International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM). 103--114.
[62]
Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In ACM International Conference on Knowledge Discovery and Data Mining (KDD). 226--231.
[63]
Jeremy T. Fineman. 2018. Nearly work-efficient parallel algorithm for digraph reachability. In ACM Symposium on Theory of Computing (STOC). 457--470.
[64]
Manuela Fischer and Andreas Noever. 2018. Tight analysis of parallel randomized greedy MIS. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 2152--2160.
[65]
Lisa K. Fleischer, Bruce Hendrickson, and Ali Pinar. 2000. On identifying strongly connected components in parallel. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 505--511.
[66]
Lester Randolph Ford and Delbert R. Fulkerson. 2009. Maximal flow through a network. In Classic Papers in Combinatorics. Springer, 243--248.
[67]
Hillel Gazit. 1991. An optimal randomized parallel algorithm for finding connected components in a graph. SIAM J. on Computing 20, 6 (Dec. 1991), 1046--1067.
[68]
Hillel Gazit and Gary L. Miller. 1988. An improved parallel algorithm that computes the BFS numbering of a directed graph. Inform. Process. Lett. 28, 2 (1988), 61--65.
[69]
Gurbinder Gill, Roshan Dathathri, Loc Hoang, Ramesh Peri, and Keshav Pingali. 2020. Single machine graph analytics on massive datasets using intel optane DC persistent memory. Proc. VLDB Endow. 13, 8 (2020), 1304--13.
[70]
A. V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report UCB/CSD-84-171. Berkeley, CA, USA.
[71]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 17--30.
[72]
Oded Green, Luis M. Munguia, and David A. Bader. 2014. Load balanced clustering coefficients. In Workshop on Parallel programming for Analytics Applications (PPAA). 3--10.
[73]
Raymond Greenlaw, H. James Hoover, and Walter L. Ruzzo. 1995. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc.
[74]
Yan Gu, Julian Shun, Yihan Sun, and Guy E. Blelloch. 2015. A top-down parallel semisort. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 24--34.
[75]
Shay Halperin and Uri Zwick. 1996. An optimal randomized logarithmic time connectivity algorithm for the EREW PRAM. J. Comput. Syst. Sci. 53, 3 (1996), 395--416.
[76]
Shay Halperin and Uri Zwick. 2001. Optimal randomized EREW PRAM algorithms for finding spanning forests. 39, 1 (2001), 1--46.
[77]
William Hasenplaugh, Tim Kaler, Tao B. Schardl, and Charles E. Leiserson. 2014. Ordering heuristics for parallel graph coloring. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 166--177.
[78]
Loc Hoang, Matteo Pontecorvi, Roshan Dathathri, Gurbinder Gill, Bozhi You, Keshav Pingali, and Vijaya Ramachandran. 2019. A round-efficient distributed betweenness centrality algorithm. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 272--286.
[79]
Sungpack Hong, Nicole C. Rodia, and Kunle Olukotun. 2013. On fast parallel detection of strongly connected components (SCC) in small-world graphs. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 92:1--92:11.
[80]
John Hopcroft and Robert Tarjan. 1973. Algorithm 447: Efficient algorithms for graph manipulation. Commun. ACM 16, 6 (1973), 372--378.
[81]
Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotă, Narayanan Sundaram, Michael Anderson, Ilie Gabriel Tănase, Yinglong Xia, Lifeng Nai, and Peter Boncz. 2016. LDBC graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1317--1328.
[82]
Amos Israeli and Y. Shiloach. 1986. An improved parallel algorithm for maximal matching. Inform. Process. Lett. 22, 2 (1986), 57--60.
[83]
Alon Itai and Michael Rodeh. 1977. Finding a minimum circuit in a graph. In ACM Symposium on Theory of Computing (STOC). 1--10.
[84]
J. Jaja. 1992. Introduction to Parallel Algorithms. Addison-Wesley Professional.
[85]
Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, et al. 2018. GraFBoost: Using accelerated flash storage for external graph analytics. In ACM International Symposium on Computer Architecture (ISCA). 411--424.
[86]
H. Kabir and K. Madduri. 2017. Parallel k-core decomposition on multicore platforms. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1482--1491.
[87]
David R. Karger, Philip N. Klein, and Robert E. Tarjan. 1995. A randomized linear-time algorithm to find minimum spanning trees. J. ACM 42, 2 (March 1995), 321--328.
[88]
Richard M. Karp and Vijaya Ramachandran. 1990. Parallel algorithms for shared-memory machines. In Handbook of Theoretical Computer Science (Vol. A), Jan van Leeuwen (Ed.). MIT Press, Cambridge, MA, USA, 869--941.
[89]
Richard M. Karp and Avi Wigderson. 1984. A fast parallel algorithm for the maximal independent set problem. In ACM Symposium on Theory of Computing (STOC). 266--272.
[90]
Jinha Kim, Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, and Hwanjo Yu. 2014. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In ACM International Conference on Management of Data (SIGMOD). 637--648.
[91]
Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. 2015. Fast greedy algorithms in MapReduce and streaming. ACM Trans. Parallel Comput. 2, 3 (2015), 14:1--14:22.
[92]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In International World Wide Web Conference (WWW). 591--600.
[93]
Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1--3 (2008), 458--473.
[94]
Charles E. Leiserson and Tao B. Schardl. 2010. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 303--314.
[95]
Jason Li. 2020. Faster parallel algorithm for approximate shortest path. In ACM Symposium on Theory of Computing (STOC). 308--321.
[96]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (April 2012).
[97]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI). 340--349.
[98]
Michael Luby. 1986. A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. (1986), 1036--1053.
[99]
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a trillion-edge graph on a single machine. In European Conference on Computer Systems (EuroSys). 527--543.
[100]
Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Garzarán, David Padua, and Keshav Pingali. 2016. DSMR: A parallel algorithm for single-source shortest path problem. In Proceedings of the 2016 International Conference on Supercomputing (ICS). 32:1--32:14.
[101]
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In ACM International Conference on Management of Data (SIGMOD). 135--146.
[102]
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge university press.
[103]
Yael Maon, Baruch Schieber, and Uzi Vishkin. 1986. Parallel ear decomposition search (EDS) and st-numbering in graphs. Theoretical Computer Science 47 (1986), 277--298.
[104]
David W. Matula and Leland L. Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM 30, 3 (July 1983), 417--427.
[105]
Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 39 pages.
[106]
William Mclendon Iii, Bruce Hendrickson, Steven J. Plimpton, and Lawrence Rauchwerger. 2005. Finding strongly connected components in distributed graphs. J. Parallel Distrib. Comput. 65, 8 (2005), 901--910.
[107]
Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. 2015. The graph structure in the web--analyzed on different aggregation levels. The Journal of Web Science 1, 1 (2015), 33--47.
[108]
Ulrich Meyer and Peter Sanders. 2000. Parallel shortest path for arbitrary graphs. In European Conference on Parallel Processing (Euro-Par). 461--470.
[109]
Ulrich Meyer and Peter Sanders. 2003. Δ-stepping: A parallelizable shortest path algorithm. J. Algorithms 49, 1 (2003), 114--152.
[110]
Gary L. Miller, Richard Peng, Adrian Vladu, and Shen Chen Xu. 2015. Improved parallel algorithms for spanners and hopsets. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 192--201.
[111]
Gary L. Miller, Richard Peng, and Shen Chen Xu. 2013. Parallel graph decompositions using random shifts. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 196--203.
[112]
Gary L. Miller and Vijaya Ramachandran. 1992. A new graph triconnectivity algorithm and its parallelization. Combinatorica 12, 1 (1992), 53--76.
[113]
Lifeng Nai, Yinglong Xia, Ilie G. Tanase, Hyesoon Kim, and Ching-Yung Lin. 2015. GraphBIG: Understanding graph computing in the context of industrial solutions. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 69:1--69:12.
[114]
Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45, 2 (2003), 167--256.
[115]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In ACM Symposium on Operating Systems Principles (SOSP). 456--471.
[116]
Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, and Stéphane Bressan. 2012. Scalable parallel minimum spanning forest computation. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 205--214.
[117]
Mark Ortmann and Ulrik Brandes. 2014. Triangle listing algorithms: Back from the diversion. In Algorithm Engineering and Experiments (ALENEX). 1--8.
[118]
Rasmus Pagh and Francesco Silvestri. 2014. The input/output complexity of triangle enumeration. In ACM Symposium on Principles of Database Systems (PODS). 224--233.
[119]
M. M. A. Patwary, P. Refsnes, and F. Manne. 2012. Multi-core spanning forest algorithms using the disjoint-set data structure. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 827--835.
[120]
David Peleg and Alejandro A Schäffer. 1989. Graph spanners. Journal of Graph Theory 13, 1 (1989), 99--116.
[121]
Seth Pettie and Vijaya Ramachandran. 2002. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. on Computing 31, 6 (2002), 1879--1895.
[122]
C. A. Phillips. 1989. Parallel graph contraction. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 148--157.
[123]
Chung Keung Poon and Vijaya Ramachandran. 1997. A randomized linear work EREW PRAM algorithm to find a minimum spanning forest. In International Symposium on Algorithms and Computation (ISAAC). 212--222.
[124]
Sridhar Rajagopalan and Vijay V. Vazirani. 1999. Primal-dual RNC approximation algorithms for set cover and covering integer programs. SIAM J. on Computing 28, 2 (Feb. 1999), 525--540.
[125]
Vijaya Ramachandran. 1989. A framework for parallel graph algorithm design. In International Symposium on Optimal Algorithms. 33--40.
[126]
Vijaya Ramachandran. 1993. Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. In Synthesis of Parallel Algorithms, John H Reif (Ed.). Morgan Kaufmann Publishers Inc.
[127]
J. Reif. 1985. Optimal Parallel Algorithms for Integer Sorting and Graph Connectivity. Technical Report TR-08-85. Harvard University.
[128]
Ahmet Erdem Sariyuce, C. Seshadhri, and Ali Pinar. 2018. Parallel local algorithms for core, truss, and nucleus decompositions. Proc. VLDB Endow. 12, 1 (2018), 43--56.
[129]
T. Schank. 2007. Algorithmic Aspects of Triangle-Based Network Analysis. Ph.D. Dissertation. Universitat Karlsruhe.
[130]
Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Workshop on Experimental and Efficient Algorithms (WEA). 606--609.
[131]
Warren Schudy. 2008. Finding strongly connected components in parallel using O(log2 N) reachability queries. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 146--151.
[132]
Stephen B. Seidman. 1983. Network structure and minimum degree. Soc. Networks 5, 3 (1983), 269--287.
[133]
Martin Sevenich, Sungpack Hong, Adam Welc, and Hassan Chafi. 2014. Fast in-memory triangle listing for large real-world graphs. In Workshop on Social Network Mining and Analysis. Article 2, 2:1--2:9 pages.
[134]
Jessica Shi, Laxman Dhulipala, and Julian Shun. 2020. Parallel clique counting and peeling algorithms. arXiv preprint arXiv:2002.10047 (2020).
[135]
Yossi Shiloach and Uzi Vishkin. 1982. An O(log n ) parallel connectivity algorithm. J. Algorithms 3, 1 (1982), 57--67.
[136]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 135--146.
[137]
Julian Shun and Guy E. Blelloch. 2014. Phase-concurrent hash tables for determinism. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 96--107.
[138]
Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. 2013. Reducing contention through priority updates. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 299--300.
[139]
Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief announcement: The problem based benchmark suite. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA).
[140]
Julian Shun, Laxman Dhulipala, and Guy E. Blelloch. 2014. A simple and practical linear-work parallel algorithm for connectivity. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 143--153.
[141]
Julian Shun, Laxman Dhulipala, and Guy E. Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Data Compression Conference (DCC). 403--412.
[142]
Julian Shun and Kanat Tangwongsan. 2015. Multicore triangle computations without tuning. In IEEE International Conference on Data Engineering (ICDE). 149--160.
[143]
George M. Slota and Kamesh Madduri. 2014. Simple parallel biconnectivity algorithms for multicore platforms. In IEEE International Conference on High-Performance Computing (HiPC). 1--10.
[144]
George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 550--559.
[145]
George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri. 2015. Supercomputing for Web Graph Analytics. Technical Report SAND2015-3087C. Sandia National Lab.
[146]
G. M. Slota, S. Rajamanickam, and K. Madduri. 2016. A case study of complex graph analysis in distributed memory: Implementation and optimization. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 293--302.
[147]
Stergios Stergiou, Dipen Rughwani, and Kostas Tsioutsiouliklis. 2018. Shortcutting label propagation for distributed connected components. In International Conference on Web Search and Data Mining (WSDM). 540--546.
[148]
Robert E. Tarjan and Uzi Vishkin. 1985. An efficient parallel biconnectivity algorithm. SIAM J. on Computing 14, 4 (1985), 862--874.
[149]
Mikkel Thorup and Uri Zwick. 2005. Approximate distance oracles. J. ACM 52, 1 (2005), 1--24.
[150]
Tom Tseng, Laxman Dhulipala, and Julian Shun. 2021. Parallel index-based structural graph clustering and its approximation. To appear in ACM International Conference on Management of Data (SIGMOD) (2021).
[151]
Dominic J. A. Welsh and Martin B. Powell. 1967. An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10, 1 (1967), 85--86.
[152]
Da Yan, Yingyi Bu, Yuanyuan Tian, and Amol Deshpande. 2017. Big graph analytics platforms. Foundations and Trends in Databases 7, 1--2 (2017), 1--195.
[153]
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A high-performance graph DSL. Object-Oriented Programming Systems, Languages,and Applications (OOPSLA) (2018), 121:1--121:30.
[154]
Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In USENIX Conference on File and Storage Technologies (FAST). 45--58.
[155]
Wei Zhou. 2017. A Practical Scalable Shared-Memory Parallel Algorithm for Computing Minimum Spanning Trees. Master’s thesis. KIT.

Cited By

View all
  • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
  • (2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
  • (2024)ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search AlgorithmsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638475(270-285)Online publication date: 2-Mar-2024
  • Show More Cited By

Index Terms

  1. Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Parallel Computing
    ACM Transactions on Parallel Computing  Volume 8, Issue 1
    March 2021
    189 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3446672
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 April 2021
    Accepted: 01 September 2020
    Received: 01 May 2019
    Published in TOPC Volume 8, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Parallel graph algorithms
    2. parallel graph processing

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1,002
    • Downloads (Last 6 weeks)150
    Reflects downloads up to 24 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
    • (2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
    • (2024)ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search AlgorithmsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638475(270-285)Online publication date: 2-Mar-2024
    • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
    • (2024)Parallel and (Nearly) Work-Efficient Dynamic ProgrammingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659958(219-232)Online publication date: 17-Jun-2024
    • (2024)Teaching Parallel Algorithms Using the Binary-Forking Model2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00080(346-351)Online publication date: 27-May-2024
    • (2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
    • (2024)Allok: a machine learning approach for efficient graph execution on CPU–GPU clustersThe Journal of Supercomputing10.1007/s11227-024-06079-980:13(18838-18865)Online publication date: 1-Sep-2024
    • (2023)CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE56975.2023.10136997(1-6)Online publication date: Apr-2023
    • (2023)Fast and Space-Efficient Parallel Algorithms for Influence MaximizationProceedings of the VLDB Endowment10.14778/3632093.363210417:3(400-413)Online publication date: 1-Nov-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media