research-article

Open access

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Authors:

Laxman Dhulipala,

Guy E. Blelloch,

Julian ShunAuthors Info & Claims

ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 1

Article No.: 4, Pages 1 - 70

https://doi.org/10.1145/3434393

Published: 22 April 2021 Publication History

All formats PDF

Abstract

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory.

This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph Based Benchmark Suite (GBBS).

References

[1]

Christopher R. Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. EmptyHeaded: A relational engine for graph processing. ACM Trans. Database Syst. 42, 4 (2017), 20:1--20:44.

Digital Library

[2]

Umut A. Acar, Daniel Anderson, Guy E. Blelloch, and Laxman Dhulipala. 2019. Parallel batch-dynamic graph connectivity. In The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22--24, 2019. 381--392.

Digital Library

[3]

Alok Aggarwal, Richard J. Anderson, and M.-Y. Kao. 1989. Parallel depth-first search in general directed graphs. In ACM Symposium on Theory of Computing (STOC). 297--308.

Digital Library

[4]

Masab Ahmad, Farrukh Hijaz, Qingchuan Shi, and Omer Khan. 2015. Crono: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores. In IEEE International Symposium on Workload Characterization, IISWC. 44--55.

Digital Library

[5]

Noga Alon, László Babai, and Alon Itai. 1986. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms 7, 4 (1986), 567--583.

Digital Library

[6]

N. Alon, R. Yuster, and U. Zwick. 1997. Finding and counting given length cycles. Algorithmica 17, 3 (1997), 209--223.

[7]

Richard Anderson and Ernst W. Mayr. 1984. A P-complete Problem and Approximations to It. Technical Report.

[8]

Alexandr Andoni, Clifford Stein, and Peilin Zhong. 2020. Parallel approximate undirected shortest paths via low hop emulators. In ACM Symposium on Theory of Computing (STOC). ACM, 322--335.

Digital Library

[9]

N. S. Arora, R. D. Blumofe, and C. G. Plaxton. 2001. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems (TOCS) 34, 2 (01 Apr 2001).

[10]

Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 2001. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems (TOCS) 34, 2 (2001), 115--144.

[11]

Baruch Awerbuch. 1985. Complexity of network synchronization. J. ACM 32, 4 (1985), 804--823.

Digital Library

[12]

Baruch Awerbuch, Bonnie Berger, Lenore Cowen, and David Peleg. 1992. Low-diameter graph decomposition is in NC. In Scandinavian Workshop on Algorithm Theory. 83--93.

[13]

Baruch Awerbuch and Y. Shiloach. 1983. New connectivity and MSF algorithms for ultracomputer and PRAM. In International Conference on Parallel Processing (ICPP). 175--179.

[14]

David A. Bader and Guojing Cong. 2006. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Parallel Distrib. Comput. 66, 11 (2006), 1366--1378.

Digital Library

[15]

David A. Bader and Kamesh Madduri. 2005. Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In IEEE International Conference on High-Performance Computing (HiPC). 465--476.

[16]

David A. Bader and Kamesh Madduri. 2006. Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2. In International Conference on Parallel Processing (ICPP). 523--530.

[17]

Bahman Bahmani, Ashish Goel, and Kamesh Munagala. 2014. Efficient primal-dual graph algorithms for MapReduce. In International Workshop on Algorithms and Models for the Web-Graph. 59--78.

[18]

Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Densest subgraph in streaming and MapReduce. Proc. VLDB Endow. 5, 5 (2012), 454--465.

Digital Library

[19]

Georg Baier, Ekkehard Köhler, and Martin Skutella. 2005. The k-splittable flow problem. Algorithmica 42, 3--4 (2005), 231--248.

Digital Library

[20]

MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity clustering: Hierarchical clustering at scale. In Advances in Neural Information Processing Systems. 6864--6874.

[21]

Scott Beamer, Krste Asanović, and David Patterson. 2013. Direction-optimizing breadth-first search. Scientific Programming 21, 3--4 (2013), 137--148.

[22]

Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP benchmark suite. CoRR abs/1508.03619 (2015).

[23]

Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. 2018. Implicit decomposition for write-efficient connectivity algorithms. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 711--722.

[24]

Bonnie Berger, John Rompel, and Peter W. Shor. 1994. Efficient NC algorithms for set cover with applications to learning and geometry. J. Computer and System Sciences 49, 3 (Dec. 1994), 454--477.

Digital Library

[25]

Marcel Birn, Vitaly Osipov, Peter Sanders, Christian Schulz, and Nodari Sitchinava. 2013. Efficient parallel and external matching. In European Conference on Parallel Processing (Euro-Par). 659--670.

Digital Library

[26]

Guy E. Blelloch. 1993. Prefix sums and their applications. In Synthesis of Parallel Algorithms, John Reif (Ed.). Morgan Kaufmann.

[27]

Guy E. Blelloch, Daniel Anderson, and Laxman Dhulipala. 2020. ParlayLib - A toolkit for parallel algorithms on shared-memory multicore machines. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 507--509.

Digital Library

[28]

Guy E. Blelloch and Laxman Dhulipala. 2018. Introduction to Parallel Algorithms. http://www.cs.cmu.edu/realworld/slidesS18/parallelChap.pdf. Carnegie Mellon University.

[29]

Guy E. Blelloch, Laxman Dhulipala, Phillip B. Gibbons, Yan Gu, Charlie McGuffey, and Julian Shun. 2021. The read-only semi-external model. In SIAM/ACM Symposium on Algorithmic Principles of Computer Systems (APOCS). 70--84.

[30]

Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. 2012. Internally deterministic algorithms can be fast. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 181--192.

Digital Library

[31]

Guy E. Blelloch, Jeremy T. Fineman, Yan Gu, and Yihan Sun. 2020. Optimal parallel algorithms in the binary-forking model. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 89--102.

Digital Library

[32]

Guy E. Blelloch, Jeremy T. Fineman, and Julian Shun. 2012. Greedy sequential maximal independent set and matching are parallel on average. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 308--317.

Digital Library

[33]

Guy E. Blelloch, Yan Gu, Julian Shun, and Yihan Sun. 2016. Parallelism in randomized incremental algorithms. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 467--478.

Digital Library

[34]

Guy E. Blelloch, Yan Gu, and Yihan Sun. 2017. Efficient construction on probabilistic tree embeddings. In Intl. Colloq. on Automata, Languages and Programming (ICALP). 26:1--26:14.

[35]

Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L Miller, Richard Peng, and Kanat Tangwongsan. 2014. Nearly-linear work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs. Theory of Computing Systems 55, 3 (2014), 521--554.

Digital Library

[36]

Guy E. Blelloch, Richard Peng, and Kanat Tangwongsan. 2011. Linear-work greedy parallel approximate set cover and variants. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA).

Digital Library

[37]

Guy E. Blelloch, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Parallel and I/O efficient set covering algorithms. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 82--90.

Digital Library

[38]

Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (1999), 720--748.

Digital Library

[39]

Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In International World Wide Web Conference (WWW). 595--601.

Digital Library

[40]

Otakar Borůvka. 1926. O jistém problému minimálním. Práce Mor. Přírodověd. Spol. v Brně III 3 (1926), 37--58.

[41]

Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 2 (2001), 163--177.

[42]

Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. In International World Wide Web Conference (WWW). 107--117.

Digital Library

[43]

Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Computer Networks 33, 1--6 (2000), 309--320.

Digital Library

[44]

Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. 84--95.

[45]

Richard Cole, Philip N. Klein, and Robert E. Tarjan. 1996. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 243--250.

[46]

Guojing Cong and David A. Bader. 2005. An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs). In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 9--18.

[47]

Guojing Cong and Ilie Gabriel Tanase. 2016. Composable locality optimizations for accelerating parallel forest computations. In IEEE International Conference on High Performance Computing and Communications (HPCC). 190--197.

[48]

Don Coppersmith, Lisa Fleischer, Bruce Hendrickson, and Ali Pinar. 2003. A Divide-and-conquer Algorithm for Identifying Strongly Connected Components. Technical Report RC23744. IBM Research.

[49]

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3 ed.). MIT Press.

Digital Library

[50]

Naga Shailaja Dasari, Ranjan Desh, and Mohammad Zubair. 2014. ParK: An efficient algorithm for k-core decomposition on multicore processors. In IEEE International Conference on Big Data (BigData). 9--16.

[51]

Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 752--768.

Digital Library

[52]

Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2017. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 293--304.

Digital Library

[53]

Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2018. Theoretically efficient parallel graph algorithms can be fast and scalable. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 293--304.

Digital Library

[54]

Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2019. Low-latency graph streaming using compressed purely-functional trees. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 918--934.

Digital Library

[55]

Laxman Dhulipala, David Durfee, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, and Xiaorui Sun. 2020. Parallel batch-dynamic graphs: Algorithms and lower bounds. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 1300--1319.

[56]

Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing graphs and indexes with recursive graph bisection. In ACM International Conference on Knowledge Discovery and Data Mining (KDD). 1535--1544.

Digital Library

[57]

Laxman Dhulipala, Quanquan C. Liu, Julian Shun, and Shangdi Yu. 2021. Parallel batch-dynamic k-clique counting. In SIAM/ACM Symposium on Algorithmic Principles of Computer Systems (APOCS). 129--143.

[58]

Laxman Dhulipala, Charlie McGuffey, Hongbo Kang, Yan Gu, Guy E. Blelloch, Phillip B. Gibbons, and Julian Shun. 2020. Sage: Parallel semi-asymmetric graph algorithms for NVRAMs. Proc. VLDB Endow. 13, 9 (2020), 1598--1613.

Digital Library

[59]

Laxman Dhulipala, Jessica Shi, Tom Tseng, Guy E. Blelloch, and Julian Shun. 2020. The graph based benchmark suite (GBBS). In International Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA). 11:1--11:8.

Digital Library

[60]

Ran Duan, Kaifeng Lyu, and Yuanhang Xie. 2018. Single-source bottleneck path algorithm faster than sorting for sparse graphs. In Intl. Colloq. on Automata, Languages and Programming (ICALP). 43:1--43:14.

[61]

James A. Edwards and Uzi Vishkin. 2012. Better speedups using simpler parallel programming for graph connectivity and biconnectivity. In International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM). 103--114.

[62]

Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In ACM International Conference on Knowledge Discovery and Data Mining (KDD). 226--231.

[63]

Jeremy T. Fineman. 2018. Nearly work-efficient parallel algorithm for digraph reachability. In ACM Symposium on Theory of Computing (STOC). 457--470.

Digital Library

[64]

Manuela Fischer and Andreas Noever. 2018. Tight analysis of parallel randomized greedy MIS. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 2152--2160.

[65]

Lisa K. Fleischer, Bruce Hendrickson, and Ali Pinar. 2000. On identifying strongly connected components in parallel. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 505--511.

[66]

Lester Randolph Ford and Delbert R. Fulkerson. 2009. Maximal flow through a network. In Classic Papers in Combinatorics. Springer, 243--248.

[67]

Hillel Gazit. 1991. An optimal randomized parallel algorithm for finding connected components in a graph. SIAM J. on Computing 20, 6 (Dec. 1991), 1046--1067.

Digital Library

[68]

Hillel Gazit and Gary L. Miller. 1988. An improved parallel algorithm that computes the BFS numbering of a directed graph. Inform. Process. Lett. 28, 2 (1988), 61--65.

Digital Library

[69]

Gurbinder Gill, Roshan Dathathri, Loc Hoang, Ramesh Peri, and Keshav Pingali. 2020. Single machine graph analytics on massive datasets using intel optane DC persistent memory. Proc. VLDB Endow. 13, 8 (2020), 1304--13.

Digital Library

[70]

A. V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report UCB/CSD-84-171. Berkeley, CA, USA.

[71]

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 17--30.

[72]

Oded Green, Luis M. Munguia, and David A. Bader. 2014. Load balanced clustering coefficients. In Workshop on Parallel programming for Analytics Applications (PPAA). 3--10.

[73]

Raymond Greenlaw, H. James Hoover, and Walter L. Ruzzo. 1995. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc.

Digital Library

[74]

Yan Gu, Julian Shun, Yihan Sun, and Guy E. Blelloch. 2015. A top-down parallel semisort. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 24--34.

[75]

Shay Halperin and Uri Zwick. 1996. An optimal randomized logarithmic time connectivity algorithm for the EREW PRAM. J. Comput. Syst. Sci. 53, 3 (1996), 395--416.

Digital Library

[76]

Shay Halperin and Uri Zwick. 2001. Optimal randomized EREW PRAM algorithms for finding spanning forests. 39, 1 (2001), 1--46.

[77]

William Hasenplaugh, Tim Kaler, Tao B. Schardl, and Charles E. Leiserson. 2014. Ordering heuristics for parallel graph coloring. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 166--177.

[78]

Loc Hoang, Matteo Pontecorvi, Roshan Dathathri, Gurbinder Gill, Bozhi You, Keshav Pingali, and Vijaya Ramachandran. 2019. A round-efficient distributed betweenness centrality algorithm. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 272--286.

Digital Library

[79]

Sungpack Hong, Nicole C. Rodia, and Kunle Olukotun. 2013. On fast parallel detection of strongly connected components (SCC) in small-world graphs. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 92:1--92:11.

[80]

John Hopcroft and Robert Tarjan. 1973. Algorithm 447: Efficient algorithms for graph manipulation. Commun. ACM 16, 6 (1973), 372--378.

Digital Library

[81]

Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardto, Hassan Chafio, Mihai Capotă, Narayanan Sundaram, Michael Anderson, Ilie Gabriel Tănase, Yinglong Xia, Lifeng Nai, and Peter Boncz. 2016. LDBC graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1317--1328.

Digital Library

[82]

Amos Israeli and Y. Shiloach. 1986. An improved parallel algorithm for maximal matching. Inform. Process. Lett. 22, 2 (1986), 57--60.

Digital Library

[83]

Alon Itai and Michael Rodeh. 1977. Finding a minimum circuit in a graph. In ACM Symposium on Theory of Computing (STOC). 1--10.

Digital Library

[84]

J. Jaja. 1992. Introduction to Parallel Algorithms. Addison-Wesley Professional.

Digital Library

[85]

Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, et al. 2018. GraFBoost: Using accelerated flash storage for external graph analytics. In ACM International Symposium on Computer Architecture (ISCA). 411--424.

Digital Library

[86]

H. Kabir and K. Madduri. 2017. Parallel k-core decomposition on multicore platforms. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1482--1491.

[87]

David R. Karger, Philip N. Klein, and Robert E. Tarjan. 1995. A randomized linear-time algorithm to find minimum spanning trees. J. ACM 42, 2 (March 1995), 321--328.

Digital Library

[88]

Richard M. Karp and Vijaya Ramachandran. 1990. Parallel algorithms for shared-memory machines. In Handbook of Theoretical Computer Science (Vol. A), Jan van Leeuwen (Ed.). MIT Press, Cambridge, MA, USA, 869--941.

[89]

Richard M. Karp and Avi Wigderson. 1984. A fast parallel algorithm for the maximal independent set problem. In ACM Symposium on Theory of Computing (STOC). 266--272.

[90]

Jinha Kim, Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, and Hwanjo Yu. 2014. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In ACM International Conference on Management of Data (SIGMOD). 637--648.

Digital Library

[91]

Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. 2015. Fast greedy algorithms in MapReduce and streaming. ACM Trans. Parallel Comput. 2, 3 (2015), 14:1--14:22.

Digital Library

[92]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In International World Wide Web Conference (WWW). 591--600.

Digital Library

[93]

Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1--3 (2008), 458--473.

Digital Library

[94]

Charles E. Leiserson and Tao B. Schardl. 2010. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 303--314.

[95]

Jason Li. 2020. Faster parallel algorithm for approximate shortest path. In ACM Symposium on Theory of Computing (STOC). 308--321.

Digital Library

[96]

Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (April 2012).

Digital Library

[97]

Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI). 340--349.

[98]

Michael Luby. 1986. A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. (1986), 1036--1053.

[99]

Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a trillion-edge graph on a single machine. In European Conference on Computer Systems (EuroSys). 527--543.

Digital Library

[100]

Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Garzarán, David Padua, and Keshav Pingali. 2016. DSMR: A parallel algorithm for single-source shortest path problem. In Proceedings of the 2016 International Conference on Supercomputing (ICS). 32:1--32:14.

Digital Library

[101]

Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In ACM International Conference on Management of Data (SIGMOD). 135--146.

Digital Library

[102]

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge university press.

[103]

Yael Maon, Baruch Schieber, and Uzi Vishkin. 1986. Parallel ear decomposition search (EDS) and st-numbering in graphs. Theoretical Computer Science 47 (1986), 277--298.

Digital Library

[104]

David W. Matula and Leland L. Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM 30, 3 (July 1983), 417--427.

Digital Library

[105]

Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 39 pages.

[106]

William Mclendon Iii, Bruce Hendrickson, Steven J. Plimpton, and Lawrence Rauchwerger. 2005. Finding strongly connected components in distributed graphs. J. Parallel Distrib. Comput. 65, 8 (2005), 901--910.

Digital Library

[107]

Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. 2015. The graph structure in the web--analyzed on different aggregation levels. The Journal of Web Science 1, 1 (2015), 33--47.

[108]

Ulrich Meyer and Peter Sanders. 2000. Parallel shortest path for arbitrary graphs. In European Conference on Parallel Processing (Euro-Par). 461--470.

[109]

Ulrich Meyer and Peter Sanders. 2003. Δ-stepping: A parallelizable shortest path algorithm. J. Algorithms 49, 1 (2003), 114--152.

Digital Library

[110]

Gary L. Miller, Richard Peng, Adrian Vladu, and Shen Chen Xu. 2015. Improved parallel algorithms for spanners and hopsets. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 192--201.

Digital Library

[111]

Gary L. Miller, Richard Peng, and Shen Chen Xu. 2013. Parallel graph decompositions using random shifts. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 196--203.

Digital Library

[112]

Gary L. Miller and Vijaya Ramachandran. 1992. A new graph triconnectivity algorithm and its parallelization. Combinatorica 12, 1 (1992), 53--76.

[113]

Lifeng Nai, Yinglong Xia, Ilie G. Tanase, Hyesoon Kim, and Ching-Yung Lin. 2015. GraphBIG: Understanding graph computing in the context of industrial solutions. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 69:1--69:12.

[114]

Mark E. J. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45, 2 (2003), 167--256.

Digital Library

[115]

Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In ACM Symposium on Operating Systems Principles (SOSP). 456--471.

Digital Library

[116]

Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, and Stéphane Bressan. 2012. Scalable parallel minimum spanning forest computation. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 205--214.

Digital Library

[117]

Mark Ortmann and Ulrik Brandes. 2014. Triangle listing algorithms: Back from the diversion. In Algorithm Engineering and Experiments (ALENEX). 1--8.

[118]

Rasmus Pagh and Francesco Silvestri. 2014. The input/output complexity of triangle enumeration. In ACM Symposium on Principles of Database Systems (PODS). 224--233.

Digital Library

[119]

M. M. A. Patwary, P. Refsnes, and F. Manne. 2012. Multi-core spanning forest algorithms using the disjoint-set data structure. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 827--835.

[120]

David Peleg and Alejandro A Schäffer. 1989. Graph spanners. Journal of Graph Theory 13, 1 (1989), 99--116.

[121]

Seth Pettie and Vijaya Ramachandran. 2002. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. on Computing 31, 6 (2002), 1879--1895.

Digital Library

[122]

C. A. Phillips. 1989. Parallel graph contraction. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 148--157.

[123]

Chung Keung Poon and Vijaya Ramachandran. 1997. A randomized linear work EREW PRAM algorithm to find a minimum spanning forest. In International Symposium on Algorithms and Computation (ISAAC). 212--222.

[124]

Sridhar Rajagopalan and Vijay V. Vazirani. 1999. Primal-dual RNC approximation algorithms for set cover and covering integer programs. SIAM J. on Computing 28, 2 (Feb. 1999), 525--540.

[125]

Vijaya Ramachandran. 1989. A framework for parallel graph algorithm design. In International Symposium on Optimal Algorithms. 33--40.

[126]

Vijaya Ramachandran. 1993. Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. In Synthesis of Parallel Algorithms, John H Reif (Ed.). Morgan Kaufmann Publishers Inc.

[127]

J. Reif. 1985. Optimal Parallel Algorithms for Integer Sorting and Graph Connectivity. Technical Report TR-08-85. Harvard University.

[128]

Ahmet Erdem Sariyuce, C. Seshadhri, and Ali Pinar. 2018. Parallel local algorithms for core, truss, and nucleus decompositions. Proc. VLDB Endow. 12, 1 (2018), 43--56.

Digital Library

[129]

T. Schank. 2007. Algorithmic Aspects of Triangle-Based Network Analysis. Ph.D. Dissertation. Universitat Karlsruhe.

[130]

Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Workshop on Experimental and Efficient Algorithms (WEA). 606--609.

Digital Library

[131]

Warren Schudy. 2008. Finding strongly connected components in parallel using O(log² N) reachability queries. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 146--151.

[132]

Stephen B. Seidman. 1983. Network structure and minimum degree. Soc. Networks 5, 3 (1983), 269--287.

[133]

Martin Sevenich, Sungpack Hong, Adam Welc, and Hassan Chafi. 2014. Fast in-memory triangle listing for large real-world graphs. In Workshop on Social Network Mining and Analysis. Article 2, 2:1--2:9 pages.

Digital Library

[134]

Jessica Shi, Laxman Dhulipala, and Julian Shun. 2020. Parallel clique counting and peeling algorithms. arXiv preprint arXiv:2002.10047 (2020).

[135]

Yossi Shiloach and Uzi Vishkin. 1982. An O(log n ) parallel connectivity algorithm. J. Algorithms 3, 1 (1982), 57--67.

[136]

Julian Shun and Guy E. Blelloch. 2013. Ligra: A lightweight graph processing framework for shared memory. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). 135--146.

[137]

Julian Shun and Guy E. Blelloch. 2014. Phase-concurrent hash tables for determinism. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 96--107.

[138]

Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. 2013. Reducing contention through priority updates. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 299--300.

[139]

Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief announcement: The problem based benchmark suite. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA).

Digital Library

[140]

Julian Shun, Laxman Dhulipala, and Guy E. Blelloch. 2014. A simple and practical linear-work parallel algorithm for connectivity. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 143--153.

[141]

Julian Shun, Laxman Dhulipala, and Guy E. Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Data Compression Conference (DCC). 403--412.

[142]

Julian Shun and Kanat Tangwongsan. 2015. Multicore triangle computations without tuning. In IEEE International Conference on Data Engineering (ICDE). 149--160.

[143]

George M. Slota and Kamesh Madduri. 2014. Simple parallel biconnectivity algorithms for multicore platforms. In IEEE International Conference on High-Performance Computing (HiPC). 1--10.

[144]

George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 550--559.

Digital Library

[145]

George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri. 2015. Supercomputing for Web Graph Analytics. Technical Report SAND2015-3087C. Sandia National Lab.

[146]

G. M. Slota, S. Rajamanickam, and K. Madduri. 2016. A case study of complex graph analysis in distributed memory: Implementation and optimization. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 293--302.

[147]

Stergios Stergiou, Dipen Rughwani, and Kostas Tsioutsiouliklis. 2018. Shortcutting label propagation for distributed connected components. In International Conference on Web Search and Data Mining (WSDM). 540--546.

Digital Library

[148]

Robert E. Tarjan and Uzi Vishkin. 1985. An efficient parallel biconnectivity algorithm. SIAM J. on Computing 14, 4 (1985), 862--874.

Digital Library

[149]

Mikkel Thorup and Uri Zwick. 2005. Approximate distance oracles. J. ACM 52, 1 (2005), 1--24.

Digital Library

[150]

Tom Tseng, Laxman Dhulipala, and Julian Shun. 2021. Parallel index-based structural graph clustering and its approximation. To appear in ACM International Conference on Management of Data (SIGMOD) (2021).

[151]

Dominic J. A. Welsh and Martin B. Powell. 1967. An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10, 1 (1967), 85--86.

[152]

Da Yan, Yingyi Bu, Yuanyuan Tian, and Amol Deshpande. 2017. Big graph analytics platforms. Foundations and Trends in Databases 7, 1--2 (2017), 1--195.

Digital Library

[153]

Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A high-performance graph DSL. Object-Oriented Programming Systems, Languages,and Applications (OOPSLA) (2018), 121:1--121:30.

[154]

Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In USENIX Conference on File and Storage Technologies (FAST). 45--58.

[155]

Wei Zhou. 2017. A Practical Scalable Shared-Memory Parallel Algorithm for Computing Minimum Spanning Trees. Master’s thesis. KIT.

Cited By

Wheatman BDong XShen ZDhulipala LŁącki JPandey PXu H(2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.14778/3665844.3665859
Gupta VChen XHuang RMeng FChen JYan YSerra ESpezzano F(2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3680021
Manohar MShen ZBlelloch GDhulipala LGu YSimhadri HSun YLee IChabbi MSteuwer M(2024)ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search AlgorithmsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638475(270-285)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638475
Show More Cited By

Index Terms

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Shared memory algorithms

Recommendations

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
SPAA '18: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest ...
A new augmentation based algorithm for extracting maximal chordal subgraphs

A graph is chordal if every cycle of length greater than three contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and ...
A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration
ASAP '12: Proceedings of the 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors

In many application domains, data are represented using large graphs involving millions of vertices and billions of edges. Graph exploration algorithms, such asbreadth-first search (BFS), are largely dominated by memory latency and are challenging to ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Parallel Computing

ACM Transactions on Parallel Computing Volume 8, Issue 1

March 2021

189 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3446672

Editor:
David A. Bader
New Jersey Institute of Technology, USA

Issue’s Table of Contents

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2021

Accepted: 01 September 2020

Received: 01 May 2019

Published in TOPC Volume 8, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

27
Total Citations
View Citations
2,359
Total Downloads

Downloads (Last 12 months)1,002
Downloads (Last 6 weeks)150

Reflects downloads up to 24 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wheatman BDong XShen ZDhulipala LŁącki JPandey PXu H(2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
https://dl.acm.org/doi/10.14778/3665844.3665859
Gupta VChen XHuang RMeng FChen JYan YSerra ESpezzano F(2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3680021
Manohar MShen ZBlelloch GDhulipala LGu YSimhadri HSun YLee IChabbi MSteuwer M(2024)ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search AlgorithmsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638475(270-285)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638475
Abdi JPosluns GZhang GWang BJeffrey MAgrawal KPetrank E(2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659966
Ding XGu YSun YAgrawal KPetrank E(2024)Parallel and (Nearly) Work-Efficient Dynamic ProgrammingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659958(219-232)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659958
Blelloch GGu YSun Y(2024)Teaching Parallel Algorithms Using the Binary-Forking Model2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00080(346-351)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00080
Loukil MSfaxi LRobbana R(2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
https://doi.org/10.1007/s41060-023-00495-5
Moori MRocha HLorenzon ABeck A(2024)Allok: a machine learning approach for efficient graph execution on CPU–GPU clustersThe Journal of Supercomputing10.1007/s11227-024-06079-980:13(18838-18865)Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1007/s11227-024-06079-9
Fu TWei CZhu ZYang SYu ZDai GYang HWang Y(2023)CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE56975.2023.10136997(1-6)Online publication date: Apr-2023
https://doi.org/10.23919/DATE56975.2023.10136997
Wang LDing XGu YSun Y(2023)Fast and Space-Efficient Parallel Algorithms for Influence MaximizationProceedings of the VLDB Endowment10.14778/3632093.363210417:3(400-413)Online publication date: 1-Nov-2023
https://dl.acm.org/doi/10.14778/3632093.3632104
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents