research-article

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

Authors:

Muhammad Amber Hassaan,

Martin Burtscher,

Keshav PingaliAuthors Info & Claims

ACM SIGPLAN Notices, Volume 46, Issue 8

Pages 3 - 12

https://doi.org/10.1145/2038037.1941557

Published: 12 February 2011 Publication History

Abstract

Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study multiple algorithms for four such problems: discrete-event simulation, single-source shortest path, breadth-first search, and minimal spanning trees. We show that the algorithms can be classified into two categories that we call unordered and ordered, and demonstrate experimentally that there is a trade-off between parallelism and work efficiency: unordered algorithms usually have more parallelism than their ordered counterparts for the same problem, but they may also perform more work. Nevertheless, our experimental results show that unordered algorithms typically lead to more scalable implementations, demonstrating that less work-efficient irregular algorithms may be better for parallel execution.

References

[1]

9th dimacs implementation challenge - shortest paths. http://www.dis.uniroma1.it/~challenge9/links.shtml.

[2]

D.A. Bader and G. Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. Journal of Parallel and Distributed Computing, 66(11):1366--1378, 2006.

Digital Library

[3]

David~A. Bader and Kamesh Madduri. Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray mta-2. In ICPP'06: Proceedings of the 2006 International Conference on Parallel Processing, pages 523--530, Washington, DC, USA, 2006. IEEE Computer Society.

Digital Library

[4]

Richard Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1), pages 87--90, 1958.

[5]

Gianfranco Bilardi, Paolo D'Alberto, and Alexandru Nicolau. Fractal matrix multiplication: A case study on portability of cache performance. In Algorithm Engineering, pages 26--38, 2001.

Digital Library

[6]

Gianfranco Bilardi, Afonso Ferreira, Reinhard Lüling, and José D. P. Rolim, editors. Solving Irregularly Structured Problems in Parallel, 4th International Symposium, IRREGULAR'97, Paderborn, Germany, June 12--13, 1997, Proceedings, volume 1253 of Lecture Notes in Computer Science. Springer, 1997.

Digital Library

[7]

Brian D. Carlstrom, Austen McDonald, Michael Carbin, Christos Kozyrakis, and Kunle Olukotun. Transactional collection classes. In PPOPP, pages 56--67, 2007.

Digital Library

[8]

K. Mani Chandy and Jayadev Misra. Distributed simulation: A case study in design and verification of distributed programs. IEEE Trans. Software Eng., 5(5):440--452, 1979.

Digital Library

[9]

Philippe Charles, Christian Grothoff, Vijay~A. Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In OOPSLA, pages 519--538, 2005.

Digital Library

[10]

D. Chazan and W. Miranker. Chaotic relaxation. Linear algebra, 2(199--222), 1969.

[11]

Sun Chung and Anne Condon. Parallel implementation of boruvka's minimum spanning tree algorithm. Parallel Processing Symposium, International, 0:302, 1996.

Digital Library

[12]

Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein, editors. Introduction to Algorithms. MIT Press, 2001.

Digital Library

[13]

Nick Edmonds, Alex Breuer, Douglas Gregor, and Andrew Lumsdaine. Single-source shortest paths with the parallel boost graph library. In 9th DIMACS Implementation Challenge - Shortest Paths, November, 2006.

[14]

Paul Feautrier. Some efficient solutions to the affine scheduling problem: One dimensional time. International Journal of Parallel Programming, October 1992.

Digital Library

[15]

Jr. Ford, L. R. Network flow theory. Technical Report Report P-923, The Rand Corporation, Santa Monica, Cal, 1956.

[16]

Maurice Herlihy and J. Eliot~B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, 1993.

Digital Library

[17]

Joseph Ja'Ja'. An Introduction to Parallel Algorithms. Addison-Wesley Publishing Company, Reading, MA, 1992.

Digital Library

[18]

David~R. Jefferson. Virtual time. ACM Trans. Program. Lang. Syst., 7(3):404--425, 1985.

Digital Library

[19]

Jorg Keller, Cristoph Kessler, and Jesper Traff. Practical PRAM Programming. Wiley-Interscience, New York, 2001.

Digital Library

[20]

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI 2007), 42(6):211--222, 2007.

Digital Library

[21]

Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali, and Calin Casçaval. How much parallelism is there in irregular applications? In PPoPP, pages 3--14, 2009.

Digital Library

[22]

Charles E. Leiserson and Tao~B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In SPAA, pages 303--314, 2010.

Digital Library

[23]

Kamesh Madduri, David~A. Bader, Jonathan~W. Berry, and Joseph~R. Crobak. Parallel shortest path algorithms for solving large-scale instances. In 9th DIMACS Implementation Challenge - Shortest Paths, November, 2006.

[24]

Mario Mendez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher, and Keshav Pingali. Structure-driven optimizations for amorphous data-parallel programs. In PPoPP, 2010.

Digital Library

[25]

U. Meyer and P. Sanders. δ-stepping: a parallelizable shortest path algorithm. J. Algs. 49(1), pages 114--152, 2003.

Digital Library

[26]

http://www.openmp.org/.

[27]

Keshav Pingali, Milind Kulkarni, Donald Nguyen, Martin Burtscher, Mario Mendez-Lojo, Dimitrios Prountzos, Xin Sui, and Zifei Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.

[28]

C. D. Polychronopoulos and D. J. Kuck. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput., 36(12):1425--1439, 1987.

Digital Library

[29]

Lawrence Rauchwerger and David~A. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.

Digital Library

[30]

Yossi Shiloach and Uzi Vishkin. An o(log n) parallel connectivity algorithm. J. Algorithms, 3(1):57--67, 1982.

[31]

Ten H. Tzen and Lionel~M. Ni. Trapezoid self-scheduling: A practical scheduling scheme for parallel compilers, 1993.

[32]

Uzi Vishkin, Shlomit Dascal, Efraim Berkovich, and Joseph Nuzman. Explicit multi-threading (xmt) bridging models for instruction parallelism (extended abstract). In SPAA, pages 140--151, 1998.

Digital Library

[33]

Xingzhi Wen and Uzi Vishkin. Fpga-based prototype of a pram-on-chip processor. In Conf. Computing Frontiers, pages 55--66, 2008.

Digital Library

[34]

Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek. A scalable distributed parallel breadth-first search algorithm on bluegene/l. In SC'05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page~25, Washington, DC, USA, 2005. IEEE Computer Society.

Digital Library

[35]

Yang Zhang and Eric~A. Hansen. Parallel breadth-first heuristic search on a shared memory architecture. In AAAI-06 Workshop on Heuristic Search, Memory-Based Heuristics and Their Applications, July 2006.

Cited By

Zhang GPosluns GJeffrey MAgrawal KPetrank E(2024)Multi Bucket Queues: Efficient Concurrent Priority SchedulingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659962(113-124)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659962
Łoś MPaszyński M(2024)Parallel shared-memory open-source code for simulations of transient problems using isogeometric analysis, implicit direction splitting and residual minimization (IGA-ADS-RM)Advances in Engineering Software10.1016/j.advengsoft.2024.103723196(103723)Online publication date: Oct-2024
https://doi.org/10.1016/j.advengsoft.2024.103723
Zhou YLeng JSong YLu SWang MLi CGuo MShen WLi YLin WLiu XWu HAamodt TJerger NSwift M(2023)uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural NetworksProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575723(878-891)Online publication date: 27-Jan-2023
https://dl.acm.org/doi/10.1145/3575693.3575723
Show More Cited By

Index Terms

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Frameworks
      2. Language types
        Parallel programming languages

Recommendations

The tao of parallelism in algorithms
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In ...
Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. ...
Ordered and unordered algorithms for parallel breadth first search
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

We describe and evaluate ordered and unordered algorithms for shared-memory parallel breadth-first search. The unordered algorithm is based on viewing breadth-first search as a fixpoint computation, and in general, it may perform more work than the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 46, Issue 8

PPoPP '11

August 2011

300 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2038037

Issue’s Table of Contents

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
General Chair:
Calin Cascaval
Qualcomm Research, USA
,
Program Chair:
Pen-Chung Yew
Academia Sinica, Taiwan and University of Minnesota at Twin Cities, USA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 February 2011

Published in SIGPLAN Volume 46, Issue 8

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

78
Total Citations
View Citations
841
Total Downloads

Downloads (Last 12 months)64
Downloads (Last 6 weeks)5

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang GPosluns GJeffrey MAgrawal KPetrank E(2024)Multi Bucket Queues: Efficient Concurrent Priority SchedulingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659962(113-124)Online publication date: 17-Jun-2024
Łoś MPaszyński M(2024)Parallel shared-memory open-source code for simulations of transient problems using isogeometric analysis, implicit direction splitting and residual minimization (IGA-ADS-RM)Advances in Engineering Software10.1016/j.advengsoft.2024.103723196(103723)Online publication date: Oct-2024
Zhou YLeng JSong YLu SWang MLi CGuo MShen WLi YLin WLiu XWu HAamodt TJerger NSwift M(2023)uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural NetworksProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575723(878-891)Online publication date: 27-Jan-2023
Posluns GZhu YZhang GJeffrey MSalapura VZahran MChong FTang L(2022)A scalable architecture for reprioritizing ordered parallelismProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527387(437-453)Online publication date: 18-Jun-2022
Balaji VLucia B(2022)Improving Locality of Irregular Updates with Hardware Assisted Propagation Blocking2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00047(543-557)Online publication date: Apr-2022
Lu SSun SPaul JLi YHe BLi GLi ZIdreos SSrivastava D(2021)Cache-Efficient Fork-Processing Patterns on Large GraphsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457253(1208-1221)Online publication date: 9-Jun-2021
Rehman AAhmad MKhan O(2021)A performance predictor for implementation selection of parallelized static and temporal graph algorithmsConcurrency and Computation: Practice and Experience10.1002/cpe.626734:2Online publication date: 26-Apr-2021
Yesil SHeidarshenas AMorrison ATorrellas JTaufer MBalaji PPeña A(2019)Understanding priority-based scheduling of graph algorithms on a shared-memory platformProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3295500.3356160(1-14)Online publication date: 17-Nov-2019
Meng KLi JTan GSun NHollingsworth JKeidar I(2019)A pattern based algorithmic autotuner for graph processing on GPUsProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295716(201-213)Online publication date: 16-Feb-2019
Shun J(2017)Shared-Memory Parallelism Can Be Simple, Fast, and ScalableundefinedOnline publication date: 9-Jun-2017
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents