Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1941553.1941557acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

Published: 12 February 2011 Publication History

Abstract

Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study multiple algorithms for four such problems: discrete-event simulation, single-source shortest path, breadth-first search, and minimal spanning trees. We show that the algorithms can be classified into two categories that we call unordered and ordered, and demonstrate experimentally that there is a trade-off between parallelism and work efficiency: unordered algorithms usually have more parallelism than their ordered counterparts for the same problem, but they may also perform more work. Nevertheless, our experimental results show that unordered algorithms typically lead to more scalable implementations, demonstrating that less work-efficient irregular algorithms may be better for parallel execution.

References

[1]
9th dimacs implementation challenge - shortest paths. http://www.dis.uniroma1.it/~challenge9/links.shtml.
[2]
D.A. Bader and G. Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. Journal of Parallel and Distributed Computing, 66(11):1366--1378, 2006.
[3]
David~A. Bader and Kamesh Madduri. Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray mta-2. In ICPP'06: Proceedings of the 2006 International Conference on Parallel Processing, pages 523--530, Washington, DC, USA, 2006. IEEE Computer Society.
[4]
Richard Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1), pages 87--90, 1958.
[5]
Gianfranco Bilardi, Paolo D'Alberto, and Alexandru Nicolau. Fractal matrix multiplication: A case study on portability of cache performance. In Algorithm Engineering, pages 26--38, 2001.
[6]
Gianfranco Bilardi, Afonso Ferreira, Reinhard Lüling, and José D. P. Rolim, editors. Solving Irregularly Structured Problems in Parallel, 4th International Symposium, IRREGULAR'97, Paderborn, Germany, June 12--13, 1997, Proceedings, volume 1253 of Lecture Notes in Computer Science. Springer, 1997.
[7]
Brian D. Carlstrom, Austen McDonald, Michael Carbin, Christos Kozyrakis, and Kunle Olukotun. Transactional collection classes. In PPOPP, pages 56--67, 2007.
[8]
K. Mani Chandy and Jayadev Misra. Distributed simulation: A case study in design and verification of distributed programs. IEEE Trans. Software Eng., 5(5):440--452, 1979.
[9]
Philippe Charles, Christian Grothoff, Vijay~A. Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In OOPSLA, pages 519--538, 2005.
[10]
D. Chazan and W. Miranker. Chaotic relaxation. Linear algebra, 2(199--222), 1969.
[11]
Sun Chung and Anne Condon. Parallel implementation of boruvka's minimum spanning tree algorithm. Parallel Processing Symposium, International, 0:302, 1996.
[12]
Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein, editors. Introduction to Algorithms. MIT Press, 2001.
[13]
Nick Edmonds, Alex Breuer, Douglas Gregor, and Andrew Lumsdaine. Single-source shortest paths with the parallel boost graph library. In 9th DIMACS Implementation Challenge - Shortest Paths, November, 2006.
[14]
Paul Feautrier. Some efficient solutions to the affine scheduling problem: One dimensional time. International Journal of Parallel Programming, October 1992.
[15]
Jr. Ford, L. R. Network flow theory. Technical Report Report P-923, The Rand Corporation, Santa Monica, Cal, 1956.
[16]
Maurice Herlihy and J. Eliot~B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, 1993.
[17]
Joseph Ja'Ja'. An Introduction to Parallel Algorithms. Addison-Wesley Publishing Company, Reading, MA, 1992.
[18]
David~R. Jefferson. Virtual time. ACM Trans. Program. Lang. Syst., 7(3):404--425, 1985.
[19]
Jorg Keller, Cristoph Kessler, and Jesper Traff. Practical PRAM Programming. Wiley-Interscience, New York, 2001.
[20]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI 2007), 42(6):211--222, 2007.
[21]
Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali, and Calin Casçaval. How much parallelism is there in irregular applications? In PPoPP, pages 3--14, 2009.
[22]
Charles E. Leiserson and Tao~B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In SPAA, pages 303--314, 2010.
[23]
Kamesh Madduri, David~A. Bader, Jonathan~W. Berry, and Joseph~R. Crobak. Parallel shortest path algorithms for solving large-scale instances. In 9th DIMACS Implementation Challenge - Shortest Paths, November, 2006.
[24]
Mario Mendez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher, and Keshav Pingali. Structure-driven optimizations for amorphous data-parallel programs. In PPoPP, 2010.
[25]
U. Meyer and P. Sanders. δ-stepping: a parallelizable shortest path algorithm. J. Algs. 49(1), pages 114--152, 2003.
[26]
http://www.openmp.org/.
[27]
Keshav Pingali, Milind Kulkarni, Donald Nguyen, Martin Burtscher, Mario Mendez-Lojo, Dimitrios Prountzos, Xin Sui, and Zifei Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.
[28]
C. D. Polychronopoulos and D. J. Kuck. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput., 36(12):1425--1439, 1987.
[29]
Lawrence Rauchwerger and David~A. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.
[30]
Yossi Shiloach and Uzi Vishkin. An o(log n) parallel connectivity algorithm. J. Algorithms, 3(1):57--67, 1982.
[31]
Ten H. Tzen and Lionel~M. Ni. Trapezoid self-scheduling: A practical scheduling scheme for parallel compilers, 1993.
[32]
Uzi Vishkin, Shlomit Dascal, Efraim Berkovich, and Joseph Nuzman. Explicit multi-threading (xmt) bridging models for instruction parallelism (extended abstract). In SPAA, pages 140--151, 1998.
[33]
Xingzhi Wen and Uzi Vishkin. Fpga-based prototype of a pram-on-chip processor. In Conf. Computing Frontiers, pages 55--66, 2008.
[34]
Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek. A scalable distributed parallel breadth-first search algorithm on bluegene/l. In SC'05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page~25, Washington, DC, USA, 2005. IEEE Computer Society.
[35]
Yang Zhang and Eric~A. Hansen. Parallel breadth-first heuristic search on a shared memory architecture. In AAAI-06 Workshop on Heuristic Search, Memory-Based Heuristics and Their Applications, July 2006.

Cited By

View all
  • (2022)As-Is Approximate ComputingACM Transactions on Architecture and Code Optimization10.1145/355976120:1(1-26)Online publication date: 17-Nov-2022
  • (2022)Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph ProcessingProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569686(198-210)Online publication date: 8-Oct-2022
  • (2022)Atos: A Task-Parallel GPU Scheduler for Graph AnalyticsProceedings of the 51st International Conference on Parallel Processing10.1145/3545008.3545056(1-11)Online publication date: 29-Aug-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
  • General Chair:
  • Calin Cascaval,
  • Program Chair:
  • Pen-Chung Yew
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 46, Issue 8
    PPoPP '11
    August 2011
    300 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2038037
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 February 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. amorphous data-parallelism
  2. discrete-event simulation
  3. galois system
  4. minimal spanning tree
  5. multicore processors
  6. parallel breadth first search
  7. single-source shortest path

Qualifiers

  • Research-article

Conference

PPoPP '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)58
  • Downloads (Last 6 weeks)9
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2022)As-Is Approximate ComputingACM Transactions on Architecture and Code Optimization10.1145/355976120:1(1-26)Online publication date: 17-Nov-2022
  • (2022)Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph ProcessingProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569686(198-210)Online publication date: 8-Oct-2022
  • (2022)Atos: A Task-Parallel GPU Scheduler for Graph AnalyticsProceedings of the 51st International Conference on Parallel Processing10.1145/3545008.3545056(1-11)Online publication date: 29-Aug-2022
  • (2021)An Open-Source EDA Flow for Asynchronous LogicIEEE Design & Test10.1109/MDAT.2021.305133438:2(27-37)Online publication date: Apr-2021
  • (2021)Parallel graph-grammar-based algorithm for the longest-edge refinement of triangular meshes and the pollution simulations in Lesser Poland areaEngineering with Computers10.1007/s00366-020-01253-yOnline publication date: 22-Jan-2021
  • (2020)Chronos: Efficient Speculative Parallelism for AcceleratorsProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378454(1247-1262)Online publication date: 9-Mar-2020
  • (2020)Parallel Shared-Memory Isogeometric Residual Minimization (iGRM) for Three-Dimensional Advection-Diffusion ProblemsComputational Science – ICCS 202010.1007/978-3-030-50436-6_10(133-148)Online publication date: 15-Jun-2020
  • (2019)PDES-AACM Transactions on Modeling and Computer Simulation10.1145/330225929:2(1-25)Online publication date: 18-Apr-2019
  • (2019)HyPar: A divide-and-conquer model for hybrid CPU–GPU graph processingJournal of Parallel and Distributed Computing10.1016/j.jpdc.2019.05.014Online publication date: Jun-2019
  • (2018)Gluon: a communication-optimizing substrate for distributed heterogeneous graph analyticsACM SIGPLAN Notices10.1145/3296979.319240453:4(752-768)Online publication date: 11-Jun-2018
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media