Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1250734.1250759acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article

Optimistic parallelism requires abstractions

Published: 10 June 2007 Publication History

Abstract

Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications.
In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application thatperforms agglomerative clustering. By studying the algorithms and data structures used in theseapplications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncoveredby compiler analysis. In principle, optimistic techniques such asthread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish thisbecause they do not use the proper abstractions for the data structuresin these programs.
These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2)assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation.
We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straight-forward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelismin irregular programs.

References

[1]
C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie. Unbounded transactional memory. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005.
[2]
Christos D. Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios S. Nikolopoulos, and Nikos Chrisochoides. Multigrain parallel delaunay mesh generation: challenges and opportunities for multithreaded architectures. In ICS '05: Proceedings of the 19th annual international conference on Supercomputing, 2005.
[3]
J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975.
[4]
A. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, 1966.
[5]
Michael Burke, Paul Carini, and Jong-Deok Choi. Interprocedural pointer alias analysis. Technical Report IBM RC 21055, IBM Yorktown Heights, 1997.
[6]
Brian D. Carlstrom, Austen McDonald, Christos Kozyrakis, and Kunle Olukotun. Transactional collection classes. In Principles and Practices of Parallel Programming (PPoPP), 2007.
[7]
C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Transactions on Computers, 21(12):1411--1415, 1972.
[8]
L. Paul Chew. Guaranteed-quality mesh generation for curved surfaces. In SCG '93: Proceedings of the ninth annual symposium on Computational geometry, pages 274--280, 1993.
[9]
Johan de Galas. The quest for more processing power: is the single core CPU doomed? http://www.anandtech.com/cpuchipsets/ showdoc.aspx?i=2377, February 2005.
[10]
Pedro C. Diniz and Martin C. Rinard. Commutativity analysis: a new analysis technique for parallelizing compilers. ACM Trans. Program. Lang. Syst., 19(6):942--991, 1997.
[11]
Joseph A. Fisher. Very long instruction word architectures and the eli-512. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), 1998.
[12]
Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun. Transactional memory coherence and consistency. ISCA 2004, 00:102, 2004.
[13]
Tim Harris and Keir Fraser. Language support for lightweight transactions. In OOPSLA '03: Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, pages 388--402, 2003.
[14]
John Hennessy and David Patterson, editors. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2003.
[15]
Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pages 289--300, New York, NY, USA, 1993. ACM Press.
[16]
Maurice P. Herlihy and William E. Weihl. Hybrid concurrency control for abstract data types. In PODS '88: Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 201--210, New York, NY, USA, 1988.
[17]
David R. Jefferson. Virtual time. ACM Trans. Program. Lang. Syst., 7(3):404--425, 1985.
[18]
Guy L. Steele Jr. Making asynchronous parallelism safe for the world. In Proceedings of the 17th symposium on Principles of Programming Languages, pages 218--231, 1990.
[19]
J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg. Programming with sets: An introduction to SETL. Springer-Verlag Publishers, 1986.
[20]
Ken Kennedy and John Allen, editors. Optimizing compilers for modren architectures:a dependence-based approach. Morgan Kaufmann, 2001.
[21]
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood. Logtm: Log-based transactional memory. In HPCA '06: Proceedings of the 12th International Symposium on High Performance Computer Architecture, 2006.
[22]
J. Eliot B. Moss and Antony L. Hosking. Nested transactional memory: Model and preliminary architectural sketches. In SCOOL '05: Sychronization and Concurrency in Object-Oriented Languages, 2005.
[23]
J. B. C Neto, P. A. Wawrzynek, M. T. M. Carvalho, L. F. Martha, and A. R. Ingraffea. An algorithm for three-dimensional mesh generation for arbitrary regions with cracks. Engineering with Computers, 17:75--91, 2001.
[24]
Yang Ni, Vijay Menon, Ali-Reza Adl-Tabatabai, Antony L. Hosking, Rick Hudson, J. Eliot B. Moss, Bratin Saha, and Tatiana Shpeisman. Open nesting in software transactional memory. In Principles and Practices of Parallel Programming (PPoPP), 2007.
[25]
Openmp: A proposed industry standard api for shared memory programming. See www.openmp.org, October 28, 1997.
[26]
Michael Steinbach Pang-Ning Tan and Vipin Kumar, editors. Introduction to Data Mining. Pearson Addison Wesley, 2005.
[27]
R. Ponnusamy, J. Saltz, and A. Choudhary. Runtime compilation techniques for data partitioning and communication schedule reuse. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing, 1993.
[28]
Hany E. Ramadan, Donald E. Porter Christopher J. Rossbach, Owen S. Hofmann, Aditya Bhandari, and Emmett Witchel. Transactional memory designs for an operating system. In International Symposium on Computer Architecture (ISCA), 2007.
[29]
Lawrence Rauchwerger and David A. Padua. Parallelizing while loops for multiprocessor systems. In IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing, 1995.
[30]
Lawrence Rauchwerger and David A. Padua. The lrpd test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.
[31]
M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In Proceedings of the 23rd Annual ACM Symposium on the Principles of Programming Languages, St. Petersburg Beach, FL, January 1996.
[32]
William Scherer and Michael Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the Fifteenth ACM Symposium on Principles of Distributed Computing, 1996.
[33]
B. Selman, H. Levesque, and D. Mitchell. A new method for solving hard satisfiability problems. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440--446, 1992.
[34]
Nir Shavit and Dan Touitou. Software transactional memory. In PODC '95: Proceedings of the fourteenth annual ACM Symposium on Principles of Distributed Computing, pages 204--213, 1995.
[35]
Jonathan Richard Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. May 1996.
[36]
J. Greggory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. A scalable approach to thread-level speculation. In ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture, 2000.
[37]
Robert Tomasulo. An algorithm for exploiting multiple arithmetic units. IBM Journal, 11(1):25--33, 1967.
[38]
Christoph von Praun, Luis Ceze, and Calin Cascaval. Implicit parallelism with ordered transactions. In Principles and Practices of Parallel Programming (PPoPP), 2007.
[39]
Bruce Walter, Sebastian Fernandez, Adam Arbree, Kavita Bala, Michael Donikian, and Donald Greenberg. Lightcuts: a scalable approach to illumination. ACM Transactions on Graphics (SIGGRAPH), 24(3):1098--1107, July 2005.
[40]
Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12), 1988.
[41]
Niklaus Wirth. Algorithms + Data Structures = Programs. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1978.
[42]
Peng Wu and David A. Padua. Beyond arrays - a container-centric approach for parallelization of real-world symbolic applications. In LCPC '98: Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing, 1999.
[43]
L. Rauchwerger Y. Zhan and J. Torrellas. Hardware for speculative run--time parallelization in distributed shared-memory multiprocessors. In HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture, page 162, 1998.

Cited By

View all
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • (2024)Multi Bucket Queues: Efficient Concurrent Priority SchedulingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659962(113-124)Online publication date: 17-Jun-2024
  • (2024)Parallel shared-memory open-source code for simulations of transient problems using isogeometric analysis, implicit direction splitting and residual minimization (IGA-ADS-RM)Advances in Engineering Software10.1016/j.advengsoft.2024.103723196(103723)Online publication date: Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN:9781595936332
DOI:10.1145/1250734
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 42, Issue 6
    Proceedings of the 2007 PLDI conference
    June 2007
    491 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1273442
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. abstractions
  2. irregular programs
  3. optimistic parallelism

Qualifiers

  • Article

Conference

PLDI '07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)49
  • Downloads (Last 6 weeks)5
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • (2024)Multi Bucket Queues: Efficient Concurrent Priority SchedulingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659962(113-124)Online publication date: 17-Jun-2024
  • (2024)Parallel shared-memory open-source code for simulations of transient problems using isogeometric analysis, implicit direction splitting and residual minimization (IGA-ADS-RM)Advances in Engineering Software10.1016/j.advengsoft.2024.103723196(103723)Online publication date: Oct-2024
  • (2023)ASPENProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3669124(68625-68638)Online publication date: 10-Dec-2023
  • (2023)Hypergraph-based locality-enhancing methods for graph operations in Big Data applicationsThe International Journal of High Performance Computing Applications10.1177/1094342023121453238:3(210-224)Online publication date: 20-Nov-2023
  • (2023)Automatic Code Generation for High-Performance Graph Algorithms2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT58117.2023.00010(14-26)Online publication date: 21-Oct-2023
  • (2023)STAMP-Rust: Language and Performance Comparison to C on Transactional BenchmarksBenchmarking, Measuring, and Optimizing10.1007/978-3-031-31180-2_10(160-175)Online publication date: 13-May-2023
  • (2022)Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System ArchitectureElectronics10.3390/electronics1201005312:1(53)Online publication date: 23-Dec-2022
  • (2022)Veracity: declarative multicore programming with commutativityProceedings of the ACM on Programming Languages10.1145/35633496:OOPSLA2(1726-1756)Online publication date: 31-Oct-2022
  • (2022)Batched Graph Community Detection on GPUsProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569655(172-184)Online publication date: 8-Oct-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media