Article

Optimistic parallelism requires abstractions

Authors:

Milind Kulkarni,

Keshav Pingali,

Ganesh Ramanarayanan,

L. Paul Chew

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 211 - 222

https://doi.org/10.1145/1250734.1250759

Published: 10 June 2007

Abstract

Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications.

In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering. By studying the algorithms and data structures used in these applications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncovered by compiler analysis. In principle, optimistic techniques such as thread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish this because they do not use the proper abstractions for the data structures in these programs.
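
Why input dependence defeats static analysis can be seen in a minimal Python sketch (an illustration, not code from the paper). Each work item is assumed to touch a "neighborhood" of graph nodes, a hypothetical stand-in for the cavity of a bad triangle in mesh refinement. Two items can run concurrently only if their neighborhoods are disjoint, and that can be decided only once the actual graph is known. The helper names `neighborhood` and `conflicting_pairs` and the two toy graphs are hypothetical.

```python
from itertools import combinations


def neighborhood(graph, node):
    """Nodes an iteration reads or writes when it processes `node`.
    Hypothetical stand-in: the node plus its immediate neighbors."""
    return {node} | set(graph[node])


def conflicting_pairs(graph, work):
    """Pairs of work items whose neighborhoods overlap; only these pairs
    must be serialized, and which pairs they are depends on `graph`."""
    hoods = {w: neighborhood(graph, w) for w in work}
    return {(a, b) for a, b in combinations(sorted(work), 2) if hoods[a] & hoods[b]}


# The same two work items (0 and 3) on two different inputs.
CHAIN = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
STAR = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

if __name__ == "__main__":
    print(conflicting_pairs(CHAIN, [0, 3]))  # disjoint neighborhoods: parallel
    print(conflicting_pairs(STAR, [0, 3]))   # overlapping neighborhoods: serialize
```

The loop code is identical in both runs; only the input changes whether items 0 and 3 may execute in parallel.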

These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2) assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation.
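
The three ingredients listed above can be illustrated with a minimal Python sketch. It is not the Galois API (Galois itself is a Java system): the names `OptimisticSet`, `commutes`, `optimistic_foreach`, and `add_then_remove` are hypothetical, the set commutativity rule is an assumed specification standing in for the method assertions of (2), and concurrency is only simulated by treating each batch of iterations as if they ran at the same time. The library class records, for every uncommitted iteration, which methods it invoked and how to undo them; an invocation that does not commute with another iteration's pending invocation aborts the caller, whose effects are rolled back with inverse actions before the item is retried.

```python
class Conflict(Exception):
    """Raised when an invocation does not commute with a pending invocation
    made by another, still-uncommitted iteration."""


def commutes(a, b):
    """Assumed commutativity spec for a set ADT: calls on different elements
    commute, and two contains() calls on the same element commute."""
    (op_a, x_a), (op_b, x_b) = a, b
    return x_a != x_b or (op_a == "contains" and op_b == "contains")


class OptimisticSet:
    """Set whose method calls are checked for conflicts and logged for undo."""

    def __init__(self, items=()):
        self.items = set(items)
        self.active = {}  # iteration id -> (op, x) invocations not yet committed
        self.undo = {}    # iteration id -> inverse actions, in invocation order

    def invoke(self, it, op, x):
        if op not in ("add", "remove", "contains"):
            raise ValueError(f"unknown operation {op!r}")
        # Detect: compare against every other uncommitted iteration's calls.
        for other, calls in self.active.items():
            if other != it and any(not commutes((op, x), c) for c in calls):
                raise Conflict(f"{op}({x}) in iteration {it} conflicts with {other}")
        self.active.setdefault(it, []).append((op, x))
        log = self.undo.setdefault(it, [])
        if op == "contains":
            return x in self.items
        if op == "add" and x not in self.items:
            self.items.add(x)
            log.append(lambda: self.items.discard(x))  # inverse of add
            return True
        if op == "remove" and x in self.items:
            self.items.remove(x)
            log.append(lambda: self.items.add(x))  # inverse of remove
            return True
        return False  # no state change, so nothing to undo

    def abort(self, it):
        # Recover: run inverse actions newest-first, then forget the iteration.
        for inverse in reversed(self.undo.pop(it, [])):
            inverse()
        self.active.pop(it, None)

    def commit(self, it):
        self.undo.pop(it, None)
        self.active.pop(it, None)


def optimistic_foreach(worklist, body, shared, batch_size=4):
    """Unordered-set iteration: each batch is treated as running concurrently;
    conflicting iterations are rolled back and retried in a later batch.
    The first iteration of a batch never conflicts (all earlier work is
    committed or aborted), so every batch makes progress. Returns #aborts."""
    pending, aborts = list(worklist), 0
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        survivors = []
        for it, item in enumerate(batch):
            try:
                body(item, lambda op, x, it=it: shared.invoke(it, op, x))
                survivors.append(it)
            except Conflict:
                shared.abort(it)
                pending.append(item)
                aborts += 1
        for it in survivors:  # commit only at the end of the batch
            shared.commit(it)
    return aborts


def add_then_remove(item, call):
    """Hypothetical loop body: add item[0], then remove item[1]."""
    a, b = item
    call("add", a)
    call("remove", b)


if __name__ == "__main__":
    s = OptimisticSet()
    print(optimistic_foreach([(1, 9), (2, 1)], add_then_remove, s), s.items)
```

In this run the second item's `remove(1)` does not commute with the first item's pending `add(1)`, so the second iteration is aborted, its `add(2)` is undone, and it is re-executed in the next batch; the final set `{2}` matches a serial execution of the two items. A real runtime would run iterations on threads and would also support the ordered-set variant, in which iterations must commit in order.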

We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.

Index Terms

Optimistic parallelism requires abstractions
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages


Published In

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2007

508 pages

ISBN:9781595936332

DOI:10.1145/1250734

General Chair: Jeanne Ferrante, University of California, San Diego, USA

Program Chair: Kathryn S. McKinley, University of Texas at Austin, USA

ACM SIGPLAN Notices Volume 42, Issue 6
Proceedings of the 2007 PLDI conference
June 2007
491 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1273442

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2007

Qualifiers

Article

Conference

PLDI '07

Sponsor:

PLDI '07: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 10 - 13, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics

Total Citations: 407
Total Downloads: 2,029
Downloads (last 12 months): 49
Downloads (last 6 weeks): 5

Reflects downloads up to 02 Oct 2024.

Cited By

Abdi J, Posluns G, Zhang G, Wang B, Jeffrey M, Agrawal K, Petrank E (2024). When Is Parallelism Fearless and Zero-Cost with Rust? Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 27-40. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659966

Zhang G, Posluns G, Jeffrey M, Agrawal K, Petrank E (2024). Multi Bucket Queues: Efficient Concurrent Priority Scheduling. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 113-124. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659962

Łoś M, Paszyński M (2024). Parallel shared-memory open-source code for simulations of transient problems using isogeometric analysis, implicit direction splitting and residual minimization (IGA-ADS-RM). Advances in Engineering Software, 196, 103723. Online publication date: Oct-2024.
https://doi.org/10.1016/j.advengsoft.2024.103723

Park J, Bin K, Park G, Ha S, Lee K, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). ASPEN. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 68625-68638. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3669124

Akbudak K (2023). Hypergraph-based locality-enhancing methods for graph operations in Big Data applications. The International Journal of High Performance Computing Applications, 38(3), 210-224. Online publication date: 20-Nov-2023.
https://doi.org/10.1177/10943420231214532

Peng Z, Ashraf R, Guo L, Tian R, Kestor G (2023). Automatic Code Generation for High-Performance Graph Algorithms. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 14-26. Online publication date: 21-Oct-2023.
https://doi.org/10.1109/PACT58117.2023.00010

Suchert F, Castrillon J (2023). STAMP-Rust: Language and Performance Comparison to C on Transactional Benchmarks. Benchmarking, Measuring, and Optimizing, pp. 160-175. Online publication date: 13-May-2023.
https://doi.org/10.1007/978-3-031-31180-2_10

Usman S, Mehmood R, Katib I, Albeshri A (2022). Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics, 12(1), 53. Online publication date: 23-Dec-2022.
https://doi.org/10.3390/electronics12010053

Chen A, Fathololumi P, Koskinen E, Pincus J (2022). Veracity: declarative multicore programming with commutativity. Proceedings of the ACM on Programming Languages, 6(OOPSLA2), 1726-1756. Online publication date: 31-Oct-2022.
https://dl.acm.org/doi/10.1145/3563349

Chou H, Ghosh S, Kloeckner A, Moreira J (2022). Batched Graph Community Detection on GPUs. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 172-184. Online publication date: 8-Oct-2022.
https://dl.acm.org/doi/10.1145/3559009.3569655
