Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1993498.1993500acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Commutative set: a language extension for implicit parallel programming

Published: 04 June 2011 Publication History

Abstract

Sequential programming models express a total program order, of which a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer written semantic commutativity assertions provide a natural way of relaxing this partial order, thereby exposing parallelism implicitly in a program. Existing implicit parallel programming models based on semantic commutativity either require additional programming extensions, or have limited expressiveness. This paper presents a generalized semantic commutativity based programming extension, called Commutative Set (COMMSET), and associated compiler technology that enables multiple forms of parallelism. COMMSET expressions are syntactically succinct and enable the programmer to specify commutativity relations between groups of arbitrary structured code blocks. Using only this construct, serializing constraints that inhibit parallelization can be relaxed, independent of any particular parallelization strategy or concurrency control mechanism. COMMSET enables well performing parallelizations in cases where they were inapplicable or non-performing before. By extending eight sequential programs with only 8 annotations per program on average, COMMSET and the associated compiler technology produced a geomean speedup of 5.7x on eight cores compared to 1.5x for the best non-COMMSET parallelization.

References

[1]
F. Aleen and N. Clark. Commutativity analysis for software parallelization: Letting program transformations see the big picture. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009.
[2]
Apple Open Source. md5sum: Message Digest 5 computation. http://www.opensource.apple.com/darwinsource/.
[3]
E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang. The design of OpenMP tasks. IEEE Transactions on Parallel and Distributed Systems, 2009.
[4]
G. E. Blelloch and J. Greiner. A provable time and space efficient implementation of NESL. In Proceedings of the First ACM SIGPLAN International Conference on Functional Programming (ICFP), 1996.
[5]
R. L. Bocchino, Jr., V. S. Adve, D. Dig, S. V. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A type and effect system for Deterministic Parallel Java. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA), 2009.
[6]
M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2007.
[7]
M. J. Bridges. The VELOCITY compiler: Extracting efficient multicore execution from legacy sequential codes. PhD thesis, 2008.
[8]
D. R. Butenhof. Programming with POSIX threads. Addison-Wesley Longman Publishing Co., Inc., 1997.
[9]
M. C. Carlisle. Olden: Parallelizing programs with dynamic data structures on distributed-memory machines. PhD thesis, 1996.
[10]
B. D. Carlstrom, A. McDonald, M. Carbin, C. Kozyrakis, and K. Olukotun. Transactional collection classes. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2007.
[11]
R. Eigenmann, J. Hoeflinger, Z. Li, and D. A. Padua. Experience in the automatic parallelization of four Perfect-benchmark programs. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing (LCPC), 1992.
[12]
J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3), 1987.
[13]
T. Harris and S. Singh. Feedback directed implicit parallelism. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2007.
[14]
J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 2006.
[15]
W.-m. Hwu, S. Ryoo, S.-Z. Ueng, J. Kelm, I. Gelado, S. Stone, R. Kidd, S. Baghsorkhi, A. Mahesri, S. Tsao, N. Navarro, S. Lumetta, M. Frank, and S. Patel. Implicitly parallel programming models for thousand-core microprocessors. In Proceedings of the 44th annual Design Automation Conference (DAC), 2007.
[16]
K. Kennedy and J. R. Allen. Optimizing Compilers for Modern Architectures: a Dependence-based Approach. Morgan Kaufmann Publishers Inc., 2002.
[17]
E. Koskinen, M. Parkinson, and M. Herlihy. Coarse-grained transactions. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2010.
[18]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
[19]
C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of 2nd International Symposium on Code Generation and Optimization (CGO), 2004.
[20]
R. Leino, P. Müller, and J. Smans. Deadlock-free channels and locks. In Proceedings of the 19th European Symposium on Programming (ESOP), 2010.
[21]
G. Memik, W. H. Mangione-Smith, and W. Hu. NetBench: a benchmarking suite for network processors. In Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2001.
[22]
C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford Transactional Applications for Multi-Processing. In IEEE International Symposium on Workload Characterization (IISWC), 2008.
[23]
R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A benchmark suite for data mining workloads. In IEEE International Symposium on Workload Characterization (IIWSC), 2006.
[24]
G. Ottoni. Global Instruction Scheduling for Multi-Threaded Architectures. PhD thesis, 2008.
[25]
G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2005.
[26]
E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the 6th annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2008.
[27]
M. C. Rinard. The design, implementation and evaluation of Jade, a portable, implicitly parallel programming language. PhD thesis, 1994.
[28]
M. C. Rinard and P. Diniz. Commutativity analysis: A new analysis framework for parallelizing compilers. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation (PLDI).
[29]
P. Selinger. potrace: Transforming bitmaps into vector graphics. http://potrace.sourceforge.net.
[30]
H. Vandierendonck, S. Rul, and K. De Bosschere. The Paralax infrastructure: Automatic parallelization with a helping hand. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2010.
[31]
C. von Praun, L. Ceze, and C. Caşcaval. Implicit parallelism with ordered transactions. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2007.
[32]
P. Wu and D. A. Padua. Beyond arrays - a container-centric approach for parallelization of real-world symbolic applications. In Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 1999.
[33]
R. M. Yoo, Y. Ni, A. Welc, B. Saha, A.-R. Adl-Tabatabai, and H.-H. S. Lee. Kicking the tires of software transactional memory: Why the going gets tough. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures (SPAA), 2008.
[34]
H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of 14th International Conference on High-Performance Computer Architecture (HPCA), 2008.

Cited By

View all
  • (2023)SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive DevelopmentProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582058(679-693)Online publication date: 25-Mar-2023
  • (2023)Better Predicates and Heuristics for Improved Commutativity SynthesisAutomated Technology for Verification and Analysis10.1007/978-3-031-45332-8_5(93-113)Online publication date: 19-Oct-2023
  • (2021)Practical smart contract sharding with ownership and commutativity analysisProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454112(1327-1341)Online publication date: 19-Jun-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
  • General Chair:
  • Mary Hall,
  • Program Chair:
  • David Padua
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 46, Issue 6
    PLDI '11
    June 2011
    652 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1993316
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. automatic parallelization
  2. implicit parallelism
  3. programming model
  4. semantic commutativity
  5. static analysis

Qualifiers

  • Research-article

Conference

PLDI '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)4
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive DevelopmentProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582058(679-693)Online publication date: 25-Mar-2023
  • (2023)Better Predicates and Heuristics for Improved Commutativity SynthesisAutomated Technology for Verification and Analysis10.1007/978-3-031-45332-8_5(93-113)Online publication date: 19-Oct-2023
  • (2021)Practical smart contract sharding with ownership and commutativity analysisProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454112(1327-1341)Online publication date: 19-Jun-2021
  • (2021)Loop parallelization using dynamic commutativity analysisProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
  • (2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
  • (2017)The scalable commutativity ruleCommunications of the ACM10.1145/306891460:8(83-90)Online publication date: 24-Jul-2017
  • (2017)A Generalized Framework for Automatic Scripting Language Parallelization2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2017.28(356-369)Online publication date: Sep-2017
  • (2016)Exploiting semantic commutativity in hardware speculationThe 49th Annual IEEE/ACM International Symposium on Microarchitecture10.5555/3195638.3195679(1-12)Online publication date: 15-Oct-2016
  • (2016)Speculatively Exploiting Cross-Invocation ParallelismProceedings of the 2016 International Conference on Parallel Architectures and Compilation10.1145/2967938.2967959(207-221)Online publication date: 11-Sep-2016
  • (2016)Exploiting semantic commutativity in hardware speculation2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO.2016.7783737(1-12)Online publication date: Oct-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media