Automatic speculative DOALL for clusters

Authors:

Nick P. Johnson,

Scott A. Mahlke,

David I. August

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization

Pages 94 - 103

https://doi.org/10.1145/2259016.2259029

Published: 31 March 2012

Abstract

Automatic parallelization for clusters is a promising alternative to time-consuming, error-prone manual parallelization. However, automatic parallelization is frequently limited by the imprecision of static analysis. Moreover, due to the inherent fragility of static analysis, small changes to the source code can significantly undermine performance. By replacing static analysis with speculation and profiling, automatic parallelization becomes more robust and applicable. A naïve automatic speculative parallelization does not scale for distributed memory clusters, due to the high bandwidth required to validate speculation. This work is the first automatic speculative DOALL (Spec-DOALL) parallelization system for clusters. We have implemented a prototype automatic parallelization system, called Cluster Spec-DOALL, which consists of a Spec-DOALL parallelizing compiler and a speculative runtime for clusters. Since the compiler optimizes communication patterns, and the runtime is optimized for the cases in which speculation succeeds, Cluster Spec-DOALL minimizes the communication and validation overheads of the speculative runtime. Across 8 benchmarks, Cluster Spec-DOALL achieves a geomean speedup of 43.8x on a 120-core cluster, whereas DOALL without speculation achieves only 4.5x speedup. This demonstrates that speculation makes scalable fully-automatic parallelization for clusters possible.
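To make the idea of a speculative DOALL loop concrete, the following is a minimal single-process sketch, not the Cluster Spec-DOALL implementation. It assumes that each iteration runs against a private view of memory, that validation only has to detect a later iteration reading a location an earlier iteration wrote, and that misspeculation is handled by discarding the speculative state and re-executing sequentially. The names `SpecView`, `spec_doall`, and `run_sequential` are hypothetical and do not come from the paper; the abstract does not describe the actual validation or communication protocol.

```python
from typing import Callable, Dict, Set

# Hypothetical types for illustration only; not taken from the paper.
Memory = Dict[str, int]


class SpecView:
    """Private view of memory for one loop iteration, logging reads and writes."""

    def __init__(self, base: Memory) -> None:
        self._base = base
        self.reads: Set[str] = set()   # locations read from shared state
        self.writes: Memory = {}       # buffered (not yet committed) writes

    def load(self, key: str) -> int:
        if key in self.writes:         # reading our own write is not a cross-iteration dependence
            return self.writes[key]
        self.reads.add(key)
        return self._base[key]

    def store(self, key: str, value: int) -> None:
        self.writes[key] = value


Body = Callable[[int, SpecView], None]


def run_sequential(n: int, body: Body, mem: Memory) -> None:
    """Non-speculative fallback: commit each iteration before starting the next."""
    for i in range(n):
        view = SpecView(mem)
        body(i, view)
        mem.update(view.writes)


def spec_doall(n: int, body: Body, mem: Memory) -> bool:
    """Run n iterations as if independent; return True if speculation held."""
    views = [SpecView(mem) for _ in range(n)]
    # Every iteration sees only the pre-loop state, so these calls could be
    # handed to separate workers in any order. Here they simply run in a loop.
    for i, view in enumerate(views):
        body(i, view)

    # Validation: a later iteration must not have read a location that an
    # earlier iteration wrote, because it would have seen a stale value.
    written: Set[str] = set()
    for view in views:
        if view.reads & written:
            run_sequential(n, body, mem)   # misspeculation: discard buffers and redo
            return False
        written.update(view.writes)

    # Speculation held: commit buffered writes in iteration order.
    for view in views:
        mem.update(view.writes)
    return True
```

In this sketch a loop whose iterations touch disjoint locations commits without re-execution, while a loop that accumulates into one shared location is detected and falls back to the sequential path. A cluster runtime would additionally have to ship the logged state between nodes, which is the bandwidth cost the abstract refers to.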

Published In

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization

March 2012

285 pages

ISBN:9781450312066

DOI:10.1145/2259016

General Chairs: Carol Eidt (Microsoft), Anne Holler (VMware)

Program Chairs: Uma Srinivasan (Intel), Saman Amarasinghe (MIT)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CGO '12

CGO '12: Annual IEEE/ACM International Symposium on Code Generation and Optimization

March 31 - April 4, 2012

San Jose, California

Acceptance Rates

CGO '12 Paper Acceptance Rate 26 of 90 submissions, 29%;

Overall Acceptance Rate 312 of 1,061 submissions, 29%
