DOI: 10.1145/2259016.2259029
research-article

Automatic speculative DOALL for clusters

Published: 31 March 2012

Abstract

Automatic parallelization for clusters is a promising alternative to time-consuming, error-prone manual parallelization. However, automatic parallelization is frequently limited by the imprecision of static analysis. Moreover, due to the inherent fragility of static analysis, small changes to the source code can significantly undermine performance. By replacing static analysis with speculation and profiling, automatic parallelization becomes more robust and applicable. A naïve automatic speculative parallelization does not scale for distributed memory clusters, due to the high bandwidth required to validate speculation. This work is the first automatic speculative DOALL (Spec-DOALL) parallelization system for clusters. We have implemented a prototype automatic parallelization system, called Cluster Spec-DOALL, which consists of a Spec-DOALL parallelizing compiler and a speculative runtime for clusters. Since the compiler optimizes communication patterns, and the runtime is optimized for the cases in which speculation succeeds, Cluster Spec-DOALL minimizes the communication and validation overheads of the speculative runtime. Across 8 benchmarks, Cluster Spec-DOALL achieves a geomean speedup of 43.8x on a 120-core cluster, whereas DOALL without speculation achieves only 4.5x speedup. This demonstrates that speculation makes scalable fully-automatic parallelization for clusters possible.
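The idea of replacing static dependence analysis with speculation can be illustrated with a small single-process sketch. This is not the paper's compiler or its distributed runtime: it uses threads on one machine, and every name in it (`spec_doall`, `run_speculatively`, `load`, `store`, and the two example loop bodies) is hypothetical. Each iteration runs against a read-only snapshot and buffers its writes; afterwards, a validation pass checks whether any iteration read a location that an earlier iteration wrote. If so, the speculative results are discarded and the loop is re-executed sequentially.

```python
from concurrent.futures import ThreadPoolExecutor


def run_speculatively(i, snapshot, body):
    """Run iteration i against a read-only snapshot, logging reads and buffering writes."""
    reads, writes = set(), {}

    def load(key):
        if key in writes:          # the iteration sees its own earlier writes
            return writes[key]
        reads.add(key)             # record a read of pre-loop state
        return snapshot[key]

    def store(key, value):
        writes[key] = value        # buffered; not visible to other iterations

    body(i, load, store)
    return reads, writes


def spec_doall(n, memory, body, workers=4):
    """Speculatively run n iterations in parallel; return True if speculation held.

    Assumption for this sketch: on misspeculation the whole loop is simply
    re-executed sequentially on the untouched original memory.
    """
    snapshot = dict(memory)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        logs = list(pool.map(lambda i: run_speculatively(i, snapshot, body), range(n)))

    # Validation: a read of a location written by an earlier iteration is a
    # cross-iteration flow dependence, so the DOALL assumption was wrong.
    written = set()
    for reads, writes in logs:
        if reads & written:
            for i in range(n):     # recovery: sequential re-execution
                body(i, memory.__getitem__, memory.__setitem__)
            return False
        written.update(writes)

    # Commit in iteration order so the last writer wins, as in sequential code.
    for _, writes in logs:
        memory.update(writes)
    return True


def scale(i, load, store):
    """Independent iterations: b[i] = 2 * a[i]."""
    store(("b", i), 2 * load(("a", i)))


def prefix_sum(i, load, store):
    """Dependent iterations: s[i] = s[i-1] + a[i]."""
    prev = load(("s", i - 1)) if i > 0 else 0
    store(("s", i), prev + load(("a", i)))
```

In this sketch, `scale` passes validation and commits its buffered writes, while `prefix_sum` fails validation and falls back to sequential execution. The read and write logs are the data that a distributed runtime would have to communicate in order to validate, which is the bandwidth cost the abstract refers to.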



Published In

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
March 2012
285 pages
ISBN: 9781450312066
DOI: 10.1145/2259016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

CGO '12 Paper Acceptance Rate: 26 of 90 submissions, 29%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Cited By

  • (2021) Quantifying the Semantic Gap Between Serial and Parallel Programming. 2021 IEEE International Symposium on Workload Characterization (IISWC), pp. 151-162. DOI: 10.1109/IISWC53511.2021.00024. Online publication date: Nov-2021
  • (2020) SCAF: a speculation-aware collaborative dependence analysis framework. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 638-654. DOI: 10.1145/3385412.3386028. Online publication date: 11-Jun-2020
  • (2019) A study on popular auto‐parallelization frameworks. Concurrency and Computation: Practice and Experience, 31:17. DOI: 10.1002/cpe.5168. Online publication date: 11-Feb-2019
  • (2018) DynaMix. Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference, pp. 71-83. DOI: 10.5555/3277355.3277363. Online publication date: 11-Jul-2018
  • (2018) Unconventional Parallelization of Nondeterministic Applications. ACM SIGPLAN Notices, 53:2, pp. 432-447. DOI: 10.1145/3296957.3173181. Online publication date: 19-Mar-2018
  • (2018) Unconventional Parallelization of Nondeterministic Applications. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 432-447. DOI: 10.1145/3173162.3173181. Online publication date: 19-Mar-2018
  • (2017) A Generalized Framework for Automatic Scripting Language Parallelization. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 356-369. DOI: 10.1109/PACT.2017.28. Online publication date: Sep-2017
  • (2017) Context-Aware Memory Profiling for Speculative Parallelism. 2017 IEEE 24th International Conference on High Performance Computing (HiPC), pp. 328-337. DOI: 10.1109/HiPC.2017.00045. Online publication date: Dec-2017
  • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys, 49:2, pp. 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016
  • (2016) Thread-level speculation with kernel support. Proceedings of the 25th International Conference on Compiler Construction, pp. 1-11. DOI: 10.1145/2892208.2892221. Online publication date: 17-Mar-2016
