research-article

Parcae: a system for flexible parallel execution

Authors:

David I. August

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 133 - 144

https://doi.org/10.1145/2254064.2254082

Published: 11 June 2012

Abstract

Workload, platform, and available resources constitute a parallel program's execution environment. Most parallelization efforts statically target an anticipated range of environments, but performance generally degrades outside that range. Existing approaches address this problem with dynamic tuning but do not optimize a multiprogrammed system holistically. Further, they either require manual programming effort or are limited to array-based data-parallel programs.

This paper presents Parcae, a generally applicable automatic system for platform-wide dynamic tuning. Parcae includes (i) the Nona compiler, which creates flexible parallel programs whose tasks can be efficiently reconfigured during execution; (ii) the Decima monitor, which measures resource availability and system performance to detect change in the environment; and (iii) the Morta executor, which cuts short the life of executing tasks, replacing them with other functionally equivalent tasks better suited to the current environment. Parallel programs made flexible by Parcae outperform original parallel implementations in many interesting scenarios.
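
The monitor/executor split described above can be illustrated with a small toy sketch. This is not Parcae's implementation or API: the names `run_flexible`, `cores_available`, and `chunk` are hypothetical, and the sketch only re-chooses a worker count at epoch boundaries rather than pausing and replacing running tasks. It assumes the work items are independent, so any worker count produces the same results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flexible(items, work, cores_available, chunk=2):
    """Run `work` over `items` in short epochs, re-choosing the worker count each epoch.

    `cores_available` is a hypothetical monitor callback (loosely in the role
    of Decima) that reports how many workers the environment supports now.
    The loop loosely plays the role of Morta: it retires the current set of
    tasks at an epoch boundary and starts a functionally equivalent set with
    a different degree of parallelism.
    """
    results, history = [], []
    pos = 0
    while pos < len(items):
        workers = max(1, cores_available())        # monitor: sample the environment
        batch = items[pos:pos + chunk * workers]   # bounded amount of work per epoch
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(work, batch))  # map preserves input order
        history.append(workers)
        pos += len(batch)
    return results, history


def square(x):
    return x * x


# Simulated environment: 4 cores free, then 1, then 2 from there on.
samples = iter([4, 1, 2])
out, used = run_flexible(list(range(20)), square, lambda: next(samples, 2))
print(used)  # prints [4, 1, 2, 2, 2]
```

The results are identical whatever worker counts the monitor reports; only the schedule changes, which is the sense in which the replacement tasks are functionally equivalent.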

Index Terms

Parcae: a system for flexible parallel execution
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Parallel programming languages

Published In

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2012

572 pages

ISBN:9781450312059

DOI:10.1145/2254064

General Chairs: Jan Vitek (Purdue University), Haibo Lin (Microsoft China)
Program Chair: Frank Tip (IBM T.J. Watson Research Center)

ACM SIGPLAN Notices Volume 47, Issue 6
PLDI '12
June 2012
534 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2345156

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '12

Sponsor:

SIGPLAN

PLDI '12: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 11 - 16, 2012

Beijing, China

Acceptance Rates

PLDI '12 Paper Acceptance Rate: 48 of 255 submissions, 19%

Overall Acceptance Rate: 406 of 2,067 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 51
Total Downloads: 579
Downloads (last 12 months): 20
Downloads (last 6 weeks): 4

Reflects downloads up to 01 Oct 2024

Citations

Cited By

Cho Y, Park J, Negele F, Jo C, Gross T, Egger B, Lee J, Agrawal K, Spear M (2022). Dopia. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 32-45. Online publication date: 2-Apr-2022.
https://dl.acm.org/doi/10.1145/3503221.3508421
Zhang Y, Yin L, Li D, Peng Y, Lu K (2022). ParaX: Bandwidth-Efficient Instance Assignment for DL on Multi-NUMA Many-Core CPUs. IEEE Transactions on Computers, 71:11, pp. 3032-3046. Online publication date: 1-Nov-2022.
https://doi.org/10.1109/TC.2022.3145164
Metzger P, Seeker V, Fensch C, Cole M (2021). Device Hopping. ACM Transactions on Architecture and Code Optimization, 18:4, pp. 1-25. Online publication date: 29-Sep-2021.
https://dl.acm.org/doi/10.1145/3471909
da Silva V, Nogueira A, de Lima E, de A. Rocha H, Serpa M, Luizelli M, Rossi F, Navaux P, Beck A, Francisco Lorenzon A (2021). Smart resource allocation of concurrent execution of parallel applications. Concurrency and Computation: Practice and Experience, 35:17. Online publication date: 8-Sep-2021.
https://doi.org/10.1002/cpe.6600
Rauber T, Runger G (2020). A Parameter Selection Process by Data Analysis for Tuning Multi-threaded Time-Stepping Algorithms. 2020 Seventh International Conference on Software Defined Systems (SDS), pp. 43-50. Online publication date: Apr-2020.
https://doi.org/10.1109/SDS49854.2020.9143911
Kalinnik N, Kiesel R, Rauber T, Richter M, Rünger G (2020). A performance- and energy-oriented extended tuning process for time-step-based scientific applications. The Journal of Supercomputing. Online publication date: 25-Aug-2020.
https://doi.org/10.1007/s11227-020-03402-y
Lorenzon A, de Oliveira C, Souza J, Beck A (2019). Aurora: Seamless Optimization of OpenMP Applications. IEEE Transactions on Parallel and Distributed Systems, 30:5, pp. 1007-1021. Online publication date: 1-May-2019.
https://doi.org/10.1109/TPDS.2018.2872992
Cho Y, Guzman C, Egger B, Evripidou S, Stenström P, O'Boyle M (2018). Maximizing system utilization via parallelism management for co-located parallel applications. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-14. Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243199
Prabhu P, Beard S, Apostolakis S, Zaks A, August D, Evripidou S, Stenström P, O'Boyle M (2018). MemoDyn. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-12. Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243193
Oh Y, Quan Q, Kim D, Kim S, Heo J, Jung S, Jang J, Lee J, Evripidou S, Stenström P, O'Boyle M (2018). A portable, automatic data quantizer for deep neural networks. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-14. Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243180
