Parallelism orchestration using DoPE: the degree of parallelism executive

Authors:

David I. August

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 26 - 37

https://doi.org/10.1145/1993498.1993502

Published: 04 June 2011

Abstract

In writing parallel programs, programmers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmer's development-time choices suboptimal. To address this problem, this paper presents the Degree of Parallelism Executive (DoPE), an API and run-time system that separates the concern of exposing parallelism from that of optimizing it. Using the DoPE API, the application developer expresses parallelism options. During program execution, DoPE's run-time system uses this information to dynamically optimize the parallelism options in response to the facts on the ground. We easily port several emerging parallel applications to DoPE's API and demonstrate the DoPE run-time system's effectiveness in dynamically optimizing the parallelism for a variety of performance goals.
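
The separation described in the abstract can be illustrated with a small sketch: the developer only declares which degrees of parallelism are legal, and a separate run-time policy chooses among them from measured feedback. This is not DoPE's actual API. The names `ParallelismOption`, `HillClimbingExecutive`, `synthetic_throughput`, and `run` are hypothetical, and the hill-climbing policy and the synthetic workload are assumptions chosen only to keep the example short and deterministic.

```python
from dataclasses import dataclass


@dataclass
class ParallelismOption:
    """What the developer exposes: a task name and its legal worker range."""
    name: str
    min_workers: int
    max_workers: int


class HillClimbingExecutive:
    """Hypothetical run-time policy (not DoPE's): move the degree of
    parallelism (DoP) one step at a time, reversing when throughput drops."""

    def __init__(self, option):
        self.option = option
        self.dop = option.min_workers
        self.step = 1
        self.last_throughput = None

    def observe(self, throughput):
        """Record the throughput seen at the current DoP and pick the next DoP."""
        if self.last_throughput is not None and throughput < self.last_throughput:
            self.step = -self.step  # the last move made things worse: turn around
        self.last_throughput = throughput
        lo, hi = self.option.min_workers, self.option.max_workers
        self.dop = max(lo, min(hi, self.dop + self.step))  # stay in the exposed range
        return self.dop


def synthetic_throughput(dop):
    """Assumed stand-in workload: scales up to 4 workers, then contention hurts."""
    return 10 * dop if dop <= 4 else 40 - 6 * (dop - 4)


def run(executive, measure, rounds):
    """Drive the feedback loop and return the DoP used in each round."""
    used = []
    for _ in range(rounds):
        used.append(executive.dop)
        executive.observe(measure(executive.dop))
    return used


if __name__ == "__main__":
    option = ParallelismOption("transcode", min_workers=1, max_workers=8)
    print(run(HillClimbingExecutive(option), synthetic_throughput, 12))
```

In this sketch the application code never names a thread count. Swapping in a different workload or a different goal (for example, a policy that minimizes response time) would change only the executive, which is the kind of decoupling the abstract argues for.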

Index Terms

Parallelism orchestration using DoPE: the degree of parallelism executive
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Parallel programming languages

Information & Contributors

Information

Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2011

668 pages

ISBN:9781450306638

DOI:10.1145/1993498

General Chair: Mary Hall, University of Utah
Program Chair: David Padua, University of Illinois at Urbana-Champaign

ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PLDI '11

Sponsor:

SIGPLAN

PLDI '11: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 4 - 8, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Total Citations: 57
Total Downloads: 563
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 2

Reflects downloads up to 18 Nov 2024
Citations

Cited By

da Silva V, Nogueira A, de Lima E, de A. Rocha H, Serpa M, Luizelli M, Rossi F, Navaux P, Beck A, Francisco Lorenzon A (2021). Smart resource allocation of concurrent execution of parallel applications. Concurrency and Computation: Practice and Experience 35:17. DOI: 10.1002/cpe.6600. Online publication date: 8-Sep-2021.
https://doi.org/10.1002/cpe.6600
Arif M, Vandierendonck H (2021). Reducing the burden of parallel loop schedulers for many-core processors. Concurrency and Computation: Practice and Experience 33:13. DOI: 10.1002/cpe.6241. Online publication date: 5-Apr-2021.
https://doi.org/10.1002/cpe.6241
Vandierendonck H, Nikolopoulos D (2019). Hyperqueues. ACM Transactions on Parallel Computing 6:4 (1-35). DOI: 10.1145/3365660. Online publication date: 19-Nov-2019.
https://dl.acm.org/doi/10.1145/3365660
Mastoras A, Gross T (2019). Chunking for Dynamic Linear Pipelines. ACM Transactions on Architecture and Code Optimization 16:4 (1-25). DOI: 10.1145/3363815. Online publication date: 18-Nov-2019.
https://dl.acm.org/doi/10.1145/3363815
Huang H, Rao J, Wu S, Jin H, Suo K, Wu X, Weissman J, Butt A, Smirni E (2019). Adaptive Resource Views for Containers. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (243-254). DOI: 10.1145/3307681.3325403. Online publication date: 17-Jun-2019.
https://dl.acm.org/doi/10.1145/3307681.3325403
Bacou M, Tchana A, Hagimont D (2019). Your Containers Should be WYSIWYG. 2019 IEEE International Conference on Services Computing (SCC) (56-64). DOI: 10.1109/SCC.2019.00022. Online publication date: Jul-2019.
https://doi.org/10.1109/SCC.2019.00022
Iwasaki S, Amer A, Taura K, Seo S, Balaji P (2019). BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads. 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT) (29-42). DOI: 10.1109/PACT.2019.00011. Online publication date: Sep-2019.
https://doi.org/10.1109/PACT.2019.00011
Martins A, Garibotti R, Dutt N, Moraes F (2019). The power impact of hardware and software actuators on self-adaptable many-core systems. Journal of Systems Architecture: the EUROMICRO Journal 97:C (42-53). DOI: 10.1016/j.sysarc.2019.05.006. Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.1016/j.sysarc.2019.05.006
Martins A, da Silva A, Rahmani A, Dutt N, Moraes F (2019). Hierarchical adaptive Multi-objective resource management for many-core systems. Journal of Systems Architecture: the EUROMICRO Journal 97:C (416-427). DOI: 10.1016/j.sysarc.2019.01.006. Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.1016/j.sysarc.2019.01.006
Dice D, Kogan A (2019). Avoiding Scalability Collapse by Restricting Concurrency. Euro-Par 2019: Parallel Processing (363-376). DOI: 10.1007/978-3-030-29400-7_26. Online publication date: 26-Aug-2019.
https://dl.acm.org/doi/10.1007/978-3-030-29400-7_26
