Scheduling directives for shared-memory many-core processor systems

Authors:

Yitzhak Birk

PMAM '13: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores

Pages 115 - 124

https://doi.org/10.1145/2442992.2443005

Published: 23 February 2013

Abstract

We consider many-core processors with task-oriented programming, in which scheduling constraints among tasks are decided offline and then enforced by the runtime system. In this setting, exposing and beneficially exploiting fine-grained data and control parallelism is increasingly important. High expressive power for stating such constraints (directives), along with the ability to implement them in fast, simple hardware, is therefore critical for success. In this paper, we focus on the relationships among duplicable tasks, which are used to express and exploit data parallelism. We extend the conventional Start-After-Complete (precedence) constraint so that it can also be applied between replicas of different duplicable tasks rather than only between entire tasks, thereby increasing the exposable parallelism. Additionally, we propose the parameterized Start-After-Start constraint, which can be used to control the degree of "lockstep" among multiple such tasks, e.g., to improve cache performance when the tasks work on the same data. We also briefly describe several additional directives. Finally, we show that the directives can be supported efficiently in hardware. The discussion uses Hypercore, a very efficient CREW PRAM-like shared-cache architecture, which is a very challenging target because its dispatching for basic constraints is extremely fast. However, the new directives have broader applicability.
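The abstract does not give formal definitions of the directives, so the following Python sketch is only one plausible reading, not the paper's definition or Hypercore's implementation. It assumes two duplicable tasks A and B with the same number of replicas, that replica i of B depends on replica i of A, and that the Start-After-Start parameter k means "replica i of B may start once replica i - k of A has started". All function names are hypothetical.

```python
# Illustrative model only: the index pairing (B[i] depends on A[i]) and the
# meaning of the parameter k are assumptions, not taken from the paper.

def ready_task_level(n_replicas, completed_a):
    """Conventional Start-After-Complete between entire tasks:
    no replica of B may start until every replica of A has completed."""
    if len(completed_a) == n_replicas:
        return set(range(n_replicas))
    return set()


def ready_replica_level(n_replicas, completed_a):
    """Assumed replica-level Start-After-Complete:
    replica i of B may start as soon as replica i of A has completed."""
    return {i for i in range(n_replicas) if i in completed_a}


def ready_start_after_start(n_replicas, started_a, k):
    """Assumed parameterized Start-After-Start:
    replica i of B may start once replica i - k of A has started,
    so B can run at most k replica indices ahead of A's started replicas."""
    return {i for i in range(n_replicas) if i < k or (i - k) in started_a}


if __name__ == "__main__":
    n = 8
    started_a = {0, 1, 2, 3}   # replicas of A that have been dispatched
    completed_a = {0, 1, 2}    # replicas of A that have finished

    print(sorted(ready_task_level(n, completed_a)))
    print(sorted(ready_replica_level(n, completed_a)))
    print(sorted(ready_start_after_start(n, started_a, k=2)))
```

Under these assumptions, the task-level constraint exposes no replicas of B while A is still partly running, whereas the replica-level constraint already exposes the replicas whose counterparts have finished. The parameter k in the Start-After-Start sketch bounds how far B may drift ahead of A, which is one way to picture the "degree of lockstep" mentioned above.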

Published In

PMAM '13: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores

February 2013

134 pages

ISBN: 9781450319089

DOI: 10.1145/2442992

Editors:
Pavan Balaji, Argonne National Laboratory
Minyi Guo, Shanghai Jiao Tong University, China
Zhiyi Huang, University of Otago, New Zealand

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PPoPP '13

Sponsor:

SIGPLAN

PPoPP '13: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 23, 2013

Guangdong, Shenzhen, China

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%
