A Unified Runtime System for Heterogeneous Multi-core Architectures

Cédric Augonnet & Raymond Namyst

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5415)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program over such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units.

In this paper, we present an original runtime system that provides a high-level, unified execution model, allowing tasks to execute seamlessly over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with an LU decomposition on both homogeneous machines (3.8 speedup on 4 cores) and heterogeneous machines (95% efficiency). We also show that “granularity-aware” scheduling can improve execution time by 35%.
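
To make the idea of a codelet scheduler concrete, the following is a minimal illustrative sketch, not the runtime described in the paper. It assumes that a codelet is a task bundling one implementation per kind of processing unit, and that a scheduler hands each worker the first pending task it is able to run. All names here (`Codelet`, `Worker`, `run`) are hypothetical, the scheduling policy is a simple greedy one chosen for illustration, and memory management and real parallelism are left out entirely.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple

Data = List[float]


@dataclass
class Codelet:
    """Hypothetical task type: one implementation per kind of unit."""
    name: str
    impls: Dict[str, Callable[[Data], Data]]


@dataclass
class Worker:
    """Hypothetical processing unit; `kind` selects the implementation."""
    name: str
    kind: str


def run(tasks: List[Tuple[Codelet, Data]],
        workers: List[Worker]) -> List[Tuple[str, str, Data]]:
    """Greedy sequential simulation: in each round, every worker takes
    the first pending task that has an implementation for its kind."""
    queue: Deque[Tuple[Codelet, Data]] = deque(tasks)
    log: List[Tuple[str, str, Data]] = []
    while queue:
        progressed = False
        for w in workers:
            # Index of the first pending task this worker can execute.
            idx = next((i for i, (c, _) in enumerate(queue)
                        if w.kind in c.impls), None)
            if idx is None:
                continue
            codelet, data = queue[idx]
            del queue[idx]
            log.append((w.name, codelet.name, codelet.impls[w.kind](data)))
            progressed = True
        if not progressed:
            raise RuntimeError("no worker can run the remaining tasks")
    return log


# Usage: "scale" runs on either kind of unit, "total" only on a CPU.
scale = Codelet("scale", {"cpu": lambda xs: [2 * x for x in xs],
                          "gpu": lambda xs: [2 * x for x in xs]})
total = Codelet("total", {"cpu": lambda xs: [sum(xs)]})
log = run([(scale, [1.0, 2.0]), (total, [1.0, 2.0, 3.0]), (scale, [3.0])],
          [Worker("cpu0", "cpu"), Worker("gpu0", "gpu")])
```

In this toy run the GPU worker skips the CPU-only task and picks up the next task it can execute, which is the kind of decision a unified scheduler makes on the application's behalf.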

Author information

Authors and Affiliations

INRIA Bordeaux – LaBRI, University of Bordeaux, France
Cédric Augonnet & Raymond Namyst

Editor information

Editors and Affiliations

Departament Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
Eduardo César
Wirtschaftsuniversität Wien, 1090, Wien, Austria
Michael Alexander
Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
Achim Streit
NEC Laboratories Europe, NEC Europe Ltd., Rathausallee 10, 53757, Sankt Augustin, Germany
Jesper Larsson Träff
Université de Paris Nord, LIPN, CNRS UMR 7030, 99 avenue J.B. Clément, 93430, Villetaneuse, France
Christophe Cérin
Technische Universität Dresden, 01069, Dresden, Germany
Andreas Knüpfer
LMU München, Institut für Informatik, 80538, München, Germany
Dieter Kranzlmüller
Center for Computation and Technology (CCT), Louisiana State University, LA 70803, Baton Rouge, USA
Shantenu Jha

Cite this paper

Augonnet, C., Namyst, R. (2009). A Unified Runtime System for Heterogeneous Multi-core Architectures. In: César, E., et al. Euro-Par 2008 Workshops - Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00955-6_22

DOI: https://doi.org/10.1007/978-3-642-00955-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00954-9
Online ISBN: 978-3-642-00955-6
eBook Packages: Computer Science, Computer Science (R0)
