Abstract
Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program over such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units.
In this paper, we present an original runtime system providing a high-level, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with a LU decomposition for both homogeneous (3.8 speedup on 4 cores) and heterogeneous machines (95 % efficiency). We also show that a “granularity aware” scheduling can improve execution time by 35 %.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
AMD FireStream SDK, http://ati.amd.com/technology/streamcomputing/
Cuda zone, http://www.nvidia.com/cuda
Bouzas, B., Cooper, R., Greene, J., Pepe, M., Prelle, M.J.: Multicore framework: An api for programming heterogeneous multicore processors. In: STMCS. Mercury Computer Systems (2006)
Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P.: Brook for gpus: stream computing on graphics hardware. In: SIGGRAPH 2004 (2004)
Buttari, A., Luszczek, P., Kurzak, J., Dongarra, J., Bosilca, G.: A rough guide to scientific computing on the playstation 3. Technical report, UTK (2007)
Crawford, C.H., Henning, P., Kistler, M., Wright, C.: Accelerating computing with the cell broadband engine processor. In: CF 2008 (2008)
Dolbeau, R., Bihan, S., Bodin, F.: HMPP: A hybrid multi-core parallel programming environment (2007)
Duran, A., Perez, J.M., Ayguade, E., Badia, R., Labarta, J.: Extending the openmp tasking model to allow dependant tasks. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 111–122. Springer, Heidelberg (2008)
Fatahalian, K., Knight, T.J., Houston, M., Erez, M., Reiter Horn, D., Leem, L., Young Park, J., Ren, M., Aiken, A., Dally, W.J., Hanrahan, P.: Sequoia: Programming the memory hierarchy. In: Supercomputing (2006)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the cilk-5 multithreaded language. In: PLDI 1998 (1998)
Kunzman, D., Zheng, G., Bohm, E., Kalé, L.V.: Charm++, Offload API, and the Cell Processor. In: PMUP 2006 (2006)
Linderman, M.D., Collins, J.D., Wang, H., Meng, T.H.: Merge: a programming model for heterogeneous multi-core systems. In: ASPLOS XIII (2008)
McCool, M.D.: Data-parallel programming on the cell be and the gpu using the rapidmind development platform (2006)
Nijhuis, M.: Simple and Efficient Parallel Streaming Applications (working title). Ph.D thesis, Vrije Universiteit Amsterdam (2008) (to appear)
Ohara, M., Inoue, H., Sohda, Y., Komatsu, H., Nakatani, T.: Mpi microtask for programming the cell broadband enginetm processor. IBM Syst. J. 45(1) (2006)
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. Computer Graphics Forum 26(1), 80–113 (2007)
Pakin, S.: Receiver-initiated Message Passing over RDMA Networks. In: IPDPS 2008 (2008)
Penczek, F.: Design and Implementation of a Multithreaded Runtime System for the Stream Processing Language S-Net. Master’s thesis, Institute of Software Technology and Programming Languages, University of Lübeck, Germany (2007)
Meister, B., Lethin, R., Leung, A., Schweitz, E.: R-stream: A parametric high level compiler. In: HPEC (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Augonnet, C., Namyst, R. (2009). A Unified Runtime System for Heterogeneous Multi-core Architectures. In: César, E., et al. Euro-Par 2008 Workshops - Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00955-6_22
Download citation
DOI: https://doi.org/10.1007/978-3-642-00955-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00954-9
Online ISBN: 978-3-642-00955-6
eBook Packages: Computer ScienceComputer Science (R0)