Programming Heterogeneous Architectures Using Hierarchical Tasks

Published: 02 May 2023

Abstract

Task-based systems have gained popularity because they promise to exploit the computational power of complex heterogeneous systems. A common programming model is the so-called Sequential Task Flow (STF) model, which unfortunately has the intrinsic limitation of supporting only static task graphs. This leads to potential submission overhead and to task graphs that are not necessarily well suited to execution on heterogeneous systems. A standard workaround is to find a trade-off between the granularity needed by accelerator devices and the one required by CPU cores to achieve performance. To address these problems, we extend the STF model of StarPU [5] so that task subgraphs can be submitted at runtime. We refer to these tasks as hierarchical tasks. This approach makes the task graph more dynamic and, combined with an automatic data manager, allows the granularity to be adapted dynamically to match the optimal size for the targeted computing resource. We show that the model is correct and we provide an early evaluation on shared-memory heterogeneous systems, using the Chameleon [1] dense linear algebra library.
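
To make the baseline STF model concrete, the sketch below shows, in plain C and using StarPU's standard starpu_task_insert interface, how tasks are submitted in sequential program order together with their data access modes; the runtime then infers dependencies and schedules the work on the available processing units. This is only an illustrative sketch of the model the paper extends (a CPU-only codelet on a single vector), not of the hierarchical-task API introduced here.

#include <stdint.h>
#include <starpu.h>

/* CPU implementation of a codelet that scales a vector in place. */
static void scal_cpu(void *buffers[], void *cl_arg)
{
    float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float factor;
    starpu_codelet_unpack_args(cl_arg, &factor);
    for (unsigned i = 0; i < n; i++)
        v[i] *= factor;
}

static struct starpu_codelet scal_cl =
{
    .cpu_funcs = { scal_cpu },
    .nbuffers  = 1,
    .modes     = { STARPU_RW },
};

int main(void)
{
    float x[1024];
    for (int i = 0; i < 1024; i++)
        x[i] = 1.0f;

    starpu_init(NULL);

    /* Register the data so the runtime can track accesses and transfers. */
    starpu_data_handle_t h;
    starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)x,
                                1024, sizeof(float));

    /* STF style: tasks are submitted in sequential order. Both tasks
     * access h in read-write mode, so the runtime infers that the
     * second task depends on the first. (Baseline model only; the
     * hierarchical-task extension described in the paper is not shown.) */
    float f1 = 2.0f, f2 = 0.5f;
    starpu_task_insert(&scal_cl, STARPU_RW, h,
                       STARPU_VALUE, &f1, sizeof(f1), 0);
    starpu_task_insert(&scal_cl, STARPU_RW, h,
                       STARPU_VALUE, &f2, sizeof(f2), 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(h);
    starpu_shutdown();
    return 0;
}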

References

[1]
Agullo, E., Augonnet, C., Dongarra, J., Ltaief, H., Namyst, R., Thibault, S., Tomov, S.: A hybridization methodology for high-performance linear algebra software for GPUs. GPU Comput. Gems Jade Edition 2, 473–484 (2011)
[2]
Akbudak, K., Ltaief, H., Mikhalev, A., Keyes, D.: Tile low rank Cholesky factorization for climate/weather modeling applications on manycore architectures (2017)
[3]
Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, Burlington (2002)
[4]
Álvarez, D., Sala, K., Maroñas, M., Roca, A., Beltran, V.: Advanced synchronization techniques for task-based runtime systems. In: Proceedings of PPoPP 2021, pp. 334–347 (2021)
[5]
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exper. 23, 187–198 (2011)
[6]
Augonnet, C., Goudin, D., Kuhn, M., Lacoste, X., Namyst, R., Ramet, P.: A hierarchical fast direct solver for distributed memory machines with manycore nodes. Technical Report, October 2019. https://hal-cea.archives-ouvertes.fr/cea-02304706
[7]
Bosilca, G., et al.: Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. In: IEEE IPDPS Workshops and PhD Forum, pp. 1432–1441 (2011)
[8]
Carratala-Saez, R., Christophersen, S., Aliaga, J.I., Beltran, V., Borm, S., Quintana-Orti, E.S.: Exploiting nested task-parallelism in the H-LU factorization. J. Comput. Sci. 33, 20–33 (2019)
[9]
Cojean, T., Guermouche, A., Hugo, A., Namyst, R., Wacrenier, P.: Resource aggregation for task-based Cholesky factorization on top of modern architectures. Parallel Comput. 83, 73–92 (2019)
[10]
Cosnard, M., Jeannot, E., Yang, T.: SLC: symbolic scheduling for executing parameterized task graphs on multiprocessors. In: Proceedings of ICPP 1999, pp. 413–421 (1999)
[11]
Elshazly, H., Lordan, F., Ejarque, J., Badia, R.M.: Accelerated execution via eager-release of dependencies in task-based workflows. Int. J. High Perform. Comput. Appl. 35(4), 325–343 (2021)
[12]
Gautier, T., Lima, J.V.F., Maillard, N., Raffin, B.: XKaapi: a runtime system for data-flow task programming on heterogeneous architectures. In: Proceedings of IPDPS 2013, pp. 1299–1308 (2013)
[13]
Huang, T.W., Lin, D.L., Lin, C.X., Lin, Y.: Taskflow: a lightweight parallel and heterogeneous task graph computing system. IEEE Trans. Parallel Distrib. Syst. 33(6), 1303–1320 (2021)
[14]
Kim, J., Lee, S., Johnston, B., Vetter, J.S.: IRIS: a portable runtime system exploiting multiple heterogeneous programming systems. In: Proceedings of HPEC 2021, pp. 1–8 (2021)
[15]
Maroñas, M., Sala, K., Mateo, S., Ayguadé, E., Beltran, V.: Worksharing tasks: an efficient way to exploit irregular and fine-grained loop parallelism. In: Proceedings of HiPC 2019, pp. 383–394 (2019)
[16]
Perez, J.M., Beltran, V., Labarta, J., Ayguadé, E.: Improving the integration of task nesting and dependencies in OpenMP. In: Proceedings of IPDPS 2017, pp. 809–818 (2017)
[17]
Valero-Lara, P., Catalán, S., Martorell, X., Usui, T., Labarta, J.: sLASs: a fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs. J. Parallel Distrib. Comput. 138, 153–171 (2020)
[18]
Wu, W., Bouteiller, A., Bosilca, G., Faverge, M., Dongarra, J.: Hierarchical DAG scheduling for hybrid distributed systems. In: Proceedings of IPDPS 2015, pp. 156–165 (2015)


Published In

Euro-Par 2022: Parallel Processing Workshops: Euro-Par 2022 International Workshops, Glasgow, UK, August 22–26, 2022, Revised Selected Papers
Aug 2022
312 pages
ISBN: 978-3-031-31208-3
DOI: 10.1007/978-3-031-31209-0

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Multicore
  2. accelerator
  3. GPU
  4. heterogeneous computing
  5. task graph
  6. programming model
  7. runtime system
  8. dense linear algebra
