Programming Heterogeneous Architectures Using Hierarchical Tasks

Published: 02 May 2023

Abstract

Task-based systems have gained popularity because they promise to exploit the computational power of complex heterogeneous systems. A common programming model is the so-called Sequential Task Flow (STF) model, which unfortunately has the intrinsic limitation of supporting only static task graphs. This leads to potential submission overhead and to task graphs that are not necessarily well suited to execution on heterogeneous systems. A standard workaround is to find a trade-off between the granularity needed by accelerator devices and the one required by CPU cores to achieve performance. To address these problems, we extend the STF model of StarPU [5] so that task subgraphs can be submitted at runtime. We refer to these tasks as hierarchical tasks. This approach makes the task graph more dynamic and, combined with an automatic data manager, allows the granularity to be adapted dynamically to match the optimal size for the targeted computing resource. We show that the model is correct and we provide an early evaluation on shared-memory heterogeneous systems, using the Chameleon [1] dense linear algebra library.
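
To make the baseline STF model concrete, the sketch below shows, in plain C and using StarPU's standard starpu_task_insert interface, how tasks are submitted in sequential program order together with their data access modes; the runtime then infers dependencies and schedules the work on the available processing units. This is only an illustrative sketch of the model the paper extends (a CPU-only codelet on a single vector), not of the hierarchical-task API introduced here.

#include <stdint.h>
#include <starpu.h>

/* CPU implementation of a codelet that scales a vector in place. */
static void scal_cpu(void *buffers[], void *cl_arg)
{
    float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float factor;
    starpu_codelet_unpack_args(cl_arg, &factor);
    for (unsigned i = 0; i < n; i++)
        v[i] *= factor;
}

static struct starpu_codelet scal_cl =
{
    .cpu_funcs = { scal_cpu },
    .nbuffers  = 1,
    .modes     = { STARPU_RW },
};

int main(void)
{
    float x[1024];
    for (int i = 0; i < 1024; i++)
        x[i] = 1.0f;

    starpu_init(NULL);

    /* Register the data so the runtime can track accesses and transfers. */
    starpu_data_handle_t h;
    starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)x,
                                1024, sizeof(float));

    /* STF style: tasks are submitted in sequential order. Both tasks
     * access h in read-write mode, so the runtime infers that the
     * second task depends on the first. (Baseline model only; the
     * hierarchical-task extension described in the paper is not shown.) */
    float f1 = 2.0f, f2 = 0.5f;
    starpu_task_insert(&scal_cl, STARPU_RW, h,
                       STARPU_VALUE, &f1, sizeof(f1), 0);
    starpu_task_insert(&scal_cl, STARPU_RW, h,
                       STARPU_VALUE, &f2, sizeof(f2), 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(h);
    starpu_shutdown();
    return 0;
}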

References

[1]
Agullo, E., Augonnet, C., Dongarra, J., Ltaief, H., Namyst, R., Thibault, S., Tomov, S.: A hybridization methodology for high-performance linear algebra software for GPUs. GPU Comput. Gems Jade Edition 2, 473–484 (2011)
[2]
Akbudak, K., Ltaief, H., Mikhalev, A., Keyes, D.: Tile low rank Cholesky factorization for climate/weather modeling applications on manycore architectures (2017)
[3]
Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, Burlington (2002)
[4]
Álvarez, D., Sala, K., Maroñas, M., Roca, A., Beltran, V.: Advanced synchronization techniques for task-based runtime systems. In: Proceedings of PPoPP 2021, pp. 334–347 (2021)
[5]
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exper. 23, 187–198 (2011)
[6]
Augonnet, C., Goudin, D., Kuhn, M., Lacoste, X., Namyst, R., Ramet, P.: A hierarchical fast direct solver for distributed memory machines with manycore nodes. Technical Report, October 2019. https://hal-cea.archives-ouvertes.fr/cea-02304706
[7]
Bosilca, G., et al.: Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. In: IEEE IPDPS Workshops and PhD Forum, pp. 1432–1441 (2011)
[8]
Carratala-Saez, R., Christophersen, S., Aliaga, J.I., Beltran, V., Borm, S., Quintana-Orti, E.S.: Exploiting nested task-parallelism in the H-LU factorization. J. Comput. Sci. 33, 20–33 (2019)
[9]
Cojean, T., Guermouche, A., Hugo, A., Namyst, R., Wacrenier, P.: Resource aggregation for task-based Cholesky factorization on top of modern architectures. Parallel Comput. 83, 73–92 (2019)
[10]
Cosnard, M., Jeannot, E., Yang, T.: SLC: symbolic scheduling for executing parameterized task graphs on multiprocessors. In: Proceedings of ICPP 1999, pp. 413–421 (1999)
[11]
Elshazly, H., Lordan, F., Ejarque, J., Badia, R.M.: Accelerated execution via eager-release of dependencies in task-based workflows. Int. J. High Perform. Comput. Appl. 35(4), 325–343 (2021)
[12]
Gautier, T., Lima, J.V.F., Maillard, N., Raffin, B.: XKaapi: a runtime system for data-flow task programming on heterogeneous architectures. In: Proceedings of IPDPS 2013, pp. 1299–1308 (2013)
[13]
Huang, T.W., Lin, D.L., Lin, C.X., Lin, Y.: Taskflow: a lightweight parallel and heterogeneous task graph computing system. IEEE Trans. Parallel Distrib. Syst. 33(6), 1303–1320 (2021)
[14]
Kim, J., Lee, S., Johnston, B., Vetter, J.S.: IRIS: a portable runtime system exploiting multiple heterogeneous programming systems. In: Proceedings of HPEC 2021, pp. 1–8 (2021)
[15]
Maroñas, M., Sala, K., Mateo, S., Ayguadé, E., Beltran, V.: Worksharing tasks: an efficient way to exploit irregular and fine-grained loop parallelism. In: Proceedings of HiPC 2019, pp. 383–394 (2019)
[16]
Perez, J.M., Beltran, V., Labarta, J., Ayguadé, E.: Improving the integration of task nesting and dependencies in OpenMP. In: Proceedings of IPDPS 2017, pp. 809–818 (2017)
[17]
Valero-Lara, P., Catalán, S., Martorell, X., Usui, T., Labarta, J.: sLASs: a fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs. J. Parallel Distrib. Comput. 138, 153–171 (2020)
[18]
Wu, W., Bouteiller, A., Bosilca, G., Faverge, M., Dongarra, J.: Hierarchical DAG scheduling for hybrid distributed systems. In: Proceedings of IPDPS 2015, pp. 156–165 (2015)


Published In

Euro-Par 2022: Parallel Processing Workshops: Euro-Par 2022 International Workshops, Glasgow, UK, August 22–26, 2022, Revised Selected Papers
Aug 2022
312 pages
ISBN: 978-3-031-31208-3
DOI: 10.1007/978-3-031-31209-0

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Multicore
  2. accelerator
  3. GPU
  4. heterogeneous computing
  5. task graph
  6. programming model
  7. runtime system
  8. dense linear algebra
