Abstract
This work evaluates OpenMP tasking combined with target GPU offloading as a way to achieve both programming productivity and performance on heterogeneous systems. The authors also propose an extension to the OpenMP specification, the OpenMP target task, which integrates tasking and target GPU offloading in a single OpenMP pragma to simplify the implementation of heterogeneous codes. As a test case, they use one of the most widely used Basic Linear Algebra Subprograms (BLAS) Level-3 routines: the triangular solver (TRSM). To exploit the heterogeneity of current high-performance computing systems, the authors propose a different parallelization of the algorithm based on a nonuniform decomposition of the problem, placing target GPU offloading inside OpenMP tasks to match the work to the available hardware. This approach outperforms state-of-the-art algorithms that use a uniform decomposition of the data on both CPU-only and hybrid CPU-GPU systems, reaching speedups of up to one order of magnitude. It is faster than the IBM ESSL math library on the CPU and competitive with a highly optimized heterogeneous CUDA version. One node of Oak Ridge National Laboratory's supercomputer, Summit, was used for the performance analysis.
Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Valero-Lara, P., Kim, J., Hernandez, O., Vetter, J. (2022). OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_35
DOI: https://doi.org/10.1007/978-3-031-06156-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1
eBook Packages: Computer Science (R0)