Abstract
With discrete Intel GPUs entering the high performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this paper, we report how we prepare the Ginkgo math library for Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between DPC++ and the CUDA and HIP programming models and describe workflows for simplified code conversion. We benchmark advanced sparse linear algebra routines utilizing the converted kernels to assess the efficiency of the DPC++ backend against the hardware-specific performance bounds. We also compare the performance of basic building blocks against routines providing the same functionality that ship with Intel's oneMKL vendor library.
Notes
- 2. These extensions are now part of the SYCL 2020 specification: https://www.khronos.org/news/press/khronos-releases-sycl-2020-final-specification.
- 6. Ginkgo is designed to compile for IEEE 754 double precision, single precision, double precision complex, and single precision complex arithmetic.
- 7. At the time of writing, oneMKL does not provide a COO implementation, and CSR can only operate on shared memory on the Gen12 architecture.
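The sparse linear algebra routines benchmarked in the paper center on sparse matrix-vector products (SpMV) over formats such as CSR and COO. As a point of reference for what a CSR kernel computes, the following is a minimal sequential sketch; the function and variable names (`csr_spmv`, `row_ptrs`, `col_idxs`) are illustrative and do not reflect Ginkgo's or oneMKL's actual APIs.

```cpp
#include <vector>

// Sequential sketch of a CSR sparse matrix-vector product y = A * x.
// CSR stores, per row, a contiguous slice of nonzero values and their
// column indices; row_ptrs delimits each row's slice.
void csr_spmv(const std::vector<int>& row_ptrs,
              const std::vector<int>& col_idxs,
              const std::vector<double>& vals,
              const std::vector<double>& x,
              std::vector<double>& y)
{
    const int num_rows = static_cast<int>(row_ptrs.size()) - 1;
    for (int row = 0; row < num_rows; ++row) {
        double sum = 0.0;
        // Row's nonzeros occupy vals[row_ptrs[row] .. row_ptrs[row + 1])
        for (int k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
            sum += vals[k] * x[col_idxs[k]];
        }
        y[row] = sum;
    }
}
```

In a GPU backend, the outer loop is what gets parallelized, e.g. by assigning one work-item or subgroup per row, which is why the per-row slice structure of CSR maps naturally onto CUDA, HIP, and DPC++ kernels alike.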
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Tsai, Y.M., Cojean, T., Anzt, H. (2022). Porting Sparse Linear Algebra to Intel GPUs. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1