DOI: 10.1145/3535508.3545591

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Published: 07 August 2022

Abstract

We describe our experience converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to give application and compiler developers a detailed description of migration paths between CUDA and SYCL. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain comparable performance. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem size and the work-group size. A 64-bit popcount operation implemented with a tree of adders is slightly faster than the built-in popcount operation. With four OpenMP threads, the peak performance of the SYCL and CUDA applications is comparable.
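The tree-of-adders popcount mentioned in the abstract can be illustrated with a short host-side C++ sketch. This is not the authors' kernel code. It is the widely known bit-parallel formulation, in which adjacent bit fields are summed pairwise over six levels. The function names `popcount64_tree` and `popcount64_naive` are hypothetical and chosen only for this illustration. In device code the built-in alternatives would be `__popcll` in CUDA or `sycl::popcount` in SYCL.

```cpp
#include <cstdint>

// Population count of a 64-bit word using a tree of adders.
// Each step adds neighbouring fields of width 1, 2, 4, 8, 16 and 32 bits,
// so the final value is the number of set bits in the input.
inline unsigned popcount64_tree(std::uint64_t x) {
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + (x >> 32);
    return static_cast<unsigned>(x);
}

// Straightforward reference that tests one bit per iteration.
// Included only so the tree version can be checked against it.
inline unsigned popcount64_naive(std::uint64_t x) {
    unsigned count = 0;
    while (x != 0) {
        count += static_cast<unsigned>(x & 1ULL);
        x >>= 1;
    }
    return count;
}
```

The tree version uses a fixed number of shift, mask and add operations regardless of the input, which is why it is a natural candidate to compare against a hardware or library popcount. Whether it is actually faster depends on the compiler and the target device.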


Published In

BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
August 2022
549 pages
ISBN: 9781450393867
DOI: 10.1145/3535508
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. epistasis
  3. portability
  4. programming model

Qualifiers

  • Research-article

Conference

BCB '22

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
