Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3627535.3638474acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Open access

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping

Published: 20 February 2024 Publication History

Abstract

With the advance in genome sequencing technology, the lengths of deoxyribonucleic acid (DNA) sequencing results are rapidly increasing at lower prices than ever. However, the longer lengths come at the cost of a heavy computational burden on aligning them. For example, aligning sequences to a human reference genome can take tens or even hundreds of hours. The current de facto standard approach for alignment is based on the guided dynamic programming method. Although this takes a long time and could potentially benefit from high-throughput graphic processing units (GPUs), the existing GPU-accelerated approaches often compromise the algorithm's structure, due to the GPU-unfriendly nature of the computational pattern. Unfortunately, such compromise in the algorithm is not tolerable in the field, because sequence alignment is a part of complicated bioinformatics analysis pipelines. In such circumstances, we propose AGAThA, an exact and efficient GPU-based acceleration of guided sequence alignment. We diagnose and address the problems of the algorithm being unfriendly to GPUs, which comprises strided/redundant memory accesses and workload imbalances that are difficult to predict. According to the experiments on modern GPUs, AGAThA achieves 18.8× speedup against the CPU-based baseline, 9.6× against the best GPU-based baseline, and 3.6× against GPU-based algorithms with different heuristics.

References

[1]
Nauman Ahmed, Jonathan Lévy, Shanshan Ren, Hamid Mushtaq, Koen Bertels, and Zaid Al-Ars. 2019. GASAL2: A GPU accelerated sequence alignment library for high-throughput NGS data. BMC bioinformatics 20 (2019), 1--20.
[2]
Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, and Onur Mutlu. 2022. From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures. Computational and Structural Biotechnology Journal 20 (2022), 4579--4599.
[3]
Mohammed Alser, Jeremy Rotman, Dhrithi Deshpande, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D Singer, et al. 2021. Technology dictates algorithms: recent developments in read alignment. Genome biology 22, 1 (2021), 249.
[4]
Stephen F Altschul, Thomas L Madden, Alejandro A Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25, 17 (1997), 3389--3402.
[5]
Muaaz G Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, and Katherine Yelick. 2020. ADEPT: A domain independent sequence alignment strategy for GPU architectures. BMC bioinformatics 21, 1 (2020), 1--29.
[6]
Jacek Blazewicz, Wojciech Frohmberg, Michal Kierzynka, Erwin Pesch, and Pawel Wojciechowski. 2011. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs. BMC bioinformatics 12, 1 (2011), 1--17.
[7]
Abhishek Bohra, Uday Chand Jha, Ian D Godwin, and Rajeev Kumar Varshney. 2020. Genomic interventions for sustainable agriculture. Plant Biotechnology Journal 18, 12 (2020), 2388--2405.
[8]
Sanjay Chatterjee, Max Grossman, Alina Sbîrlea, and Vivek Sarkar. 2011. Dynamic task parallelism with a GPU work-stealing runtime system. In LCPC Workshop. Springer, Fort Collins, CO, USA, 203--217.
[9]
Jack Choquette. 2023. NVIDIA Hopper H100 GPU: Scaling Performance. IEEE Micro 43 (2023), 9--17.
[10]
Edans Flavius de Oliveira Sandes, Guillermo Miranda, Xavier Martorell, Eduard Ayguade, George Teodoro, and Alba Cristina Magalhaes Melo. 2016. CUDAlign 4.0: Incremental speculative traceback for exact chromosome-wide alignment in GPU clusters. IEEE Transactions on Parallel and Distributed Systems 27, 10 (2016), 2838--2850.
[11]
F de O Edans, Guillermo Miranda, Alba CMA de Melo, Xavier Martorell, and Eduard Ayguadé. 2014. CUDAlign 3.0: Parallel biological sequence comparison in large GPU clusters. In CCGrid. IEEE, Chicago, IL, USA, 160--169.
[12]
Zonghao Feng, Shuang Qiu, Lipeng Wang, and Qiong Luo. 2019. Accelerating long read alignment on three processors. In ICPP. ACM, Kyoto, Japan, 1--10.
[13]
Daichi Fujiki, Shunhao Wu, Nathan Ozog, Kush Goliya, David Blaauw, Satish Narayanasamy, and Reetuparna Das. 2020. SeedEx: A Genome Sequencing Accelerator for Optimal Alignments in Subminimal Space. In MICRO. IEEE, Athens, Greece, 937--950.
[14]
Hasindu Gamaarachchi, Chun Wai Lam, Gihan Jayatilaka, Hiruna Samarakoon, Jared T Simpson, Martin A Smith, and Sri Parameswaran. 2020. GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis. BMC bioinformatics 21 (2020), 1--13.
[15]
Alice Maria Giani, Guido Roberto Gallo, Luca Gianfranceschi, and Giulio Formenti. 2020. Long walk to genomics: History and current approaches to genome sequencing and assembly. Computational and Structural Biotechnology Journal 18 (2020), 9--19.
[16]
Kshitij Gupta, Jeff A Stuart, and John D Owens. 2012. A study of persistent threads style GPU programming for GPGPU workloads. In InPar. IEEE, San Jose, CA, USA, 1--14.
[17]
Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. 2011. Accelerating CUDA graph algorithms at maximum warp. In PPoPP. ACM, San Antonio, TX, USA, 267--276.
[18]
Saurabh Kalikar, Chirag Jain, Md Vasimuddin, and Sanchit Misra. 2022. Accelerating minimap2 for long-read sequencing applications on modern CPUs. Nature Computational Science 2, 2 (2022), 78--83.
[19]
Farzad Khorasani, Rajiv Gupta, and Laxmi N Bhuyan. 2015. Efficient warp execution in presence of divergence with collaborative context collection. In MICRO. ACM, Waikiki, HI, USA, 204--215.
[20]
Farzad Khorasani, Bryan Rowe, Rajiv Gupta, and Laxmi N Bhuyan. 2016. Eliminating intra-warp load imbalance in irregular nested patterns via collaborative task engagement. In IPDPS. IEEE, Chicago, IL, USA, 524--533.
[21]
Matija Korpar and Mile Šikić. 2013. SW#-GPU-enabled exact alignments on genome scale. Bioinformatics 29, 19 (2013), 2494--2495.
[22]
Heng Li. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio.GN]
[23]
Heng Li. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 18 (2018), 3094--3100.
[24]
Chi-Man Liu, Thomas Wong, Edward Wu, Ruibang Luo, Siu-Ming Yiu, Yingrui Li, Bingqiang Wang, Chang Yu, Xiaowen Chu, Kaiyong Zhao, et al. 2012. SOAP3: Ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28, 6 (2012), 878--879.
[25]
Weiguo Liu, Bertil Schmidt, Gerrit Voss, Andre Schroder, and Wolfgang Muller-Wittig. 2006. Bio-sequence database scanning on a GPU. In IPDPS. IEEE, Rhodes Island, Greece, 1--8.
[26]
Yang Liu, Wayne Huang, John Johnson, and Sheila Vaidya. 2006. GPU accelerated smith-waterman. In ICCS. Springer, Reading, UK, 188--195.
[27]
Yongchao Liu, Douglas L Maskell, and Bertil Schmidt. 2009. CUDASW++: Optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC research notes 2, 1 (2009), 1--10.
[28]
Yongchao Liu, Bernt Popp, and Bertil Schmidt. 2013. High-speed and accurate color-space short-read alignment with CUSHAW2. arXiv:1304.4766 [q-bio.GN]
[29]
Yongchao Liu, Bernt Popp, and Bertil Schmidt. 2014. CUSHAW3: Sensitive and accurate base-space and color-space short-read alignment with hybrid seeding. PloS one 9, 1 (2014), e86869.
[30]
Yongchao Liu and Bertil Schmidt. 2015. GSWABE: Faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences. Concurrency and Computation: Practice and Experience 27, 4 (2015), 958--972.
[31]
Yongchao Liu, Bertil Schmidt, and Douglas L Maskell. 2010. CUDASW++ 2.0: Enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC research notes 3, 1 (2010), 1--12.
[32]
Yongchao Liu, Bertil Schmidt, and Douglas L Maskell. 2012. CUSHAW: A CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics 28, 14 (2012), 1830--1837.
[33]
Yongchao Liu, Adrianto Wirawan, and Bertil Schmidt. 2013. CUDASW++ 3.0: Accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC bioinformatics 14, 1 (2013), 1--10.
[34]
Tuomo Mantere, Simone Kersten, and Alexander Hoischen. 2019. Long-read sequencing emerging in medical genetics. Frontiers in genetics 10 (2019), 426.
[35]
NCBI. 1982. GenBank. https://www.ncbi.nlm.nih.gov/genbank/, visited 2024-01-15.
[36]
Saul B Needleman and Christian D Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology 48, 3 (1970), 443--453.
[37]
NIST. 2012. Genome in a Bottle. https://www.nist.gov/programs-projects/genome-bottle, visited 2024-01-16.
[38]
NVIDIA. 2022. Boosting Dynamic Programming Performance Using NVIDIA Hopper GPU DPX Instructions. https://developer.nvidia.com/blog/boosting-dynamic-programming-performance-using-nvidia-hopper-gpu-dpx-instructions/, visited 2024-01-16.
[39]
NVIDIA. 2022. NVIDIA Hopper GPU Architecture Accelerates Dynamic Programming Up to 40x Using New DPX Instructions. https://blogs.nvidia.com/blog/2022/03/22/nvidia-hopper-accelerates-dynamic-programming-using-dpx-instructions/, visited 2024-01-15.
[40]
PacBio. 2023. PacBio - Sequence with Confidence. https://www.pacb.com/, visited 2023-04-21.
[41]
Seongyeon Park. 2024. readwrite112/AGAThA: AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping. AISys.
[42]
Seongyeon Park, Hajin Kim, Tanveer Ahmad, Nauman Ahmed, Zaid Al-Ars, H Peter Hofstee, Youngsok Kim, and Jinho Lee. 2022. SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs. In IPDPS. IEEE, Lyon, France, 728--738.
[43]
Julian Parkhill and Brendan W Wren. 2011. Bacterial epidemiology and biology-lessons from genome sequencing. Genome biology 12 (2011), 1--7.
[44]
NCBI RefSeq. 2019. GRCh38.p13. https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001405.39/, visited 2024-01-16.
[45]
Harisankar Sadasivan, Milos Maric, Eric Dawson, Vishanth Iyer, Johnny Israeli, and Satish Narayanasamy. 2023. Accelerating Minimap2 for accurate long read alignment on GPUs. Journal of biotechnology and biomedicine 6, 1 (2023), 13--23.
[46]
Edans Flavius de O Sandes and Alba Cristina MA de Melo. 2011. Smith-Waterman alignment of huge sequences with GPU in linear space. In IPDPS. IEEE, Anchorage, AK, USA, 1199--1211.
[47]
Edans Flavius O Sandes and Alba Cristina MA de Melo. 2010. CUD-Align: Using GPU to accelerate the comparison of megabase genomic sequences. In PPoPP. ACM, New York, NY, USA, 137--146.
[48]
Temple F Smith, Michael S Waterman, et al. 1981. Identification of common molecular subsequences. Journal of molecular biology 147, 1 (1981), 195--197.
[49]
Markus Steinberger, Bernhard Kainz, Bernhard Kerbl, Stefan Hauswiesner, Michael Kenzel, and Dieter Schmalstieg. 2012. Softshell: dynamic scheduling on gpus. ACM Transactions on Graphics 31, 6 (2012), 1--11.
[50]
Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, and Dieter Schmalstieg. 2014. Whippletree: Task-based scheduling of dynamic workloads on the GPU. ACM Transactions on Graphics 33, 6 (2014), 1--11.
[51]
Yutaka Suzuki. 2020. Advent of a new sequencing era: long-read and on-site sequencing. Journal of Human Genetics 65, 1 (2020), 1--1.
[52]
Oxford Nanopore Technologies. 2018. PromethION. https://nanoporetech.com/products/promethion, visited 2024-01-16.
[53]
Stanley Tzeng, Anjul Patney, and John D Owens. 2010. Task management for irregular-parallel workloads on the GPU. In HPG. ACM, Saarbrucken, Germany, 29--37.
[54]
Ze-Gang Wei, Shao-Wu Zhang, and Fei Liu. 2020. smsMap: mapping single molecule sequencing reads by locating the alignment starting positions. BMC bioinformatics 21, 1 (2020), 1--15.
[55]
Aaron M Wenger, Paul Peluso, William J Rowell, Pi-Chuan Chang, Richard J Hall, Gregory T Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D Olson, et al. 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature biotechnology 37, 10 (2019), 1155--1162.
[56]
Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, and Jeffrey Vetter. 2015. Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations. In ICS. ACM, Newport Beach, CA, USA, 119--130.
[57]
Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, and Katherine Yelick. 2020. LOGAN: High-performance GPU-based x-drop long-read alignment. In IPDPS. IEEE, New Orleans, LA, USA, 462--471.
[58]
Eddy Z Zhang, Yunlian Jiang, Ziyu Guo, and Xipeng Shen. 2010. Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. In ICS. ACM, Tsukuba, Ibaraki, Japan, 115--126.
[59]
Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, and Wenguang Chen. 2017. Versapipe: a versatile programming framework for pipelined computing on GPU. In MICRO. ACM, Cambridge, MA, USA, 587--599.

Cited By

View all
  • (2024)CUDASW++4.0: ultra-fast GPU-based Smith–Waterman protein sequence database searchBMC Bioinformatics10.1186/s12859-024-05965-625:1Online publication date: 2-Nov-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
March 2024
498 pages
ISBN:9798400704352
DOI:10.1145/3627535
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2024

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. GPU acceleration
  2. genome sequence alignment
  3. long reads
  4. dynamic programming

Qualifiers

  • Research-article

Conference

PPoPP '24

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)709
  • Downloads (Last 6 weeks)75
Reflects downloads up to 29 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)CUDASW++4.0: ultra-fast GPU-based Smith–Waterman protein sequence database searchBMC Bioinformatics10.1186/s12859-024-05965-625:1Online publication date: 2-Nov-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media