Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2148600.2148612acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
poster

Poster: fast GPU read alignment with burrows wheeler transform based index

Published: 12 November 2011 Publication History

Abstract

We address the problem of performing faster read alignment on GPU devices. The task of DNA sequence processing is extremely computationally intensive as constant progress in sequencing technology leads to ever-increasing amounts of sequence data[6]. One of possible solutions for this problem is to use the extreme parallel capacities of modern GPU devices[5]. However, performance characteristics and programming models for GPU differ from those of traditional architectures and require new approaches.
Most importantly, host memory and I/O systems are not directly accessible from a GPU device and GPU memory is usually an order of magnitude smaller than memory on a host. Considering the size of read alignment data, the memory limit becomes a real problem: when reference sequence index does not fit into memory it has to be split into chunks that will be processed individually. In most cases the complexity of the algorithm does not depend on the index size, so such index splitting increases computation time tremendously. Analysis of existing solutions for read alignment on GPU showed that memory limit is the chief performance issue. One of the attempts to reduce memory consumption consisted in replacing commonly used suffix tree, which allows for better theoretical performance of the algorithm [4], with suffix array, which is less efficient in terms of pure computational complexity but more compact. By doing this, authors of MummerGPU++ achieved several times better performance[3].
We suggest using Burrows-Wheeler Transform[1] for both index and the corresponding search algorithm to achieve much smaller memory footprint. This transform is used mainly in compression algorithms such as bzip2 as it replaces reoccurring patterns in the string by continuous runs of a single symbol, but it can be also used for pattern matching[2]. At the same time we continue using more traditional suffix array on host side to benefit from computational characteristics of both GPU and CPU. We reduced index size 12 times and just by doing this achieved 3-4 time performance improvement compared to suffix-array based solution MummerGPU++.
Since even with this compressed index workload size can exceed available device memory we developed a performance model to analyze how overall execution time is affected by proportions and succession in memory is allocated for chunks of index and query set. This model allowed us to find best balance of memory allocation and double performance compared to naive approach when we allocate equal shares of memory for index and queries. The model is then applied to show that using multiple GPUs is a way not only to speed up application, but also to overcome some single-GPU performance issues and have super-linear scaling at least on number of GPUs typically available on one host.

Supplementary Material

PDF File (post139.pdf)

References

[1]
M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
[2]
P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 53(4):552--581, 2005.
[3]
A. Gharaibeh and M. Ripeanu. Size matters: Space/time tradeoffs to improve gpgpu applications performance. In SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 2010.
[4]
D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
[5]
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohm, and T. J. Purcell. A survay on general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80--113, 2007.
[6]
M. Pop. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics, 10:354, 2009.

Index Terms

  1. Poster: fast GPU read alignment with burrows wheeler transform based index

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
    November 2011
    166 pages
    ISBN:9781450310307
    DOI:10.1145/2148600

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 November 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. gpu
    2. read alignment

    Qualifiers

    • Poster

    Conference

    SC '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 139
      Total Downloads
    • Downloads (Last 12 months)2
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media