Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3225058.3225134acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Accelerating FM-index Search for Genomic Data Processing

Published: 13 August 2018 Publication History

Abstract

The deluge of genomics data is incurring prohibitively high computational costs. As an important building block for genomic data processing algorithms, FM-index search occupies most of execution time in sequence alignment. Due to massive random streaming memory references relative to only small amount of computations, FM-index search algorithm exhibits extremely low efficiency on conventional architectures. This paper proposes Niubility, an accelerator for FM-index search in genomic sequence alignment. Based on our algorithm-architecture co-design analysis, we found that conventional architectures exploit low memory-level parallelism so that the available memory bandwidth cannot be fully utilized. Niubility accelerator customizes bit-wise operations and exploit data-level parallelism, that produces maximal concurrent memory accesses to saturate memory bandwidth. We implement an accelerator ASIC in a ST 28nm process that achieves up to 990x speedup over the state-of-the-art software.

References

[1]
{n. d.}. Amazon Elastic Compute Cloud (Amazon EC2). ({n. d.}).
[2]
Intel Xeon Phi 7250F. {n. d.}. https://www.exxactcorp.com/Intel-HJ8066702268900-E316536. ({n. d.}). Accessed April 1, 2018.
[3]
GenAx: A Genome Sequencing Accelerator. {n. d.}. To appear in ISCA 2018. ({n. d.}).
[4]
Mark D Adams, Jenny M Kelley, Jeannine D Gocayne, Mark Dubnick, Mihael H Polymeropoulos, Hong Xiao, Carl R Merril, Andrew Wu, Bjorn Olde, Ruben F Moreno, et al. 1991. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 5013 (1991), 1651--1656.
[5]
Can Alkan, Jeffrey M Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O Kitzman, Carl Baker, Maika Malig, Onur Mutlu, et al. 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nature genetics 41, 10 (2009), 1061.
[6]
Michael Burrows and David J Wheeler. 1994. A block-sorting lossless data compression algorithm. (1994).
[7]
Stefan Canzar and Steven L Salzberg. 2017. Short read mapping: an algorithmic tour. Proc. IEEE 105, 3 (2017), 436--458.
[8]
Mau-Chung Frank Chang, Yu-Ting Chen, Jason Cong, Po-Tsang Huang, Chun-Liang Kuo, and Cody Hao Yu. 2016. The SMEM Seeding Acceleration for DNA Sequence Alignment. In Field-Programmable Custom Computing Machines (FCCM), 2016 IEEE 24th Annual International Symposium on. IEEE, 32--39.
[9]
Yangho Chen, Tade Souaiaia, and Ting Chen. 2009. PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics 25, 19 (2009), 2514--21.
[10]
Yu Ting Chen, Jason Cong, Zhenman Fang, Jie Lei, and Peng Wei. 2016. When Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration. In IEEE International Symposium on Field-Programmable Custom Computing Machines. 64--70.
[11]
Yu-Ting Chen, Jason Cong, Jie Lei, and Peng Wei. 2015. A novel high-throughput acceleration engine for read alignment. In Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International Symposium on. IEEE, 199--202.
[12]
Design Compiler, R User, and Modeling Guide. 2000. Synopsys. Inc., see http://www.synopsys.com (2000).
[13]
Matei David, Misko Dzamba, Dan Lister, Lucian Ilie, and Michael Brudno. 2011. SHRiMP2: sensitive yet practical short read mapping. Bioinformatics 27, 7 (2011), 1011--1012.
[14]
Michael A Eberle, Epameinondas Fritzilas, Peter Krusche, Morten Källberg, Benjamin L Moore, Mitchell A Bekritsky, Zamin Iqbal, Han-Yu Chuang, Sean J Humphray, Aaron L Halpern, et al. 2017. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome research 27, 1 (2017), 157--164.
[15]
Philippe Faes, Bram Minnaert, Mark Christiaens, Eric Bonnet, Yvan Saeys, Dirk Stroobandt, and Yves Van de Peer. 2006. Scalable hardware accelerator for comparing DNA and protein sequences. In Proceedings of the 1st international conference on Scalable information systems. ACM, 33.
[16]
Edward Fernandez, Walid Najjar, and Stefano Lonardi. 2011. String matching in hardware using the FM-index. In Field-Programmable Custom Computing Machines (FCCM), 2011 IEEE 19th Annual International Symposium on. IEEE, 218--225.
[17]
E. Fernandez, W. Najjar, S. Lonardi, and E. Harris. 2011. String Matching in Hardware Using the FM-Index. In IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines(FCCM). 218--225.
[18]
Paolo Ferragina and Giovanni Manzini. 2000. Opportunistic data structures with applications. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on. IEEE, 390--398.
[19]
Uscs genome browser. {n.d.}. http://hgdownload.soe.ucsc.edu/. ({n.d.}). Accessed April 1, 2018.
[20]
Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35, 2 (2005), 378--407.
[21]
Faraz Hach, Fereydoun Hormozdiari, Can Alkan, Farhad Hormozdiari, Inanc Birol, Evan E Eichler, and S Cenk Sahinalp. 2010. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature methods 7, 8 (2010), 576.
[22]
Nils Homer, Barry Merriman, and Stanley F Nelson. 2009. BFAST: an alignment tool for large scale genome resequencing. PloS one 4, 11 (2009), e7767.
[23]
illumina. 2017. NovaSeq. https://www.illumina.com/systems/sequencing-platforms/novaseq.html. (2017).
[24]
Xianyang Jiang, Xinchun Liu, Lin Xu, Peiheng Zhang, and Ninghui Sun. 2007. A reconfigurable accelerator for smith-waterman algorithm. IEEE Transactions on Circuits and Systems II: Express Briefs 54, 12 (2007), 1077--1081.
[25]
Benjamin J. Kelly, James R. Fitch, Yangqiu Hu, Donald J. Corsmeier, Huachun Zhong, Amy N. Wetzel, Russell D. Nordquist, David L. Newsom, and Peter White. 2015. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biology 16, 1 (2015), 1--14.
[26]
Yoongu Kim, Weikun Yang, and Onur Mutlu. 2016. Ramulator: A fast and extensible DRAM simulator. IEEE Computer architecture letters 15, 1 (2016), 45--49.
[27]
Petr Klus, Simon Lam, Dag Lyberg, Ming S Cheung, Graham Pullan, Ian McFarlane, Giles SH Yeo, and Brian YH Lam. 2012. BarraCUDA-a fast short read sequence aligner using graphics processing units. BMC research notes 5, 1 (2012), 27.
[28]
Ben Langmead and Steven L Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nat Meth 9, 4 (04 2012), 357--359.
[29]
Ben Langmead and Steven L Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 4 (2012), 357.
[30]
Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology 10, 3 (2009), R25.
[31]
Heng Li. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 1303 (2013).
[32]
Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 14 (2009), 1754--1760.
[33]
Ming Li, Bin Ma, Derek Kisman, and John Tromp. 2004. PatternHunter II: Highly sensitive and fast homology search. Journal of bioinformatics and computational biology 2, 03 (2004), 417--439.
[34]
Xueqi Li, Guangming Tan, Bingchen Wang, and Ninghui Sun. 2018. High-performance genomic analysis framework with in-memory computing. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 317--328.
[35]
Xueqi Li, Guangming Tan, Chunming Zhang, Xu Li, Zhonghai Zhang, and Ninghui Sun. 2017. Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis. In High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2017 IEEE 19th International Conference on. IEEE, 262--269.
[36]
Richard J Lipton and Daniel Philip Lopresti. 1986. Comparing long strings on a short systolic array. Princeton University, Department of Computer Science.
[37]
Lira Mamanova, Alison J Coffey, Carol E Scott, Iwanka Kozarewa, Emily H Turner, Akash Kumar, Eleanor Howard, Jay Shendure, and Daniel J Turner. 2010. Target-enrichment strategies for next-generation sequencing. Nat Meth 7, 2 (02 2010), 111--118.
[38]
Michael L Metzker. 2010. Sequencing technologies---the next generation. Nature reviews genetics 11, 1 (2010), 31--46.
[39]
Nabeel M Mohamed, Heshan Lin, and Wuchun Feng. 2013. Accelerating data-intensive genome analysis in the cloud. In Proceedings of the 5th International Conference on Bioinformatics and Computational Biology (BICoB), Honolulu, Hawaii, USA.
[40]
A. Mukherjee, N. Motgi, J. Becker, A. Friebe, C. Habermann, and M. Glesner. 2001. Prototyping of Efficient Hardware Algorithms for Data Compression in Future Communication Systems. In International Workshop on Rapid System Prototyping. 58.
[41]
Wojciech Muła, Nathan Kurz, and Daniel Lemire. 2017. Faster population counts using AVX2 instructions. Comput. J. 61, 1 (2017), 111--120.
[42]
Intel PCM power utility. {n. d.}. https://software.intel.com/en-us/articles/intel-performance-counter-monitor#pcm_power. ({n. d.}). Accessed April 1, 2018.
[43]
James Reinders. 2005. VTune performance analyzer essentials. Intel Press (2005).
[44]
James Reinders. 2014. Knights corner: Your path to knights landing. Intel Developer Zone (2014).
[45]
Michael Schindler. 1997. A fast block-sorting algorithm for lossless data compression. In Data Compression Conference, 1997. DCC '97. Proceedings. 469.
[46]
Premkishore Shivakumar and Norman P Jouppi. 2001. Cacti 3.0: An integrated cache timing, power, and area model. (2001).
[47]
Tf Smith. 1980. Waterman MS: Identification of common molecular subsequences. Journal of Molecular Biology 147, 1 (1980), 195--197.
[48]
Avinash Sodani. 2015. Knights landing (knl): 2nd generation intel® xeon phi processor. In Hot Chips 27 Symposium (HCS), 2015 IEEE. IEEE, 1--24.
[49]
Guangming Tan, Shengzhong Feng, and Ninghui Sun. 2006. Biology -Locality and parallelism optimization for dynamic programming algorithm in bioinformatics. SC (2006), 78.
[50]
Guangming Tan, Chunming Zhang, Wen Tang, and Ninghui Sun. 2015. Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-processor. IEEE Transactions on Parallel and Distributed Systems PP, 99 (2015), 1--1.
[51]
Wen Tang, Wendi Wang, Bo Duan, Chunming Zhang, Guangming Tan, Peiheng Zhang, and Ninghui Su. 2012. Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator. In IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM ). 184--187.
[52]
Yatish Turakhia, Gill Bejerano, and William J Dally. 2018. Darwin: A Genomics Co-processor Provides up to 15,000 X Acceleration on Long Read Assembly. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 199--213.
[53]
Panagiotis D Vouzis and Nikolaos V Sahinidis. 2011. GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27, 2 (2011), 182--188.
[54]
Wendi Wang, Wen Tang, Linchuan Li, Guangming Tan, Peiheng Zhang, and Ninghui Sun. 2012. Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture. In IEEE International Parallel and Distributed Processing Symposium Workshops & Phd Forum. 665--674.
[55]
Chris Wilkerson, Alaa R Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, and Shih-lien Lu. 2010. Reducing cache power with low-cost, multi-bit error-correcting codes. In ACM SIGARCH Computer Architecture News, Vol. 38. ACM, 83--93.
[56]
Hongyi Xin, Donghyuk Lee, Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, and Can Alkan. 2013. Accelerating read mapping with FastHASH. In BMC genomics, Vol. 14. BioMed Central, S13.
[57]
Yoshiki Yamaguchi, Tsutomu Maruyama, and Akihiko Konagaya. 2001. High speed homology search with FPGAs. In Biocomputing 2002. World Scientific, 271--282.
[58]
Chi Wai Yu, KH Kwong, Kin-Hong Lee, and Philip Heng Wai Leong. 2003. A Smith-Waterman systolic cell. In International Conference on Field Programmable Logic and Applications. Springer, 375--384.
[59]
Jing Zhang, Heshan Lin, Pavan Balaji, and Wu-chun Feng. 2013. Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures. CCGRID (2013), 377--384.

Cited By

View all
  • (2024)GEMA: A Genome Exact Mapping Accelerator Based on Learned IndexesIEEE Transactions on Biomedical Circuits and Systems10.1109/TBCAS.2023.334815218:3(523-538)Online publication date: Jun-2024
  • (2023)Nucleotide String Indexing using Range MatchingProceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics10.1145/3584371.3613024(1-2)Online publication date: 3-Sep-2023
  • (2023)NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10070978(1236-1248)Online publication date: Feb-2023
  • Show More Cited By

Index Terms

  1. Accelerating FM-index Search for Genomic Data Processing

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
    August 2018
    945 pages
    ISBN:9781450365109
    DOI:10.1145/3225058
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • University of Oregon: University of Oregon

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 August 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Energy efficiency
    2. Genome alignment
    3. Hardware-acceleration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    Conference

    ICPP 2018

    Acceptance Rates

    ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
    Overall Acceptance Rate 91 of 313 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)21
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)GEMA: A Genome Exact Mapping Accelerator Based on Learned IndexesIEEE Transactions on Biomedical Circuits and Systems10.1109/TBCAS.2023.334815218:3(523-538)Online publication date: Jun-2024
    • (2023)Nucleotide String Indexing using Range MatchingProceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics10.1145/3584371.3613024(1-2)Online publication date: 3-Sep-2023
    • (2023)NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10070978(1236-1248)Online publication date: Feb-2023
    • (2022)BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity ServersProceedings of the 51st International Conference on Parallel Processing10.1145/3545008.3545033(1-12)Online publication date: 29-Aug-2022
    • (2022)BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO56248.2022.00057(727-743)Online publication date: Oct-2022
    • (2022)DSIM: Distributed Sequence Matching on Near-DRAM Accelerator for Genome AssemblyIEEE Journal on Emerging and Selected Topics in Circuits and Systems10.1109/JETCAS.2022.317277412:2(486-499)Online publication date: Jun-2022
    • (2021)Accelerated seeding for genome sequence alignment with enumerated radix treesProceedings of the 48th Annual International Symposium on Computer Architecture10.1109/ISCA52012.2021.00038(388-401)Online publication date: 14-Jun-2021
    • (2021)EXMA: A Genomics Accelerator for Exact-Matching2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA51647.2021.00041(399-411)Online publication date: Feb-2021
    • (2021)PIM-Align: A Processing-in-Memory Architecture for FM-Index Search AlgorithmJournal of Computer Science and Technology10.1007/s11390-020-0825-336:1(56-70)Online publication date: 1-Jan-2021
    • (2019)FindeR: Accelerating FM-Index-Based Exact Pattern Matching in Genomic Sequences through ReRAM Technology2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2019.00030(284-295)Online publication date: Sep-2019

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media