DOI: 10.1145/3410463.3414626
research-article
Public Access

Helix: Algorithm/Architecture Co-design for Accelerating Nanopore Genome Base-calling

Published: 30 September 2020

Abstract

Nanopore genome sequencing is key to enabling personalized medicine, global food security, and virus surveillance. State-of-the-art base-callers adopt deep neural networks (DNNs) to translate the electrical signals generated by nanopore sequencers into digital DNA symbols. A DNN-based base-caller consumes 44.5% of the total execution time of a nanopore sequencing pipeline. However, it is difficult to quantize a base-caller and to build a power-efficient processing-in-memory (PIM) architecture to run the quantized base-caller. Although conventional network quantization techniques reduce the computing overhead of a base-caller by replacing floating-point multiply-accumulations with cheaper fixed-point operations, they significantly increase the number of systematic errors that cannot be corrected by read votes. The power density of prior nonvolatile memory (NVM)-based PIMs already exceeds the memory thermal tolerance even with active heat sinks, because their power efficiency is severely limited by analog-to-digital converters (ADCs). Finally, Connectionist Temporal Classification (CTC) decoding and read voting account for 53.7% of the total execution time in a quantized base-caller, and thus become its new bottleneck.
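To make the quantization step concrete, the following is a minimal sketch of replacing a floating-point multiply-accumulation with a fixed-point one. It assumes symmetric uniform quantization with one scale per vector, which is a common textbook scheme and not necessarily the one used in the paper; the function names `quantize` and `fixed_point_dot` and the sample values are hypothetical.

```python
def quantize(values, bits):
    """Symmetric uniform quantization (an assumed scheme, not the paper's):
    map floats to signed integers in [-qmax, qmax] and return the scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def fixed_point_dot(weights, activations, bits=8):
    """Dot product whose multiply-accumulate runs entirely on integers."""
    qw, w_scale = quantize(weights, bits)
    qa, a_scale = quantize(activations, bits)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer multiply-accumulate
    return acc * w_scale * a_scale            # rescale once at the end


# Hypothetical sample data, chosen only for illustration.
weights = [0.5, -0.25, 0.125]
activations = [1.0, 2.0, -4.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx8 = fixed_point_dot(weights, activations, bits=8)
approx3 = fixed_point_dot(weights, activations, bits=3)

print(f"float: {exact:.4f}  8-bit: {approx8:.4f}  3-bit: {approx3:.4f}")
```

With these sample values the 8-bit result stays close to the floating-point one, while the 3-bit result drifts further. The sketch only shows numerical error in one dot product; how such error turns into systematic base-calling errors is the subject of the paper and is not modeled here.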
In this paper, we propose Helix, a novel algorithm/architecture co-designed PIM, to accelerate nanopore base-calling power-efficiently and accurately. From the algorithm perspective, we present systematic error aware training to minimize the number of systematic errors in a quantized base-caller. From the architecture perspective, we propose a low-power SOT-MRAM-based ADC array to process analog-to-digital conversion operations and improve the power efficiency of prior DNN PIMs. Moreover, we revise a traditional NVM-based dot-product engine to accelerate CTC decoding operations, and create a SOT-MRAM binary comparator array to process read voting. Compared to state-of-the-art PIMs, Helix improves base-calling throughput by 6x, throughput per Watt by 11.9x, and throughput per mm² by 7.5x without degrading base-calling accuracy.
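The two post-processing steps named above, CTC decoding and read voting, can be sketched in software as follows. This is an illustration under simplifying assumptions, not the paper's implementation: it uses greedy (best-path) CTC decoding rather than a beam search, and it assumes the reads are already aligned and of equal length so that voting reduces to a per-position majority. The names `ctc_greedy_decode`, `vote_reads`, and the sample data are hypothetical.

```python
from collections import Counter

BLANK = "-"  # CTC blank symbol (assumed to be index 0 of the alphabet)
ALPHABET = [BLANK, "A", "C", "G", "T"]


def ctc_greedy_decode(frame_probs, alphabet=ALPHABET):
    """Best-path CTC decoding: take the most likely symbol per frame,
    collapse consecutive repeats, then drop blanks."""
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded = []
    previous = None
    for symbol in best:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def vote_reads(reads):
    """Per-position majority vote over pre-aligned, equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Six frames whose most likely symbols are A, A, blank, A, C, C.
frames = [
    [0.1, 0.7, 0.1, 0.05, 0.05],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.8, 0.05, 0.05, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.2, 0.1, 0.5, 0.1, 0.1],
]
decoded = ctc_greedy_decode(frames)

# A random error in one read is outvoted by the other two reads.
random_error = vote_reads(["ACGT", "ACTT", "ACGT"])
# The same error in most reads (a systematic error) survives the vote.
systematic_error = vote_reads(["ACTT", "ACTT", "ACGT"])

print(decoded, random_error, systematic_error)
```

The blank between the second and third A frames is what lets the decoder emit two consecutive A bases. The last two calls illustrate why voting removes isolated errors but cannot remove an error that most reads share.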

      Published In

      PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
      September 2020
      505 pages
      ISBN: 9781450380751
      DOI: 10.1145/3410463
      • General Chair: Vivek Sarkar
      • Program Chair: Hyesoon Kim
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. base-calling
      2. nanopore sequencing
      3. processing-in-memory

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation

      Conference

      PACT '20

      Acceptance Rates

      Overall Acceptance Rate 121 of 471 submissions, 26%

      Cited By

      • (2024) TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering. Frontiers in Genetics 15. DOI: 10.3389/fgene.2024.1429306. Online publication date: 28-Oct-2024
      • (2024) RUBICON: a framework for designing efficient deep learning-based genomic basecallers. Genome Biology 25:1. DOI: 10.1186/s13059-024-03181-2. Online publication date: 16-Feb-2024
      • (2024) MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 660-677. DOI: 10.1109/ISCA59077.2024.00054. Online publication date: 29-Jun-2024
      • (2023) Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1437-1452. DOI: 10.1145/3613424.3614252. Online publication date: 28-Oct-2023
      • (2023) Invited: Accelerating Genome Analysis via Algorithm-Architecture Co-Design. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-4. DOI: 10.1109/DAC56929.2023.10247887. Online publication date: 9-Jul-2023
      • (2023) SieveMem: A Computation-in-Memory Architecture for Fast and Accurate Pre-Alignment. 2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 156-164. DOI: 10.1109/ASAP57973.2023.00035. Online publication date: Jul-2023
      • (2023) Efficient Signed Arithmetic Multiplication on Memristor-Based Crossbar. IEEE Access 11, 33964-33978. DOI: 10.1109/ACCESS.2023.3263259. Online publication date: 2023
      • (2022) KrakenOnMem. Proceedings of the 36th ACM International Conference on Supercomputing, 1-14. DOI: 10.1145/3524059.3532367. Online publication date: 28-Jun-2022
      • (2022) SeGraM. Proceedings of the 49th Annual International Symposium on Computer Architecture, 638-655. DOI: 10.1145/3470496.3527436. Online publication date: 18-Jun-2022
      • (2022) System Design for Computation-in-Memory: From Primitive to Complex Functions. 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC), 1-6. DOI: 10.1109/VLSI-SoC54400.2022.9939571. Online publication date: 3-Oct-2022
