DOI: 10.1145/2897937.2898010
research-article

Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication

Published: 05 June 2016

Abstract

Vector-matrix multiplication dominates the computation time and energy for many workloads, particularly neural network algorithms and linear transforms (e.g., the Discrete Fourier Transform). Utilizing the natural current accumulation feature of the memristor crossbar, we developed the Dot-Product Engine (DPE) as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication. We first devised a conversion algorithm that maps arbitrary matrix values to memristor conductances in a realistic crossbar array, accounting for device physics and circuit issues to reduce computational errors. Accurate device resistance programming in large arrays is enabled by closed-loop pulse tuning and access transistors. To validate our approach, we simulated and benchmarked a state-of-the-art neural network for pattern recognition on the DPEs. The results show no accuracy degradation compared to the software approach (99% pattern recognition accuracy on the MNIST data set) with only a 4-bit DAC/ADC requirement, while the DPE can achieve a speed-efficiency product of 1,000× to 10,000× compared to a custom digital ASIC.
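
The current-accumulation idea in the abstract can be illustrated with a small idealized model. The sketch below is not the paper's conversion algorithm: it assumes a plain linear (affine) mapping from matrix values to conductances, ignores wire resistance, device nonlinearity, and noise, and assumes inputs are applied as row voltages with currents summed along columns. The function names and the conductance range (`g_min`, `g_max`) are hypothetical, chosen only for illustration.

```python
def map_to_conductance(W, g_min=1e-6, g_max=1e-4):
    """Affinely map matrix values onto an assumed conductance range (siemens)."""
    flat = [w for row in W for w in row]
    w_min, w_max = min(flat), max(flat)
    if w_max == w_min:
        raise ValueError("matrix must contain at least two distinct values")
    a = (g_max - g_min) / (w_max - w_min)  # scale factor
    b = g_min - a * w_min                  # offset so that w_min maps to g_min
    G = [[a * w + b for w in row] for row in W]
    return G, a, b


def crossbar_currents(V, G):
    """Ideal crossbar: column current I_j = sum_i V_i * G_ij (Kirchhoff's current law)."""
    n_cols = len(G[0])
    return [sum(V[i] * G[i][j] for i in range(len(V))) for j in range(n_cols)]


def recover_product(I, V, a, b):
    """Undo the affine mapping: sum_i V_i * W_ij = (I_j - b * sum_i V_i) / a."""
    offset = b * sum(V)
    return [(i_j - offset) / a for i_j in I]


def quantize(x, bits, lo, hi):
    """Uniform quantizer, a stand-in for a DAC/ADC with the given bit width."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step


# Example: a 2x2 matrix and a 2-element input vector.
W = [[1.0, -2.0],
     [0.5, 3.0]]
V = [0.2, 0.1]

G, a, b = map_to_conductance(W)
I = crossbar_currents(V, G)
y = recover_product(I, V, a, b)  # approximately V^T W
```

Because the mapping is affine, the offset term `b * sum(V)` has to be subtracted before rescaling; this is also how negative matrix values can be represented with strictly positive conductances in this simplified model. Passing the inputs through `quantize(v, 4, lo, hi)` before `crossbar_currents` gives a rough feel for the effect of a 4-bit DAC.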

Information & Contributors

Information

Published In

DAC '16: Proceedings of the 53rd Annual Design Automation Conference
June 2016
1048 pages
ISBN: 9781450342360
DOI: 10.1145/2897937
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '16

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
