DOI: 10.1145/3240765.3240767

3DICT: A Reliable and QoS Capable Mobile Process-In-Memory Architecture for Lookup-based CNNs in 3D XPoint ReRAMs

Published: 05 November 2018

Abstract

It is extremely challenging to deploy computing-intensive convolutional neural networks (CNNs) with rich parameters on mobile devices because of their limited computing resources and low power budgets. Although prior works build fast and energy-efficient CNN accelerators by greatly sacrificing test accuracy, mobile devices have to guarantee high CNN test accuracy for critical applications, e.g., unlocking phones via face recognition. In this paper, we propose a 3D XPoint ReRAM-based process-in-memory architecture, 3DICT, that provides varied test accuracies to applications with different priorities through lookup-based CNN tests that dynamically exploit the trade-off between test accuracy and latency. Compared to state-of-the-art accelerators, on average, 3DICT improves CNN test performance per Watt by 13% to 61× and guarantees 9-year endurance under various CNN test accuracy requirements.
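The lookup-based CNN tests the abstract relies on follow the general idea of replacing dense convolutions with lookups into a small dictionary of shared weight vectors: each filter is encoded as a sparse combination of dictionary entries, so per-patch dot products against the dictionary can be computed once and reused. The sketch below is only an illustration of that general technique, not the paper's implementation; all names (`D`, `idx`, `coef`) and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: k dictionary vectors of length d (a flattened
# receptive field), num_filters output filters, s lookups per filter.
k, d, num_filters, s = 16, 27, 8, 3
D = rng.standard_normal((k, d))                  # shared dictionary

# Each filter is encoded as s dictionary indices plus s coefficients.
idx = rng.integers(0, k, size=(num_filters, s))
coef = rng.standard_normal((num_filters, s))

def direct_conv(patch):
    # Dense baseline: reconstruct every filter, then take dot products.
    W = np.einsum('fs,fsd->fd', coef, D[idx])    # (num_filters, d)
    return W @ patch

def lookup_conv(patch):
    # Lookup-based path: compute the k dictionary responses once per
    # patch, then each filter output is a cheap sparse combination.
    responses = D @ patch                        # k shared dot products
    return (coef * responses[idx]).sum(axis=1)   # s mul/adds per filter

patch = rng.standard_normal(d)
assert np.allclose(direct_conv(patch), lookup_conv(patch))
```

Because the `k` dictionary responses are shared across all filters, the per-filter cost drops from `d` multiplies to `s` lookups and multiplies, which is the kind of accuracy/latency knob (varying `s`) that a QoS-aware design can turn per application.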




Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2018, 939 pages. Publisher: IEEE Press.


            Cited By

  • (2023) "An Energy-Efficient Inference Engine for a Configurable ReRAM-Based Neural Network Accelerator," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(3):740-753, March 2023. DOI: 10.1109/TCAD.2022.3184464
  • (2023) "Endurance-Aware Deep Neural Network Real-Time Scheduling on ReRAM Accelerators," 2023 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 404-410, December 2023. DOI: 10.1109/CSCI62032.2023.00072
  • (2022) "Resistive-RAM-Based In-Memory Computing for Neural Network: A Review," Electronics, 11(22):3667, November 2022. DOI: 10.3390/electronics11223667
  • (2021) "Exploring the Feasibility of Using 3-D XPoint as an In-Memory Computing Accelerator," IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, 7(2):88-96, December 2021. DOI: 10.1109/JXCDC.2021.3112238
  • (2020) "An Efficient Wear-level Architecture using Self-adaptive Wear Leveling," Proceedings of the 49th International Conference on Parallel Processing (ICPP), pp. 1-11, August 2020. DOI: 10.1145/3404397.3404405
  • (2020) "AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators," 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 342-355, February 2020. DOI: 10.1109/HPCA47549.2020.00036
  • (2019) "ZARA," Proceedings of the 56th Annual Design Automation Conference (DAC), pp. 1-6, June 2019. DOI: 10.1145/3316781.3317936
  • (2019) "HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array," 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 56-68, February 2019. DOI: 10.1109/HPCA.2019.00027
