DOI: 10.1145/3400302.3416344 · ICCAD Conference Proceedings · Invited talk

Fundamental limits on the precision of in-memory architectures

Published: 17 December 2020

Abstract

This paper obtains fundamental limits on the computational precision of in-memory computing (IMC) architectures. Various compute SNR metrics for IMCs are defined and their interrelationships analyzed to show that the accuracy of an IMC is fundamentally limited by the compute SNR (SNRa) of its analog core, and that the activation, weight, and output precisions need to be assigned appropriately for the final output SNR (SNRT) to approach SNRa. The minimum precision criterion (MPC) is proposed to minimize the output precision and hence the column analog-to-digital converter (ADC) precision. The charge-summing (QS) compute model and its associated IMC architecture, QS-Arch, are studied to obtain analytical models for its compute SNR, minimum ADC precision, energy, and latency. The compute SNR models of QS-Arch are validated via Monte Carlo simulations in a 65 nm CMOS process. Employing these models, upper bounds on the SNRa of a QS-Arch-based IMC employing a 512-row SRAM array are obtained, and it is shown that QS-Arch's energy cost reduces by 3.3× for every 6 dB drop in SNRa, and that the maximum achievable SNRa reduces with technology scaling while the energy cost at a fixed SNRa increases. These models also indicate the existence of an upper bound on the dot-product dimension N due to voltage-headroom clipping; this bound doubles for every 3 dB drop in SNRa.
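The two scaling relations quoted in the abstract lend themselves to a quick back-of-the-envelope check. The sketch below is illustrative only — the function names `energy_scale` and `max_dot_product_dim` are chosen here, not taken from the paper — and simply encodes the stated 3.3×-per-6 dB energy trade-off and the 2×-per-3 dB relaxation of the dot-product dimension bound:

```python
def energy_scale(delta_snr_db: float) -> float:
    """Relative energy cost after lowering SNRa by delta_snr_db dB.

    Per the abstract, QS-Arch's energy cost reduces by 3.3x for every
    6 dB drop in SNRa, i.e. E/E0 = 3.3 ** (-delta_snr_db / 6).
    """
    return 3.3 ** (-delta_snr_db / 6.0)


def max_dot_product_dim(n0: float, delta_snr_db: float) -> float:
    """Upper bound on the dot-product dimension N after an SNRa drop.

    Per the abstract, the bound on N (set by voltage-headroom clipping)
    doubles for every 3 dB drop in SNRa.
    """
    return n0 * 2.0 ** (delta_snr_db / 3.0)


if __name__ == "__main__":
    # Trading away 12 dB of compute SNR cuts energy by 3.3**2, about 10.9x.
    print(f"energy ratio at -12 dB: {energy_scale(12.0):.3f}")   # ≈ 0.092
    # A 6 dB SNRa drop quadruples the admissible dimension: 512 -> 2048.
    print(f"N bound at -6 dB (N0=512): {max_dot_product_dim(512, 6.0):.0f}")
```

Taken together, the two relations quantify the accuracy-for-efficiency trade that the paper formalizes: each 6 dB of SNRa surrendered buys roughly one order-of-magnitude energy reduction every two steps, while simultaneously easing the headroom-imposed limit on N.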





Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020
1396 pages
ISBN:9781450380263
DOI:10.1145/3400302
  • General Chair:
  • Yuan Xie
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


In-Cooperation

  • IEEE CAS
  • IEEE CEDA
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accelerator
  2. compute in-memory
  3. in-memory accuracy
  4. in-memory computing
  5. in-memory noise
  6. in-memory precision
  7. machine learning
  8. taxonomy of in-memory

Qualifiers

  • Invited-talk

Conference

ICCAD '20

Acceptance Rates

Overall acceptance rate: 457 of 1,762 submissions (26%)


Article Metrics

  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 23

Reflects downloads up to 22 Nov 2024


Cited By

  • (2024) 34.5 A 818-4094TOPS/W Capacitor-Reconfigured CIM Macro for Unified Acceleration of CNNs and Transformers. 2024 IEEE International Solid-State Circuits Conference (ISSCC), 574-576. DOI: 10.1109/ISSCC49657.2024.10454489. Published 18 Feb 2024.
  • (2024) ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 153-158. DOI: 10.1109/ASP-DAC58780.2024.10473851. Published 22 Jan 2024.
  • (2023) Built-in Self-Test and Built-in Self-Repair Strategies Without Golden Signature for Computing in Memory. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. DOI: 10.23919/DATE56975.2023.10137074. Published Apr 2023.
  • (2023) RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required! Proceedings of the 50th Annual International Symposium on Computer Architecture, 1-16. DOI: 10.1145/3579371.3589062. Published 17 Jun 2023.
  • (2023) The Impact of Analog-to-Digital Converter Architecture and Variability on Analog Neural Network Accuracy. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 9(2), 176-184. DOI: 10.1109/JXCDC.2023.3315134. Published Dec 2023.
  • (2023) DIANA: An End-to-End Hybrid DIgital and ANAlog Neural Network SoC for the Edge. IEEE Journal of Solid-State Circuits 58(1), 203-215. DOI: 10.1109/JSSC.2022.3214064. Published Jan 2023.
  • (2023) Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9. DOI: 10.1109/ICCAD57390.2023.10323790. Published 28 Oct 2023.
  • (2023) DIANA: DIgital and ANAlog Heterogeneous Multi-core System-on-Chip. In Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning, 119-141. DOI: 10.1007/978-3-031-38230-7_7. Published 3 Jul 2023.
  • (2022) Towards ADC-Less Compute-In-Memory Accelerators for Energy Efficient Deep Learning. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 624-627. DOI: 10.23919/DATE54114.2022.9774573. Published 14 Mar 2022.
  • (2022) A cross-layer approach to cognitive computing. Proceedings of the 59th ACM/IEEE Design Automation Conference, 1327-1330. DOI: 10.1145/3489517.3530642. Published 10 Jul 2022.
