Fundamental Limits on Energy-Delay-Accuracy of In-Memory Architectures in Inference Applications

Published: 01 October 2022

Abstract

This article obtains fundamental limits on the computational precision of in-memory computing architectures (IMCs). An IMC noise model and associated signal-to-noise ratio (SNR) metrics are defined, and their interrelationships are analyzed to show that the accuracy of an IMC is fundamentally limited by the compute SNR ($\mathrm{SNR}_a$) of its analog core, and that activation, weight, and output (ADC) precision need to be assigned appropriately for the final output SNR ($\mathrm{SNR}_T$) to approach $\mathrm{SNR}_a$. The minimum precision criterion (MPC) is proposed to minimize the analog-to-digital converter (ADC) precision and hence its overhead. Three in-memory compute models, charge summing (QS), current summing (IS), and charge redistribution (QR), are shown to underlie most known IMCs. Noise, energy, and delay expressions for these compute models are developed and employed to derive expressions for the SNR, ADC precision, energy, and latency of IMCs. The compute SNR expressions are validated via Monte Carlo simulations in a 65 nm CMOS process. For a 512-row SRAM array, it is shown that: 1) IMCs have an upper bound on their maximum achievable $\mathrm{SNR}_a$ due to constraints on energy, area, and voltage swing, and this upper bound decreases with technology scaling for QS-based architectures; 2) MPC enables $\mathrm{SNR}_T$ to approach $\mathrm{SNR}_a$ with minimal ADC precision; and 3) QS-based (QR-based) architectures are preferred in low (high) compute-SNR scenarios.
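The abstract's central relationship, the output SNR ($\mathrm{SNR}_T$) saturating at the analog compute SNR ($\mathrm{SNR}_a$) once the ADC is precise enough, can be illustrated with a short numerical sketch. The Python snippet below is a hypothetical toy model, not the paper's derivation: it assumes additive Gaussian analog noise and an ideal uniform B-bit ADC over a [-1, 1] full scale, with all parameter values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def total_snr_db(snr_a_db, adc_bits, n_trials=100_000):
    """Monte Carlo estimate of the output SNR (SNR_T) for a toy IMC chain:
    ideal analog dot-product value -> additive Gaussian analog noise
    (setting SNR_a) -> uniform B-bit ADC quantization."""
    signal = rng.uniform(-1.0, 1.0, n_trials)    # ideal analog output, full scale [-1, 1]
    sig_var = signal.var()
    noise_var = sig_var / 10 ** (snr_a_db / 10)  # analog noise variance implied by SNR_a
    noisy = signal + rng.normal(0.0, np.sqrt(noise_var), n_trials)
    lsb = 2.0 / 2 ** adc_bits                    # ADC step size over the [-1, 1] range
    quantized = np.clip(np.round(noisy / lsb) * lsb, -1.0, 1.0)
    err_var = np.var(quantized - signal)         # total error: analog noise + quantization
    return 10 * np.log10(sig_var / err_var)

snr_a_db = 30.0
for bits in range(4, 11):
    print(f"B = {bits:2d}: SNR_T = {total_snr_db(snr_a_db, bits):5.1f} dB")
# SNR_T saturates near SNR_a (30 dB here) once the ADC's quantization noise
# drops well below the analog noise floor; the smallest such B plays the
# role of the minimum precision in an MPC-style assignment.
```

Raising the ADC precision past the saturation point buys no additional accuracy while increasing ADC energy and area, which is exactly the overhead the minimum precision criterion is meant to avoid.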



Published In

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 41, Issue 10, October 2022, 401 pages.

Publisher

IEEE Press
