ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks

Published: 24 June 2017

Abstract

Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real-world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, both of which are projected to grow in the future. To improve the efficiency of DNNs, we propose ScaleDeep, a dense, scalable server architecture whose processing, memory, and interconnect subsystems are specialized to leverage the compute and communication characteristics of DNNs. While several DNN accelerator designs have been proposed in recent years, the key difference is that ScaleDeep primarily targets DNN training, as opposed to only inference or evaluation. The key architectural features from which ScaleDeep derives its efficiency are: (i) heterogeneous processing tiles and chips to match the wide diversity in computational characteristics (FLOPs and Bytes/FLOP ratio) that manifests at different levels of granularity in DNNs, (ii) a memory hierarchy and 3-tiered interconnect topology suited to the memory access and communication patterns in DNNs, (iii) a low-overhead synchronization mechanism based on hardware data-flow trackers, and (iv) methods to map DNNs onto the proposed architecture that minimize data movement and improve core utilization through nested pipelining. We have developed a compiler that allows any DNN topology to be programmed onto ScaleDeep, and a detailed architectural simulator to estimate performance and energy. The simulator incorporates timing and power models of ScaleDeep's components based on synthesis to Intel's 14 nm technology. We evaluate an embodiment of ScaleDeep with 7032 processing tiles that operates at 600 MHz and has a peak performance of 680 TFLOPs (single precision) and 1.35 PFLOPs (half precision) at 1.4 kW. Across 11 state-of-the-art DNNs containing 0.65M-14.9M neurons and 6.8M-145.9M weights, including winners from 5 years of the ImageNet competition, ScaleDeep demonstrates a 6x-28x speedup at iso-power over state-of-the-art GPU implementations.
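As a sanity check on the headline figures, the per-tile throughput and energy efficiency implied by the abstract follow from simple arithmetic: 680 TFLOPs across 7032 tiles at 600 MHz works out to roughly 161 single-precision FLOPs per tile per cycle, and the 1.4 kW envelope implies roughly 486 GFLOPs/W (single precision) and 964 GFLOPs/W (half precision) at peak. The short Python sketch below reproduces these back-of-envelope ratios from the quoted specifications; it is an illustration derived from the abstract's numbers, not code or a model from the paper.

    # Back-of-envelope check of ScaleDeep's headline numbers, using only
    # figures quoted in the abstract. Illustrative sketch, not from the paper.
    TILES = 7032          # processing tiles in the evaluated configuration
    FREQ_HZ = 600e6       # operating frequency (600 MHz)
    PEAK_SP = 680e12      # peak single-precision throughput (680 TFLOPs)
    PEAK_HP = 1.35e15     # peak half-precision throughput (1.35 PFLOPs)
    POWER_W = 1400.0      # power envelope (1.4 kW)

    # FLOPs each tile must retire per cycle to sustain peak throughput.
    sp_per_tile_cycle = PEAK_SP / (TILES * FREQ_HZ)   # ~161
    hp_per_tile_cycle = PEAK_HP / (TILES * FREQ_HZ)   # ~320

    # Energy efficiency at peak, in GFLOPs per watt.
    sp_gflops_per_w = PEAK_SP / POWER_W / 1e9         # ~486
    hp_gflops_per_w = PEAK_HP / POWER_W / 1e9         # ~964

    print(f"SP: {sp_per_tile_cycle:.0f} FLOPs/tile/cycle, {sp_gflops_per_w:.0f} GFLOPs/W")
    print(f"HP: {hp_per_tile_cycle:.0f} FLOPs/tile/cycle, {hp_gflops_per_w:.0f} GFLOPs/W")

All three derived figures are rounded and assume the quoted specifications hold simultaneously at peak.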





Published In

ACM SIGARCH Computer Architecture News, Volume 45, Issue 2 (ISCA'17), May 2017, 715 pages. ISSN: 0163-5964. DOI: 10.1145/3140659

Also published in: ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, June 2017, 736 pages. ISBN: 9781450348928. DOI: 10.1145/3079856

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2017
    Published in SIGARCH Volume 45, Issue 2


    Author Tags

    1. Deep Neural Networks
    2. Hardware Accelerators
    3. System Architecture

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited


Cited By

• (2024) Mapping Model and Heuristics for Accelerating Deep Neural Networks and for Energy-Efficient Networks-on-Chip. SoutheastCon 2024, 119-126. DOI: 10.1109/SoutheastCon52093.2024.10500232 (published 15-Mar-2024)
• (2023) EdgeNN: Efficient Neural Network Inference for CPU-GPU Integrated Edge Devices. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1193-1207. DOI: 10.1109/ICDE55515.2023.00096 (published Apr-2023)
• (2023) Use of Dynamic Biometric Signature in Communication of Company. 2023 27th International Conference on Circuits, Systems, Communications and Computers (CSCC), 1-5. DOI: 10.1109/CSCC58962.2023.00037 (published 19-Jul-2023)
• (2021) Puffin. Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, 1-6. DOI: 10.5555/3489049.3489073 (published 26-Jul-2021)
• (2021) Adaptive Computation Reuse for Energy-Efficient Training of Deep Neural Networks. ACM Transactions on Embedded Computing Systems 20(6), 1-24. DOI: 10.1145/3487025 (published 18-Oct-2021)
• (2021) Tolerating Defects in Low-Power Neural Network Accelerators Via Retraining-Free Weight Approximation. ACM Transactions on Embedded Computing Systems 20(5s), 1-21. DOI: 10.1145/3477016 (published 22-Sep-2021)
• (2021) Impact of On-chip Interconnect on In-memory Acceleration of Deep Neural Networks. ACM Journal on Emerging Technologies in Computing Systems 18(2), 1-22. DOI: 10.1145/3460233 (published 31-Dec-2021)
• (2021) TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29(4), 730-738. DOI: 10.1109/TVLSI.2021.3063543 (published Apr-2021)
• (2021) ITT-RNA: Imperfection Tolerable Training for RRAM-Crossbar-Based Deep Neural-Network Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40(1), 129-142. DOI: 10.1109/TCAD.2020.2989373 (published Jan-2021)
• (2021) Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights. Proceedings of the IEEE 109(10), 1706-1752. DOI: 10.1109/JPROC.2021.3098483 (published Oct-2021)
