DOI: 10.1145/3299874.3317996
Research Article | Public Access

HSIM-DNN: Hardware Simulator for Computation-, Storage- and Power-Efficient Deep Neural Networks

Published: 13 May 2019

Abstract

Deep learning with large-scale deep neural networks (DNNs) is effective at automatic high-level feature extraction, but it is also computation- and memory-intensive. Constructing DNNs from block-circulant matrices can achieve hardware acceleration and model compression simultaneously while maintaining high accuracy. This paper proposes HSIM-DNN, an accurate hardware simulator implemented in C++, which models the exact behavior of DNN hardware implementations and thereby facilitates block-circulant matrix-based design of DNN training and inference in hardware. Real FPGA implementations validate the simulator across a range of circulant block sizes and data bit lengths, evaluating accuracy, compression ratio, and power consumption, and provide useful insights for hardware design.
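Two mechanisms named in the abstract are worth making concrete. A block-circulant weight matrix stores only one defining vector per b-by-b block, and each row of blocks is computed in the frequency domain as y_i = IFFT( Σ_j FFT(w_ij) ∘ FFT(x_j) ); the configurable data bit length amounts to fixed-point rounding of the operands. The C++ sketch below is our own minimal illustration of those two ideas, not the HSIM-DNN source: the names (fft, quantize, block_circulant_matvec) are invented here, and the toy radix-2 FFT assumes a power-of-two block size.

// Minimal sketch: FFT-based block-circulant matrix-vector product plus
// fixed-point quantization. Illustration only -- not the HSIM-DNN code.
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

using cd = std::complex<double>;
const double PI = std::acos(-1.0);

// Recursive radix-2 Cooley-Tukey FFT; invert=true computes the inverse
// transform (including 1/n scaling). Requires a.size() to be a power of 2.
void fft(std::vector<cd>& a, bool invert) {
    size_t n = a.size();
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    double ang = 2 * PI / n * (invert ? -1 : 1);
    cd w(1), wn(std::cos(ang), std::sin(ang));
    for (size_t i = 0; i < n / 2; ++i) {
        a[i] = even[i] + w * odd[i];
        a[i + n / 2] = even[i] - w * odd[i];
        if (invert) {            // fold the 1/n normalization into each level
            a[i] /= 2;
            a[i + n / 2] /= 2;
        }
        w *= wn;
    }
}

// Round to signed fixed point with the given total and fractional bit widths,
// saturating on overflow -- a stand-in for sweeping the data bit length.
double quantize(double x, int total_bits, int frac_bits) {
    double scale = std::pow(2.0, frac_bits);
    double qmax = std::pow(2.0, total_bits - 1) - 1;
    double q = std::round(x * scale);
    q = std::max(-qmax - 1.0, std::min(qmax, q));
    return q / scale;
}

// y = Wx where W consists of p-by-q circulant blocks of size b; w[i][j] holds
// only the first column of block (i,j). Per-block products are accumulated in
// the frequency domain, so each row of blocks needs just one inverse FFT.
std::vector<double> block_circulant_matvec(
    const std::vector<std::vector<std::vector<double>>>& w,
    const std::vector<double>& x, size_t b) {
    size_t p = w.size(), q = w[0].size();
    assert(x.size() == q * b);
    std::vector<double> y(p * b);
    for (size_t i = 0; i < p; ++i) {
        std::vector<cd> acc(b, cd(0));
        for (size_t j = 0; j < q; ++j) {
            std::vector<cd> wf(w[i][j].begin(), w[i][j].end());
            std::vector<cd> xf(x.begin() + j * b, x.begin() + (j + 1) * b);
            fft(wf, false);
            fft(xf, false);
            for (size_t k = 0; k < b; ++k) acc[k] += wf[k] * xf[k];
        }
        fft(acc, true);
        for (size_t k = 0; k < b; ++k) y[i * b + k] = acc[k].real();
    }
    return y;
}

int main() {
    // One 4x4 circulant block, stored as its first column only (4 values
    // instead of 16 -- the source of the model compression).
    std::vector<std::vector<std::vector<double>>> w = {{{1, 2, 3, 4}}};
    std::vector<double> x = {1, 0, 0, 0};   // unit impulse recovers the column
    for (double v : block_circulant_matvec(w, x, 4)) std::printf("%g ", v);
    std::printf("\n");                      // prints: 1 2 3 4
    std::printf("%g\n", quantize(0.7371, 8, 6));  // prints: 0.734375
    return 0;
}

Storing b values per block instead of b*b is where the compression ratio comes from, and sweeping total_bits in quantize mimics the accuracy/power trade-off the abstract says the simulator evaluates against real FPGA implementations.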



Published In

GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
May 2019, 562 pages
ISBN: 9781450362528
DOI: 10.1145/3299874

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. block-circulant matrix
  2. deep neural network
  3. fft
  4. hardware acceleration
  5. hardware simulator

Conference

GLSVLSI '19: Great Lakes Symposium on VLSI 2019
May 9-11, 2019, Tysons Corner, VA, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

