DOI: 10.1145/3394885.3446242
Research article

Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization

Published: 29 January 2021

Abstract

Neural network accelerators are a key enabler for on-device AI inference, for which energy efficiency is an important metric. The datapath energy, which includes the computation energy and the energy for data movement among the arithmetic units, accounts for a significant part of the total accelerator energy. By revisiting the basic physics of arithmetic logic circuits, we show that the datapath energy is highly correlated with the bit flips that occur when the input operands are streamed into the arithmetic units, defined as the Hamming distance (HD) of the input operand matrices. Based on this insight, we propose a post-training optimization algorithm and an HD-aware training algorithm that co-design and co-optimize the accelerator and the network synergistically. Experimental results based on post-layout simulation with MobileNetV2 demonstrate a 2.85x datapath energy reduction on average, and up to 8.51x for certain layers.
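To make the HD metric concrete, the sketch below counts the bit flips seen at an operand register as a sequence of operands is streamed into an arithmetic unit. It is a minimal illustration under stated assumptions, not the paper's implementation: the 8-bit operand width, the zero-initialized register, and the sorted reordering at the end are hypothetical choices used only to show how the total HD of a stream changes with operand order, which is the quantity the proposed optimizations aim to reduce.

```python
# Minimal sketch (not the paper's implementation): total Hamming distance (HD)
# accumulated when 8-bit operands are streamed into an arithmetic unit one
# after another. Bit flips at the operand register are modeled as
# popcount(prev XOR next). Operand width and streaming order are assumptions.

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two operand words differ."""
    return bin(a ^ b).count("1")

def stream_hd(operands, bits=8):
    """Total bit flips seen when `operands` are streamed in the given order."""
    mask = (1 << bits) - 1
    total = 0
    prev = 0  # assume the register starts at zero
    for op in operands:
        op &= mask
        total += hamming_distance(prev, op)
        prev = op
    return total

if __name__ == "__main__":
    # A toy weight stream; reordering the same values can lower the total HD,
    # which is the kind of effect HD-aware optimization exploits.
    weights = [0b00001111, 0b11110000, 0b00001110, 0b11110001]
    print("original order HD:", stream_hd(weights))          # 27 bit flips
    print("sorted order HD:  ", stream_hd(sorted(weights)))  # 13 bit flips
```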


Cited By

  • (2024) SSTE: Syllable-Specific Temporal Encoding to FORCE-learn audio sequences with an associative memory approach. Neural Networks, 177:106368. https://doi.org/10.1016/j.neunet.2024.106368. Online publication date: Sep-2024


        Information & Contributors

        Information

        Published In

        ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
        January 2021
        930 pages
        ISBN:9781450379991
        DOI:10.1145/3394885
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 29 January 2021


        Author Tags

        1. Hamming distance optimization
        2. Neural network accelerator
        3. Datapath efficiency

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ASPDAC '21

        Acceptance Rates

        ASPDAC '21 Paper Acceptance Rate: 111 of 368 submissions, 30%
        Overall Acceptance Rate: 466 of 1,454 submissions, 32%


