
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks

Published: 01 February 2018

Abstract

Binary weight convolutional neural networks (BCNNs) can achieve near state-of-the-art classification accuracy with far lower computational complexity than traditional CNNs that use high-precision weights. Their binary weights make BCNNs well suited for vision-based Internet-of-Things systems that are sensitive to power consumption, enabling very high throughput at moderate power dissipation. In this paper, an energy-efficient architecture for BCNNs is proposed that fully exploits their binary weights and other hardware-friendly characteristics. A judicious processing schedule is proposed so that off-chip I/O accesses are minimized and activations are maximally reused. To significantly reduce the critical path delay, we introduce optimized compressor trees and approximate binary multipliers with two novel compensation schemes; the latter save significant hardware resources while compromising almost no computation accuracy. Taking advantage of the error resilience of BCNNs, an innovative approximate adder is developed that significantly reduces the silicon area and data path delay. Thorough error analysis and extensive experimental results on several data sets show that the approximate adders in the data path cause negligible accuracy loss. Moreover, algorithmic transformations for certain layers of BCNNs and a memory-efficient quantization scheme are incorporated to further reduce the energy cost and on-chip storage requirement. Finally, the proposed BCNN hardware architecture is implemented in a SMIC 130-nm technology. The postlayout results demonstrate that our design achieves an energy efficiency of over 2.0 TOp/s/W when scaled to 65 nm, more than two times better than the prior art.
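As a reading aid, the following is a minimal software sketch of two of the ideas above, assuming NumPy and hypothetical function names; it is not the paper's RTL design. It models (1) binary-weight convolution, where every multiplication degenerates into an add or a subtract, and (2) a generic lower-part-OR approximate adder, a standard approximate-adder technique in the same spirit as, but not identical to, the adder the abstract describes.

import numpy as np

def binary_weight_conv2d(activations, weight_signs):
    """2-D 'valid' convolution with weights constrained to {-1, +1}.

    With binary weights, every multiply degenerates into an add or a
    subtract of an activation, which is what makes BCNN datapaths cheap.
    """
    h, w = activations.shape
    k = weight_signs.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = activations[i:i + k, j:j + k]
            out[i, j] = np.sum(window * weight_signs)  # only +/- additions
    return out

def approx_add(a, b, trunc_bits=4):
    """Bit-accurate model of a generic lower-part-OR approximate adder
    (illustrative only; `trunc_bits` is a hypothetical parameter, and
    this is not the paper's specific adder design).

    The carry chain is cut below bit `trunc_bits`: the upper bits are
    added exactly, while the lower bits are merged with a cheap OR.
    This shortens the critical path at the cost of a bounded error.
    """
    low_mask = (1 << trunc_bits) - 1
    exact_high = (a & ~low_mask) + (b & ~low_mask)  # accurate upper part
    approx_low = (a | b) & low_mask                 # OR-based lower part
    return exact_high + approx_low

For instance, approx_add(44, 13) returns 45 rather than the exact 57: the error is confined to the truncated low bits and is bounded by the cut point, which is the kind of noise that error-resilient BCNN inference can tolerate.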




Information

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 26, Issue 2
February 2018
200 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 February 2018

Qualifiers

  • Research-article


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 02 Oct 2024


Cited By
  • (2024) "REC: REtime Convolutional Layers to Fully Exploit Harvested Energy for ReRAM-based CNN Accelerators," ACM Transactions on Embedded Computing Systems, vol. 23, no. 6, pp. 1–25, DOI: 10.1145/3652593. Online publication date: 11-Sep-2024.
  • (2023) "Hybrid Signed Convolution Module With Unsigned Divide-and-Conquer Multiplier for Energy-Efficient STT-MRAM-Based AI Accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 7, pp. 1078–1082, DOI: 10.1109/TVLSI.2023.3245099. Online publication date: 1-Jul-2023.
  • (2023) "Real-time approximate and combined 2D convolvers for FPGA-based image processing," The Journal of Supercomputing, vol. 79, no. 16, pp. 18910–18946, DOI: 10.1007/s11227-023-05377-y. Online publication date: 1-Nov-2023.
  • (2022) "Unsupervised pre-trained filter learning approach for efficient convolution neural network," Neurocomputing, vol. 365, pp. 171–190, DOI: 10.1016/j.neucom.2019.06.084. Online publication date: 21-Apr-2022.
  • (2021) "Low-precision Logarithmic Number Systems," ACM Transactions on Architecture and Code Optimization, vol. 18, no. 4, pp. 1–25, DOI: 10.1145/3461699. Online publication date: 17-Jul-2021.
  • (2021) "Binary Precision Neural Network Manycore Accelerator," ACM Journal on Emerging Technologies in Computing Systems, vol. 17, no. 2, pp. 1–27, DOI: 10.1145/3423136. Online publication date: 5-Apr-2021.
  • (2021) "Genetically optimized massively parallel binary neural networks for intrusion detection systems," Computer Communications, vol. 179, pp. 1–10, DOI: 10.1016/j.comcom.2021.07.015. Online publication date: 29-Dec-2021.
  • (2021) "M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy," Circuits, Systems, and Signal Processing, vol. 40, no. 3, pp. 1542–1567, DOI: 10.1007/s00034-020-01534-3. Online publication date: 1-Mar-2021.
  • (2020) "Resource-optimized combinational binary neural network circuits," Microelectronics Journal, vol. 97, DOI: 10.1016/j.mejo.2020.104724. Online publication date: 1-Mar-2020.
  • (2019) "PyRTL in Early Undergraduate Research," Proceedings of the Workshop on Computer Architecture Education, pp. 1–8, DOI: 10.1145/3338698.3338890. Online publication date: 22-Jun-2019.
