
An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks

Published: 01 February 2018

Abstract

Binary weight convolutional neural networks (BCNNs) can achieve near state-of-the-art classification accuracy with far lower computational complexity than traditional CNNs that use high-precision weights. Their binary weights make BCNNs well suited for vision-based Internet-of-Things systems that are sensitive to power consumption, enabling very high throughput at moderate power dissipation. In this paper, an energy-efficient architecture for BCNNs is proposed that fully exploits their binary weights and other hardware-friendly characteristics. A judicious processing schedule is proposed so that off-chip I/O accesses are minimized and activations are maximally reused. To significantly reduce the critical path delay, we introduce optimized compressor trees and approximate binary multipliers with two novel compensation schemes; the latter save significant hardware resources while compromising almost no computation accuracy. Taking advantage of the error resilience of BCNNs, an innovative approximate adder is developed that significantly reduces the silicon area and data path delay. Thorough error analysis and extensive experimental results on several data sets show that the approximate adders in the data path cause negligible accuracy loss. Moreover, algorithmic transformations for certain layers of BCNNs and a memory-efficient quantization scheme are incorporated to further reduce the energy cost and on-chip storage requirement. Finally, the proposed BCNN hardware architecture is implemented in a SMIC 130-nm technology. The postlayout results demonstrate that our design achieves an energy efficiency of over 2.0 TOp/s/W when scaled to 65 nm, more than two times better than the prior art.
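As a reading aid, the following is a minimal software sketch of two of the ideas above, assuming NumPy and hypothetical function names; it is not the paper's RTL design. It models (1) binary-weight convolution, where every multiplication degenerates into an add or a subtract, and (2) a generic lower-part-OR approximate adder, a standard approximate-adder technique in the same spirit as, but not identical to, the adder the abstract describes.

import numpy as np

def binary_weight_conv2d(activations, weight_signs):
    """2-D 'valid' convolution with weights constrained to {-1, +1}.

    With binary weights, every multiply degenerates into an add or a
    subtract of an activation, which is what makes BCNN datapaths cheap.
    """
    h, w = activations.shape
    k = weight_signs.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = activations[i:i + k, j:j + k]
            out[i, j] = np.sum(window * weight_signs)  # only +/- additions
    return out

def approx_add(a, b, trunc_bits=4):
    """Bit-accurate model of a generic lower-part-OR approximate adder
    (illustrative only; `trunc_bits` is a hypothetical parameter, and
    this is not the paper's specific adder design).

    The carry chain is cut below bit `trunc_bits`: the upper bits are
    added exactly, while the lower bits are merged with a cheap OR.
    This shortens the critical path at the cost of a bounded error.
    """
    low_mask = (1 << trunc_bits) - 1
    exact_high = (a & ~low_mask) + (b & ~low_mask)  # accurate upper part
    approx_low = (a | b) & low_mask                 # OR-based lower part
    return exact_high + approx_low

For instance, approx_add(44, 13) returns 45 rather than the exact 57: the error is confined to the truncated low bits and is bounded by the cut point, which is the kind of noise that error-resilient BCNN inference can tolerate.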




Information

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 26, Issue 2
February 2018
200 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 February 2018

Qualifiers

  • Research-article


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 02 Oct 2024


Cited By
  • (2024) "REC: REtime Convolutional Layers to Fully Exploit Harvested Energy for ReRAM-based CNN Accelerators," ACM Transactions on Embedded Computing Systems, vol. 23, no. 6, pp. 1–25, DOI: 10.1145/3652593. Online publication date: 11-Sep-2024.
  • (2023) "Hybrid Signed Convolution Module With Unsigned Divide-and-Conquer Multiplier for Energy-Efficient STT-MRAM-Based AI Accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 7, pp. 1078–1082, DOI: 10.1109/TVLSI.2023.3245099. Online publication date: 1-Jul-2023.
  • (2023) "Real-time approximate and combined 2D convolvers for FPGA-based image processing," The Journal of Supercomputing, vol. 79, no. 16, pp. 18910–18946, DOI: 10.1007/s11227-023-05377-y. Online publication date: 1-Nov-2023.
  • (2022) "Unsupervised pre-trained filter learning approach for efficient convolution neural network," Neurocomputing, vol. 365, pp. 171–190, DOI: 10.1016/j.neucom.2019.06.084. Online publication date: 21-Apr-2022.
  • (2021) "Low-precision Logarithmic Number Systems," ACM Transactions on Architecture and Code Optimization, vol. 18, no. 4, pp. 1–25, DOI: 10.1145/3461699. Online publication date: 17-Jul-2021.
  • (2021) "Binary Precision Neural Network Manycore Accelerator," ACM Journal on Emerging Technologies in Computing Systems, vol. 17, no. 2, pp. 1–27, DOI: 10.1145/3423136. Online publication date: 5-Apr-2021.
  • (2021) "Genetically optimized massively parallel binary neural networks for intrusion detection systems," Computer Communications, vol. 179, pp. 1–10, DOI: 10.1016/j.comcom.2021.07.015. Online publication date: 29-Dec-2021.
  • (2021) "M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy," Circuits, Systems, and Signal Processing, vol. 40, no. 3, pp. 1542–1567, DOI: 10.1007/s00034-020-01534-3. Online publication date: 1-Mar-2021.
  • (2020) "Resource-optimized combinational binary neural network circuits," Microelectronics Journal, vol. 97, DOI: 10.1016/j.mejo.2020.104724. Online publication date: 1-Mar-2020.
  • (2019) "PyRTL in Early Undergraduate Research," Proceedings of the Workshop on Computer Architecture Education, pp. 1–8, DOI: 10.1145/3338698.3338890. Online publication date: 22-Jun-2019.
