DOI: 10.1145/3394885.3446242
Research article

Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization

Published: 29 January 2021

Abstract

Neural network accelerators are a key enabler for on-device AI inference, for which energy efficiency is an important metric. The datapath energy, which includes the computation energy and the energy for data movement among the arithmetic units, accounts for a significant part of the total accelerator energy. By revisiting the basic physics of arithmetic logic circuits, we show that the datapath energy is highly correlated with the bit flips that occur when the input operands are streamed into the arithmetic units, defined as the Hamming distance (HD) of the input operand matrices. Based on this insight, we propose a post-training optimization algorithm and an HD-aware training algorithm that co-design and co-optimize the accelerator and the network synergistically. Experimental results based on post-layout simulation with MobileNetV2 demonstrate a 2.85x datapath energy reduction on average, and up to 8.51x for certain layers.
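To make the HD metric concrete, the sketch below counts the bit flips seen at an operand register as a sequence of operands is streamed into an arithmetic unit. It is a minimal illustration under stated assumptions, not the paper's implementation: the 8-bit operand width, the zero-initialized register, and the sorted reordering at the end are hypothetical choices used only to show how the total HD of a stream changes with operand order, which is the quantity the proposed optimizations aim to reduce.

```python
# Minimal sketch (not the paper's implementation): total Hamming distance (HD)
# accumulated when 8-bit operands are streamed into an arithmetic unit one
# after another. Bit flips at the operand register are modeled as
# popcount(prev XOR next). Operand width and streaming order are assumptions.

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two operand words differ."""
    return bin(a ^ b).count("1")

def stream_hd(operands, bits=8):
    """Total bit flips seen when `operands` are streamed in the given order."""
    mask = (1 << bits) - 1
    total = 0
    prev = 0  # assume the register starts at zero
    for op in operands:
        op &= mask
        total += hamming_distance(prev, op)
        prev = op
    return total

if __name__ == "__main__":
    # A toy weight stream; reordering the same values can lower the total HD,
    # which is the kind of effect HD-aware optimization exploits.
    weights = [0b00001111, 0b11110000, 0b00001110, 0b11110001]
    print("original order HD:", stream_hd(weights))          # 27 bit flips
    print("sorted order HD:  ", stream_hd(sorted(weights)))  # 13 bit flips
```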


Cited By

  • (2024) SSTE: Syllable-Specific Temporal Encoding to FORCE-learn audio sequences with an associative memory approach. Neural Networks, 177:106368. https://doi.org/10.1016/j.neunet.2024.106368. Online publication date: Sep-2024


        Information & Contributors

        Information

        Published In

        ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
        January 2021
        930 pages
        ISBN:9781450379991
        DOI:10.1145/3394885
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 29 January 2021


        Author Tags

        1. Hamming distance optimization
        2. Neural network accelerator
        3. Datapath efficiency

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ASPDAC '21

        Acceptance Rates

        ASPDAC '21 Paper Acceptance Rate: 111 of 368 submissions, 30%
        Overall Acceptance Rate: 466 of 1,454 submissions, 32%


