DOI: 10.1145/3453688.3461738
Research article, GLSVLSI Conference Proceedings

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

Published: 22 June 2021

Abstract

Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms, making it challenging to develop edge-oriented AI algorithms and implementations (e.g., accelerators). In this paper, we summarize our recent efforts toward efficient on-device AI development, covering both training and inference, from three aspects. First, we present on-device training with ultra-low memory usage: we propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression, achieving state-of-the-art accuracy under the same compression ratio. Third, we introduce an ultra-low latency DNN accelerator design, following a software/hardware co-design methodology. This paper emphasizes the importance and efficacy of training, quantization, and accelerator design, and calls for more research breakthroughs in this area to enable AI on the edge.
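As a rough illustration of the first two ideas, the short NumPy sketch below contrasts a dense layer with a tensor-train (TT) factorized layer to show where the orders-of-magnitude memory reduction comes from, and ternarizes a weight matrix using the threshold heuristic from Li and Liu's Ternary Weight Networks, one of the low-bitwidth baselines in this line of work. This is a minimal sketch, not the paper's implementation; the shapes, the rank choice, and the function names are hypothetical.

    # Minimal sketch, not the authors' code: shapes, ranks, and names here
    # are hypothetical choices for illustration.
    import numpy as np

    # --- Ultra-low memory training via tensorization -----------------------
    # A 1024x1024 dense weight matrix is viewed as a (32*32) x (32*32) tensor
    # and stored as a chain of small tensor-train (TT) cores. Small TT ranks
    # shrink the parameter (and training memory) footprint dramatically.

    def tt_parameter_count(in_modes, out_modes, ranks):
        """Parameter count of a TT-factorized matrix; boundary ranks are 1."""
        r = [1] + list(ranks) + [1]
        return sum(r[k] * m * n * r[k + 1]
                   for k, (m, n) in enumerate(zip(in_modes, out_modes)))

    dense = 1024 * 1024
    tt = tt_parameter_count(in_modes=(32, 32), out_modes=(32, 32), ranks=(8,))
    print(f"dense: {dense} params, TT rank 8: {tt} params, {dense // tt}x smaller")

    # --- Ultra-low bitwidth quantization ------------------------------------
    # Ternary quantization maps each weight to {-alpha, 0, +alpha}. This
    # variant uses the Ternary Weight Networks heuristic:
    # threshold = 0.7 * mean(|W|), alpha = mean of the surviving magnitudes.

    def ternarize(w):
        delta = 0.7 * np.abs(w).mean()                 # layer-wise threshold
        mask = np.abs(w) > delta                       # weights kept nonzero
        alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
        return alpha * np.sign(w) * mask               # {-alpha, 0, +alpha}

    w = np.random.randn(4, 4).astype(np.float32)
    print(np.unique(ternarize(w)))                     # at most 3 distinct values

Running it reports a 64x parameter reduction for the TT layer and at most three distinct values in the ternarized weights. The paper's actual methods go further: the training approach adapts the tensor ranks automatically during training, and the quantizer chooses values to minimize quantization loss rather than applying a fixed threshold.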

Supplemental Material

MP4 file: presentation video

Published In

GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
June 2021, 504 pages
ISBN: 9781450383936
DOI: 10.1145/3453688

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. dnn acceleration
      2. dnn quantization
      3. edge ai
      4. on-device training

      Conference

GLSVLSI '21: Great Lakes Symposium on VLSI 2021
June 22-25, 2021, Virtual Event, USA

      Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)

Article Metrics

• Downloads (last 12 months): 23
• Downloads (last 6 weeks): 3
Reflects downloads up to 10 Nov 2024

Cited By

• (2024) Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs. ACM Transactions on Architecture and Code Optimization 21(2), 1-24. DOI: 10.1145/3643682
• (2023) A Novel Low-Power Compression Scheme for Systolic Array-Based Deep Learning Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42(4), 1085-1098. DOI: 10.1109/TCAD.2022.3198036
• (2023) Enabling All In-Edge Deep Learning: A Literature Review. IEEE Access 11, 3431-3460. DOI: 10.1109/ACCESS.2023.3234761
• (2022) Layer-Wise Data-Free CNN Compression. 26th International Conference on Pattern Recognition (ICPR), 2019-2026. DOI: 10.1109/ICPR56361.2022.9956237
• (2022) A Survey of State-of-the-art on Edge Computing: Theoretical Models, Technologies, Directions, and Development Paths. IEEE Access 10, 54038-54063. DOI: 10.1109/ACCESS.2022.3176106
