Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products

Published: 14 July 2021

Abstract

Micro-controllers (MCUs) make up most of the processors in the world, with applications ranging from automobiles to medical devices. The Internet of Things (IoT) promises to equip these resource-constrained MCUs with machine learning algorithms to provide always-on intelligence. Many IoT applications consume time-series data, which are naturally suited to recurrent neural networks (RNNs) such as LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, which have only a few kilobytes of memory. As a result, there is a need for compression techniques that can significantly shrink RNNs without hurting task accuracy. This article introduces a method for compressing RNNs for resource-constrained environments using the Kronecker product (KP). KP can compress RNN layers by 16× to 38× with minimal accuracy loss, and quantizing the resulting models to 8 bits pushes the compression factor to 50×. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five different applications and show that KP beats the task accuracy of the other techniques by a large margin while simultaneously improving inference runtime. In some cases, the KP compression mechanism itself introduces an accuracy loss; to mitigate this, we develop a hybrid KP approach. The hybrid KP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.
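
The core idea lends itself to a short worked sketch. The NumPy example below is purely illustrative and is not the authors' implementation: the 16×16 factor shapes, the variable names, and the random weights are all hypothetical. It shows how a dense RNN weight matrix W of shape (m1·m2) × (n1·n2) can be replaced by the Kronecker product of two small factors, A ⊗ B, shrinking the stored parameters from m1·m2·n1·n2 to m1·n1 + m2·n2, and how the matrix-vector product needed at inference can be evaluated without ever materializing W.

```python
import numpy as np

# Illustrative sketch only (hypothetical shapes and names): a dense weight
# matrix W of shape (m1*m2, n1*n2) is represented as the Kronecker product
# of two small factors A and B, so only A and B need to be stored.
m1, n1 = 16, 16
m2, n2 = 16, 16

rng = np.random.default_rng(0)
A = rng.standard_normal((m1, n1)).astype(np.float32)
B = rng.standard_normal((m2, n2)).astype(np.float32)

dense_params = (m1 * m2) * (n1 * n2)   # 65,536 weights for the 256x256 layer
kp_params = m1 * n1 + m2 * n2          # 512 weights for the two factors
print(f"compression factor: {dense_params / kp_params:.0f}x")   # -> 128x

# At inference, y = (A kron B) @ x is computed without materializing W, using
# the identity (A kron B) vec(X) = vec(B X A^T) with column-major vectorization.
x = rng.standard_normal(n1 * n2).astype(np.float32)
X = x.reshape(n1, n2).T                # un-vectorize x into an (n2, n1) matrix
y_fast = (B @ X @ A.T).T.reshape(-1)   # column-major vec of the (m2, m1) result
y_ref = np.kron(A, B) @ x              # reference path that builds the full W
assert np.allclose(y_fast, y_ref, atol=1e-3)
```

The 128× figure here is purely a consequence of the toy shapes chosen above; the 16× to 38× layer compression reported in the abstract would correspond to different factor-shape choices for the actual benchmark models.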

Cited By

  • (2024) TinyNS: Platform-aware Neurosymbolic Auto Tiny Machine Learning. ACM Transactions on Embedded Computing Systems 23, 3, 1–48. DOI: 10.1145/3603171. Online publication date: 11 May 2024.
  • (2022) Machine Learning for Microcontroller-Class Hardware: A Review. IEEE Sensors Journal 22, 22, 21362–21390. DOI: 10.1109/JSEN.2022.3210773. Online publication date: 15 November 2022.
  • (2021) Opportunity++: A Multimodal Dataset for Video- and Wearable, Object and Ambient Sensors-Based Human Activity Recognition. Frontiers in Computer Science 3. DOI: 10.3389/fcomp.2021.792065. Online publication date: 20 December 2021.

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 17, Issue 4
October 2021
446 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3472280
Editor: Ramesh Karri
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 July 2021
Accepted: 01 November 2020
Revised: 01 September 2020
Received: 01 April 2020
Published in JETC Volume 17, Issue 4

Author Tags

  1. Neural networks
  2. micro-controllers
  3. matrix decomposition
  4. Kronecker products
  5. model compression
  6. IoT

Qualifiers

  • Research-article
  • Refereed
