Pushing the Envelope of Dynamic Spatial Gating technologies

DOI: 10.1145/3417313.3429380
Published: 16 November 2020

Abstract

There has been a recent surge of interest in dynamic inference techniques, which reduce the cost of inference without sacrificing model accuracy. These techniques rest on the assumption that not all parts of the output feature map (OFM) are equally important for every input. The parts of the OFM deemed unimportant for a given input can be skipped entirely or computed at lower precision, reducing the number of computations. In this paper we focus on one such technique, Precision Gating (PG), which targets unimportant features in the spatial domain of the OFM. PG computes most features at low precision to identify the regions of the OFM where an object of interest is present, and computes a high-precision OFM only for those regions. We show that PG loses accuracy when the MAC (multiply-accumulate) reduction achieved by a PG network is pushed further. We identify orthogonal dynamic optimization opportunities that PG does not exploit and show that the combined techniques achieve far better results than either baseline alone. The resulting hybrid model achieves 1.92x computation savings on a CIFAR-10 model at 91.35% accuracy; at similar computation savings, the PG model achieves 89.9% accuracy. Additionally, we show that PG produces GEMM computations that are not hardware aware, and we propose a fix that makes the PG technique CPU-friendly without losing accuracy.
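To make the gating mechanism concrete, here is a minimal NumPy sketch of the dual-precision idea described above, applied to a 1x1 (pointwise) convolution. The bit widths, the fixed gating threshold, and the function names (`quantize`, `pg_pointwise_conv`) are assumptions for illustration only; the actual PG technique [28] learns its gating thresholds jointly with the network.

```python
# Minimal sketch of spatial Precision Gating (PG), assuming a 1x1 convolution
# and a fixed gating threshold. All names and parameters here are illustrative
# assumptions, not the paper's implementation.
import numpy as np

def quantize(x, bits):
    """Uniform quantization of non-negative activations to `bits` bits."""
    levels = 2 ** bits - 1
    scale = levels / max(float(x.max()), 1e-8)
    return np.round(np.clip(x, 0.0, None) * scale) / scale

def pg_pointwise_conv(ifm, w, low_bits=4, high_bits=8, threshold=0.5):
    """Precision-gated pointwise convolution.

    ifm: (H, W, C_in) input feature map; w: (C_in, C_out) weights.
    Pass 1 computes the whole OFM at low precision; a spatial gate then
    selects positions with a strong low-precision response, and only those
    positions are recomputed at high precision.
    """
    # Pass 1: cheap low-precision MACs over the entire feature map.
    ofm = quantize(ifm, low_bits) @ w                  # (H, W, C_out)
    # Spatial gate: one keep/skip decision per (h, w) output position.
    mask = ofm.max(axis=-1) > threshold                # (H, W)
    # Pass 2: high-precision MACs only where the gate fired; the fraction
    # of gated positions sets the high-precision MAC count.
    rows, cols = np.nonzero(mask)
    if rows.size:  # guard against an all-skip gate
        ofm[rows, cols] = quantize(ifm[rows, cols], high_bits) @ w
    return ofm, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ifm = rng.random((8, 8, 16))
    w = rng.standard_normal((16, 32))
    _, mask = pg_pointwise_conv(ifm, w)
    print(f"positions recomputed at high precision: {mask.mean():.0%}")
```

Raising `threshold` shrinks the gated region and hence the number of high-precision MACs, which is the knob the abstract refers to when it notes that accuracy drops as the MAC reduction is pushed. The scattered positions gathered by `np.nonzero` also hint at why a naive PG layer maps poorly onto dense GEMM kernels on CPUs, the hardware-awareness issue the paper's fix addresses.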

References

[1] Colby R. Banbury, V. Reddi, M. Lam, W. Fu, A. Fazel, J. Holleman, X. Huang, R. Hurtado, D. Kanter, Anton Lokhmotov, D. Patterson, D. Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, and Poonam Yadav. 2020. Benchmarking TinyML Systems: Challenges and Direction. arXiv:2003.04821
[2] Babak Ehteshami Bejnordi, Tijmen Blankevoort, and Max Welling. 2019. Batch-Shaping for Learning Conditional Channel Gated Networks. arXiv:1907.06627 [cs.LG]
[3] Francesco Petrogalli and Dan Andrei Iliescu. 2018 (accessed September 3, 2020). Arm Scalable Vector Extensions and Application to Machine Learning. https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning
[4] Xitong Gao, Yiren Zhao, Łukasz Dudziak, Robert Mullins, and Cheng-zhong Xu. 2018. Dynamic Channel Pruning: Feature Boosting and Suppression. arXiv:1810.05331 [cs.CV]
[5] Dibakar Gope, Jesse Beu, and Matthew Mattina. 2020. High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands. arXiv:2008.00638 [cs.LG]
[6] Dibakar Gope, Jesse G. Beu, Urmish Thakker, and Matthew Mattina. 2020. Ternary MobileNets via Per-Layer Hybrid Filter Banks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 3036--3046.
[7] Dibakar Gope, Ganesh Dasika, and Matthew Mattina. 2019. Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications. In Proceedings of Machine Learning and Systems 2019. 190--200.
[8] Song Han, Huizi Mao, and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [cs.CV]
[9] Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel Pruning for Accelerating Very Deep Neural Networks. arXiv:1707.06168 [cs.CV]
[10] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 [cs.CV]
[11] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. 2017. Squeeze-and-Excitation Networks. arXiv:1709.01507 [cs.CV]
[12] Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, and G. Edward Suh. 2018. Channel Gating Neural Networks. arXiv:1805.12549 [cs.LG]
[13] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv:1609.07061
[14] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 [cs.CV]
[15] Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Speeding up Convolutional Neural Networks with Low Rank Expansions. arXiv:1405.3866 [cs.CV]
[16] P. Judd, J. Albericio, T. Hetherington, T. M. Aamodt, and A. Moshovos. 2016. Stripes: Bit-Serial Deep Neural Network Computing. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--12.
[17] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2016. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv:1611.06440 [cs.LG]
[18] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh. 2018. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 764--775.
[19] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538 [cs.LG]
[20] Jin Tao, Urmish Thakker, Ganesh Dasika, and Jesse Beu. 2019. Skipping RNN State Updates without Retraining the Original Model. In SenSys-ML 2019. Association for Computing Machinery, New York, NY, USA, 31--36. https://doi.org/10.1145/3362743.3362965
[21] U. Thakker, J. Beu, D. Gope, G. Dasika, and M. Mattina. 2019. Run-Time Efficient RNN Compression for Inference on Edge Devices. In 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2). 26--30.
[22] Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina. 2020. Rank and Run-Time Aware Compression of NLP Applications. arXiv:2010.03193 [cs.CL]
[23] Urmish Thakker, Jesse G. Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, and Matthew Mattina. 2019. Compressing RNNs for IoT Devices by 15--38x using Kronecker Products. arXiv:1906.02876
[24] Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, and Matthew Mattina. 2019. Pushing the Limits of RNN Compression. arXiv:1910.02558 [cs.LG]
[25] Urmish Thakker, Paul Whatmough, Matthew Mattina, and Jesse Beu. 2020. Compressing Language Models using Doped Kronecker Products. arXiv:2001.08896 [cs.LG]
[26] Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018. SkipNet: Learning Dynamic Routing in Convolutional Networks. In Proceedings of the European Conference on Computer Vision (ECCV).
[27] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv:1707.01083 [cs.CV]
[28] Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, and Zhiru Zhang. 2020. Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations. arXiv:2002.07136 [cs.CV]

Cited By

  • (2022) A Survey on Federated Learning for Resource-Constrained IoT Devices. IEEE Internet of Things Journal 9:1, 1-24. https://doi.org/10.1109/JIOT.2021.3095077. Online publication date: 1-Jan-2022.
  • (2022) Federated Learning for Resource-Constrained IoT Devices: Panoramas and State of the Art. In Federated and Transfer Learning, 7-27. https://doi.org/10.1007/978-3-031-11748-0_2. Online publication date: 1-Oct-2022.
  • (2021) Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products. ACM Journal on Emerging Technologies in Computing Systems 17:4, 1-18. https://doi.org/10.1145/3440016. Online publication date: 14-Jul-2021.
  • (2021) CE-PeopleSeg: Real-time people segmentation with 10% CPU usage for video conference. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 914-922. https://doi.org/10.1109/CVPRW53098.2021.00102. Online publication date: Jun-2021.


Information

Published In

AIChallengeIoT '20: Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things
November 2020, 74 pages
ISBN: 9781450381345
DOI: 10.1145/3417313

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. IoT
    2. dynamic computation
    3. efficient inference
    4. neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


