Pushing the Envelope of Dynamic Spatial Gating technologies

DOI: 10.1145/3417313.3429380
Published: 16 November 2020

Abstract

There has been a recent surge of interest in dynamic inference techniques, which reduce the cost of inference without sacrificing model accuracy. These techniques rest on the assumption that not all parts of the output feature map (OFM) are equally important for every input. The parts of the OFM deemed unimportant for a given input can be skipped entirely or computed at lower precision, reducing the number of computations. In this paper we focus on one such technique, Precision Gating (PG), which targets unimportant features in the spatial domain of the OFM. PG computes most features at low precision to identify the regions of the OFM where an object of interest is present, and computes a high-precision OFM only for those regions. We show that PG loses accuracy when the MAC (multiply-accumulate) reduction achieved by a PG network is pushed further. We identify orthogonal dynamic optimization opportunities that PG does not exploit and show that the combined techniques achieve far better results than either baseline alone. The resulting hybrid model achieves 1.92x computation savings on a CIFAR-10 model at 91.35% accuracy; at similar computation savings, the PG model achieves 89.9% accuracy. Additionally, we show that PG produces GEMM computations that are not hardware aware, and we propose a fix that makes the PG technique CPU-friendly without losing accuracy.
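To make the gating mechanism concrete, here is a minimal NumPy sketch of the dual-precision idea described above, applied to a 1x1 (pointwise) convolution. The bit widths, the fixed gating threshold, and the function names (`quantize`, `pg_pointwise_conv`) are assumptions for illustration only; the actual PG technique [28] learns its gating thresholds jointly with the network.

```python
# Minimal sketch of spatial Precision Gating (PG), assuming a 1x1 convolution
# and a fixed gating threshold. All names and parameters here are illustrative
# assumptions, not the paper's implementation.
import numpy as np

def quantize(x, bits):
    """Uniform quantization of non-negative activations to `bits` bits."""
    levels = 2 ** bits - 1
    scale = levels / max(float(x.max()), 1e-8)
    return np.round(np.clip(x, 0.0, None) * scale) / scale

def pg_pointwise_conv(ifm, w, low_bits=4, high_bits=8, threshold=0.5):
    """Precision-gated pointwise convolution.

    ifm: (H, W, C_in) input feature map; w: (C_in, C_out) weights.
    Pass 1 computes the whole OFM at low precision; a spatial gate then
    selects positions with a strong low-precision response, and only those
    positions are recomputed at high precision.
    """
    # Pass 1: cheap low-precision MACs over the entire feature map.
    ofm = quantize(ifm, low_bits) @ w                  # (H, W, C_out)
    # Spatial gate: one keep/skip decision per (h, w) output position.
    mask = ofm.max(axis=-1) > threshold                # (H, W)
    # Pass 2: high-precision MACs only where the gate fired; the fraction
    # of gated positions sets the high-precision MAC count.
    rows, cols = np.nonzero(mask)
    if rows.size:  # guard against an all-skip gate
        ofm[rows, cols] = quantize(ifm[rows, cols], high_bits) @ w
    return ofm, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ifm = rng.random((8, 8, 16))
    w = rng.standard_normal((16, 32))
    _, mask = pg_pointwise_conv(ifm, w)
    print(f"positions recomputed at high precision: {mask.mean():.0%}")
```

Raising `threshold` shrinks the gated region and hence the number of high-precision MACs, which is the knob the abstract refers to when it notes that accuracy drops as the MAC reduction is pushed. The scattered positions gathered by `np.nonzero` also hint at why a naive PG layer maps poorly onto dense GEMM kernels on CPUs, the hardware-awareness issue the paper's fix addresses.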

References

[1] Colby R. Banbury, V. Reddi, M. Lam, W. Fu, A. Fazel, J. Holleman, X. Huang, R. Hurtado, D. Kanter, Anton Lokhmotov, D. Patterson, D. Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, and Poonam Yadav. 2020. Benchmarking TinyML Systems: Challenges and Direction. arXiv:2003.04821
[2] Babak Ehteshami Bejnordi, Tijmen Blankevoort, and Max Welling. 2019. Batch-Shaping for Learning Conditional Channel Gated Networks. arXiv:1907.06627 [cs.LG]
[3] Francesco Petrogalli and Dan Andrei Iliescu. 2018 (accessed September 3, 2020). Arm Scalable Vector Extensions and Application to Machine Learning. https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning
[4] Xitong Gao, Yiren Zhao, Łukasz Dudziak, Robert Mullins, and Cheng-zhong Xu. 2018. Dynamic Channel Pruning: Feature Boosting and Suppression. arXiv:1810.05331 [cs.CV]
[5] Dibakar Gope, Jesse Beu, and Matthew Mattina. 2020. High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands. arXiv:2008.00638 [cs.LG]
[6] Dibakar Gope, Jesse G. Beu, Urmish Thakker, and Matthew Mattina. 2020. Ternary MobileNets via Per-Layer Hybrid Filter Banks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 3036--3046.
[7] Dibakar Gope, Ganesh Dasika, and Matthew Mattina. 2019. Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications. In Proceedings of Machine Learning and Systems 2019. 190--200.
[8] Song Han, Huizi Mao, and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [cs.CV]
[9] Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel Pruning for Accelerating Very Deep Neural Networks. arXiv:1707.06168 [cs.CV]
[10] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 [cs.CV]
[11] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. 2017. Squeeze-and-Excitation Networks. arXiv:1709.01507 [cs.CV]
[12] Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, and G. Edward Suh. 2018. Channel Gating Neural Networks. arXiv:1805.12549 [cs.LG]
[13] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv:1609.07061
[14] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 [cs.CV]
[15] Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Speeding up Convolutional Neural Networks with Low Rank Expansions. arXiv:1405.3866 [cs.CV]
[16] P. Judd, J. Albericio, T. Hetherington, T. M. Aamodt, and A. Moshovos. 2016. Stripes: Bit-Serial Deep Neural Network Computing. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--12.
[17] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2016. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv:1611.06440 [cs.LG]
[18] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh. 2018. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 764--775.
[19] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538 [cs.LG]
[20] Jin Tao, Urmish Thakker, Ganesh Dasika, and Jesse Beu. 2019. Skipping RNN State Updates without Retraining the Original Model. In SenSys-ML 2019. Association for Computing Machinery, New York, NY, USA, 31--36. https://doi.org/10.1145/3362743.3362965
[21] U. Thakker, J. Beu, D. Gope, G. Dasika, and M. Mattina. 2019. Run-Time Efficient RNN Compression for Inference on Edge Devices. In 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2). 26--30.
[22] Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina. 2020. Rank and Run-Time Aware Compression of NLP Applications. arXiv:2010.03193 [cs.CL]
[23] Urmish Thakker, Jesse G. Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, and Matthew Mattina. 2019. Compressing RNNs for IoT Devices by 15--38x using Kronecker Products. arXiv:1906.02876
[24] Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, and Matthew Mattina. 2019. Pushing the Limits of RNN Compression. arXiv:1910.02558 [cs.LG]
[25] Urmish Thakker, Paul Whatmough, Matthew Mattina, and Jesse Beu. 2020. Compressing Language Models using Doped Kronecker Products. arXiv:2001.08896 [cs.LG]
[26] Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018. SkipNet: Learning Dynamic Routing in Convolutional Networks. In Proceedings of the European Conference on Computer Vision (ECCV).
[27] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv:1707.01083 [cs.CV]
[28] Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, and Zhiru Zhang. 2020. Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations. arXiv:2002.07136 [cs.CV]

Cited By

  • (2022) A Survey on Federated Learning for Resource-Constrained IoT Devices. IEEE Internet of Things Journal 9:1, 1-24. https://doi.org/10.1109/JIOT.2021.3095077. Online publication date: 1-Jan-2022.
  • (2022) Federated Learning for Resource-Constrained IoT Devices: Panoramas and State of the Art. In Federated and Transfer Learning, 7-27. https://doi.org/10.1007/978-3-031-11748-0_2. Online publication date: 1-Oct-2022.
  • (2021) Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products. ACM Journal on Emerging Technologies in Computing Systems 17:4, 1-18. https://doi.org/10.1145/3440016. Online publication date: 14-Jul-2021.
  • (2021) CE-PeopleSeg: Real-time people segmentation with 10% CPU usage for video conference. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 914-922. https://doi.org/10.1109/CVPRW53098.2021.00102. Online publication date: Jun-2021.


Information

Published In

AIChallengeIoT '20: Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things
November 2020, 74 pages
ISBN: 9781450381345
DOI: 10.1145/3417313

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. IoT
    2. dynamic computation
    3. efficient inference
    4. neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


