
Research article · Open access
DOI: 10.1145/3674558.3674562

Layer-wise Exploration of a Neural Processing Unit Compiler's Optimization Space

Published: 26 August 2024

Abstract

Given the increasing popularity of Edge AI, embedded neural processing units (NPUs) are gradually becoming a standard feature in microcontrollers (MCUs) and Systems-on-a-Chip (SoCs). Deploying neural networks on these accelerators requires specialized neural network compilers that incorporate graph optimization stages, where layer-specific transformations are applied to reduce execution latency or memory footprint on platform-specific computing elements. For this reason, neural network compilers expose control parameters to be tuned for each individual network layer. The challenge addressed in this paper is finding an optimal combination of neural network compilation parameters for the efficient utilization of the computing resources of the target hardware accelerators. To navigate this vast parameter space, we propose a greedy algorithm that iterates through the convolutional layers of the network while preserving a set of solutions for the preceding layers. We evaluated this approach by transforming the graphs of several popular neural networks to optimize their performance and memory footprint, mapping them onto an experimental embedded NPU developed by STMicroelectronics using its associated neural network compiler. For the reported set of network models, the proposed technique improved latency and memory footprint by approximately 43% compared to the baseline and outperformed a simulated annealing heuristic by approximately 15%.
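The greedy layer-wise search described in the abstract — iterating over the network's layers while carrying forward a set of non-dominated partial solutions — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation or the STMicroelectronics compiler's API; the names `param_space`, `evaluate`, and `keep`, and the choice to sum latency and take the maximum of per-layer memory, are all assumptions for the sketch:

```python
def dominates(a, b):
    """True if solution a is at least as good as b in both objectives
    (latency, memory) and strictly better in at least one."""
    return (a["latency"] <= b["latency"] and a["memory"] <= b["memory"]
            and (a["latency"] < b["latency"] or a["memory"] < b["memory"]))

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

def greedy_layerwise_search(layers, param_space, evaluate, keep=8):
    """Greedy layer-wise exploration: for each layer, extend every kept
    partial solution with every compilation-parameter combination for
    that layer, then prune to (at most `keep` of) the Pareto front.

    - param_space(layer): iterable of parameter dicts for that layer
    - evaluate(layer, params, partial): -> (latency, memory) for the
      layer under those parameters, given the partial mapping so far
    """
    partials = [{"params": [], "latency": 0.0, "memory": 0.0}]
    for layer in layers:
        extended = []
        for partial in partials:
            for params in param_space(layer):
                lat, mem = evaluate(layer, params, partial)
                extended.append({
                    "params": partial["params"] + [params],
                    "latency": partial["latency"] + lat,   # latency accumulates
                    "memory": max(partial["memory"], mem), # peak working memory
                })
        # Prune to the Pareto front; cap its size to bound the search.
        partials = sorted(pareto_front(extended),
                          key=lambda s: s["latency"])[:keep]
    return partials
```

Keeping a bounded set of non-dominated partial solutions at each step is what distinguishes this from a purely single-best greedy pass: a configuration that is locally suboptimal for one layer can still survive and win once later layers are accounted for.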


Published In

ICCTA '24: Proceedings of the 2024 10th International Conference on Computer Technology Applications
May 2024
324 pages
ISBN:9798400716386
DOI:10.1145/3674558
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Compilers
  2. Design space exploration
  3. NPU
  4. Tiny machine learning
  5. Optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Article Metrics

  • Total Citations: 0
  • Total Downloads: 87 (last 12 months: 87; last 6 weeks: 57)

Reflects downloads up to 13 Nov 2024
