
Research article · Open access
DOI: 10.1145/3674558.3674562

Layer-wise Exploration of a Neural Processing Unit Compiler's Optimization Space

Published: 26 August 2024

Abstract

Given the increasing popularity of Edge AI, embedded neural processing units (NPUs) are gradually becoming a standard feature in microcontrollers (MCUs) and Systems-on-a-Chip (SoCs). Deploying neural networks on these accelerators requires specialized neural network compilers that incorporate graph optimization stages, where layer-specific transformations are applied to reduce execution latency or memory footprint on platform-specific computing elements. For this reason, neural network compilers expose control parameters to be tuned for each individual network layer. The challenge addressed in this paper is finding an optimal combination of neural network compilation parameters for the efficient utilization of the computing resources of the target hardware accelerators. To navigate this vast parameter space, we propose a greedy algorithm that iterates through the convolutional layers of the network while preserving a set of solutions for the preceding layers. We evaluated this approach by transforming the graphs of several popular neural networks to optimize their performance and memory footprint, mapping them onto an experimental embedded NPU developed by STMicroelectronics using its associated neural network compiler. For the reported set of network models, the proposed technique improved latency and memory footprint by approximately 43% compared to the baseline and outperformed a simulated annealing heuristic by approximately 15%.
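The greedy layer-wise search described in the abstract — iterating over the network's layers while carrying forward a set of non-dominated partial solutions — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation or the STMicroelectronics compiler's API; the names `param_space`, `evaluate`, and `keep`, and the choice to sum latency and take the maximum of per-layer memory, are all assumptions for the sketch:

```python
def dominates(a, b):
    """True if solution a is at least as good as b in both objectives
    (latency, memory) and strictly better in at least one."""
    return (a["latency"] <= b["latency"] and a["memory"] <= b["memory"]
            and (a["latency"] < b["latency"] or a["memory"] < b["memory"]))

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

def greedy_layerwise_search(layers, param_space, evaluate, keep=8):
    """Greedy layer-wise exploration: for each layer, extend every kept
    partial solution with every compilation-parameter combination for
    that layer, then prune to (at most `keep` of) the Pareto front.

    - param_space(layer): iterable of parameter dicts for that layer
    - evaluate(layer, params, partial): -> (latency, memory) for the
      layer under those parameters, given the partial mapping so far
    """
    partials = [{"params": [], "latency": 0.0, "memory": 0.0}]
    for layer in layers:
        extended = []
        for partial in partials:
            for params in param_space(layer):
                lat, mem = evaluate(layer, params, partial)
                extended.append({
                    "params": partial["params"] + [params],
                    "latency": partial["latency"] + lat,   # latency accumulates
                    "memory": max(partial["memory"], mem), # peak working memory
                })
        # Prune to the Pareto front; cap its size to bound the search.
        partials = sorted(pareto_front(extended),
                          key=lambda s: s["latency"])[:keep]
    return partials
```

Keeping a bounded set of non-dominated partial solutions at each step is what distinguishes this from a purely single-best greedy pass: a configuration that is locally suboptimal for one layer can still survive and win once later layers are accounted for.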


Published In

ICCTA '24: Proceedings of the 2024 10th International Conference on Computer Technology Applications
May 2024
324 pages
ISBN:9798400716386
DOI:10.1145/3674558
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Compilers
  2. Design space exploration
  3. NPU
  4. Tiny machine learning
  5. Optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Article Metrics

  • Total Citations: 0
  • Total Downloads: 87 (last 12 months: 87; last 6 weeks: 57)

Reflects downloads up to 13 Nov 2024
