ZIP-CNN: Design Space Exploration for CNN Implementation within a MCU

Published: 26 September 2024

Abstract

Embedded systems based on Microcontroller Units (MCUs) often gather significant quantities of data and address a wide variety of problems. Convolutional Neural Networks (CNNs) have proven their effectiveness in solving computer vision and natural language processing tasks. However, implementing CNNs within MCUs is challenging due to their high inference costs, which vary widely depending on hardware targets and CNN topologies. Despite state-of-the-art advancements, no efficient design space exploration solution handles the wide variety of possible implementations. In this article, we introduce the ZIP-CNN design space exploration methodology, which facilitates CNN implementation within MCUs. We developed a model that quantitatively estimates the latency, energy consumption, and memory space required to run a CNN within an MCU. This model accounts for algorithmic reductions such as knowledge distillation, pruning, or quantization, and applies to any CNN topology. To demonstrate the efficiency of our methodology, we investigated LeNet5, ResNet8, and ResNet26 within three different MCUs. We made materials and supplementary results available in a GitHub repository: https://github.com/ThGbay/ZIP-CNN. The proposed method was empirically verified on three hardware targets running at 14 different operating frequencies. The three CNN topologies investigated were implemented in their default FP32 configuration and also reduced with INT8 quantization, pruning at five different rates, and knowledge distillation. The estimates of our model are reliable, with an error of 3.29% to 15.23% for latency, 3.12% to 10.34% for energy consumption, and 1.95% to 6.31% for memory space. These results are based on on-device measurements.
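The abstract does not detail the estimation model itself. As an illustration of the general approach it describes (analytical cost modeling of CNN inference on an MCU), the following Python sketch derives latency, energy, and memory estimates from per-layer operation counts scaled by per-operation constants. This is a simplified, hypothetical example, not the authors' ZIP-CNN model; the constants cycles_per_mac and active_power_mw are placeholders that would have to be calibrated on the target MCU, much as the paper relies on on-device measurements.

# Hypothetical sketch of analytical cost estimation for CNN inference on an MCU.
# Not the ZIP-CNN model: the constants below would have to be calibrated on-device.

from dataclasses import dataclass

@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel: int            # square kernel size
    out_h: int
    out_w: int
    bytes_per_weight: int = 4   # 4 for FP32, 1 for INT8 quantization

    def macs(self) -> int:
        # Multiply-accumulate operations for a standard convolution
        return (self.out_channels * self.out_h * self.out_w
                * self.in_channels * self.kernel ** 2)

    def weight_bytes(self) -> int:
        # Flash footprint of the layer's weights (bias terms ignored)
        return self.out_channels * self.in_channels * self.kernel ** 2 * self.bytes_per_weight

def estimate_costs(layers, freq_hz, cycles_per_mac=2.0, active_power_mw=30.0):
    """Return (latency_s, energy_mj, flash_bytes) for one inference.

    cycles_per_mac and active_power_mw are placeholder values that would be
    measured on the target MCU at the chosen operating frequency."""
    total_macs = sum(layer.macs() for layer in layers)
    latency_s = total_macs * cycles_per_mac / freq_hz
    energy_mj = active_power_mw * latency_s            # mW * s = mJ
    flash_bytes = sum(layer.weight_bytes() for layer in layers)
    return latency_s, energy_mj, flash_bytes

# Example: two small LeNet5-like convolution layers on an 80 MHz Cortex-M4-class MCU
layers = [ConvLayer(1, 6, 5, 28, 28), ConvLayer(6, 16, 5, 10, 10)]
print(estimate_costs(layers, freq_hz=80_000_000))

In practice, one such calibrated model per hardware target and per reduction technique (FP32, INT8, pruned, distilled) is what allows the kind of design space exploration described above without deploying every candidate configuration.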



Published In

ACM Transactions on Embedded Computing Systems, Volume 24, Issue 1
January 2025
325 pages
EISSN: 1558-3465
DOI: 10.1145/3696805

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 September 2024
Online AM: 04 September 2024
Accepted: 21 August 2024
Revised: 16 July 2024
Received: 08 March 2024
Published in TECS Volume 24, Issue 1

Author Tags

  1. Design space exploration
  2. neural networks
  3. microcontroller units
  4. tiny machine learning

Qualifiers

  • Research-article
