research-article
Open access
Just Accepted

RaNAS: Resource-Aware Neural Architecture Search for Edge Computing

Online AM: 05 November 2024

Abstract

Neural architecture search (NAS) for edge devices is often time-consuming because deploying and testing candidate networks on the target hardware incurs long latency. Accurately predicting the computation cost and memory requirement of convolutional neural networks (CNNs) in advance is therefore of substantial value. Existing work relies primarily on analytical models, which can result in high prediction errors. This paper proposes a resource-aware NAS (RaNAS) model based on a variety of features. In addition, a new graph neural network is introduced to predict the inference latency and maximum memory requirement of CNNs on edge devices. Experimental results show that, within an error bound of ±1%, RaNAS improves prediction accuracy over state-of-the-art approaches by approximately 8% for inference latency and about 25% for maximum memory occupancy.
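To make the abstract's idea concrete, the sketch below shows the general shape of a graph-based cost predictor: a CNN is represented as a layer graph with per-node features, message passing produces node embeddings, and a pooled graph embedding is regressed onto latency and peak memory. This is only a minimal illustration, not the authors' RaNAS model; the feature set, the two-layer propagation scheme, and all names are assumptions introduced for this example.

```python
# Minimal sketch (assumed, not the paper's implementation): a graph-based
# regressor mapping a CNN's layer graph to [inference latency, peak memory].
import torch
import torch.nn as nn


class GraphCostPredictor(nn.Module):
    def __init__(self, num_node_features: int = 8, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(num_node_features, hidden)
        self.msg1 = nn.Linear(hidden, hidden)
        self.msg2 = nn.Linear(hidden, hidden)
        # Two regression targets: predicted latency and peak memory.
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: [num_layers, num_node_features], e.g. FLOPs, parameter
        #             count, and output size per layer (illustrative choices).
        # adj: [num_layers, num_layers] normalized adjacency with self-loops.
        h = torch.relu(self.encode(node_feats))
        h = torch.relu(adj @ self.msg1(h))   # propagate along layer-graph edges
        h = torch.relu(adj @ self.msg2(h))
        g = h.mean(dim=0)                    # pool nodes into a graph embedding
        return self.head(g)                  # [latency, peak memory]


if __name__ == "__main__":
    # Toy 5-layer chain network with placeholder node features.
    n = 5
    feats = torch.rand(n, 8)
    adj = torch.eye(n)
    for i in range(n - 1):                   # layer i feeds layer i + 1
        adj[i + 1, i] = 1.0
    adj = adj / adj.sum(dim=1, keepdim=True)  # simple row normalization
    print(GraphCostPredictor()(feats, adj))   # tensor of shape [2]
```

In practice such a predictor would be trained on measured latency and memory figures collected from the target edge device, then queried during the search instead of deploying every candidate network.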


Information

Published In

ACM Transactions on Architecture and Code Optimization, Just Accepted
EISSN: 1544-3973
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 05 November 2024
Accepted: 10 October 2024
Revised: 21 August 2024
Received: 31 May 2024

Author Tags

  1. Neural architecture search
  2. edge devices
  3. graph neural network
  4. computational resource

Qualifiers

  • Research-article
