DOI: 10.1145/3498361.3538932
CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices

Published: 27 June 2022

Abstract

Concurrent inference execution on heterogeneous processors is critical for improving the performance of increasingly heavy deep learning (DL) models. However, existing inference frameworks can only use one processor at a time, or achieve little speedup from concurrent execution compared to using a single processor. This is due to the challenges of 1) reducing data-sharing overhead, and 2) properly partitioning each operator between processors.
To address these challenges, we propose CoDL, a concurrent DL inference framework for the CPU and GPU on mobile devices. It fully utilizes the heterogeneous processors to accelerate each operator of a model. It integrates two novel techniques: 1) hybrid-type-friendly data sharing, which allows each processor to use the data type it handles most efficiently for inference; to further reduce data-sharing overhead, we also propose hybrid-dimension partitioning and operator-chain methods; 2) non-linearity- and concurrency-aware latency prediction, which directs proper operator partitioning by building extremely lightweight but accurate latency predictors for the different processors.
Based on the two techniques, we build the end-to-end CoDL inference framework, and evaluate it on different DL models. The results show up to 4.93× speedup and 62.3% energy saving compared with the state-of-the-art concurrent execution system.
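
The partitioning idea described in the abstract can be made concrete with a small sketch. The snippet below is a hypothetical illustration, not CoDL's actual implementation: it assumes a simple linear latency model per processor (the paper's predictors are non-linearity- and concurrency-aware and far more detailed), a single partitioning dimension, and a fixed data-sharing cost. All names here (LinearLatencyModel, partition_operator, share_overhead_ms) are invented for illustration.

```python
# Hypothetical sketch: pick a per-operator CPU/GPU split that minimizes the
# predicted makespan, charging a fixed data-sharing cost to any co-executed
# split. This is an illustrative simplification, not CoDL's actual models.
from dataclasses import dataclass


@dataclass
class LinearLatencyModel:
    """Toy per-processor latency model: latency_ms = a * work + b."""
    a: float
    b: float

    def predict(self, work: float) -> float:
        return self.a * work + self.b


def partition_operator(total_work, cpu_model, gpu_model,
                       share_overhead_ms, steps=100):
    """Return (cpu_fraction, predicted_latency_ms) for one operator.

    Tries GPU-only, CPU-only, and co-executed splits along one dimension
    (e.g., output channels) and keeps whichever minimizes the predicted
    latency; only co-executed splits pay the data-sharing overhead.
    """
    best_frac, best_lat = 0.0, gpu_model.predict(total_work)   # GPU-only
    cpu_only = cpu_model.predict(total_work)                   # CPU-only
    if cpu_only < best_lat:
        best_frac, best_lat = 1.0, cpu_only
    for i in range(1, steps):
        frac = i / steps
        lat = max(cpu_model.predict(frac * total_work),
                  gpu_model.predict((1 - frac) * total_work)) + share_overhead_ms
        if lat < best_lat:
            best_frac, best_lat = frac, lat
    return best_frac, best_lat


if __name__ == "__main__":
    cpu = LinearLatencyModel(a=0.020, b=0.10)   # slower per unit of work
    gpu = LinearLatencyModel(a=0.008, b=0.60)   # faster, higher fixed cost
    frac, lat = partition_operator(256, cpu, gpu, share_overhead_ms=0.05)
    print(f"CPU fraction: {frac:.2f}, predicted latency: {lat:.2f} ms")
```

Under these assumptions, an operator is co-executed only when the predicted makespan of the split, including the sharing cost, beats both single-processor options, which mirrors the abstract's point that data-sharing overhead and partitioning quality jointly determine whether concurrency pays off.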





      Published In

MobiSys '22: Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services
June 2022, 668 pages
ISBN: 9781450391856
DOI: 10.1145/3498361
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 June 2022


      Author Tags

      1. CPU-GPU co-execution
      2. deep learning inference
      3. mobile devices

      Qualifiers

      • Research-article


      Conference

      MobiSys '22

      Acceptance Rates

Overall acceptance rate: 274 of 1,679 submissions, 16%


Bibliometrics & Citations

Article Metrics

• Downloads (last 12 months): 428
• Downloads (last 6 weeks): 23

Reflects downloads up to 26 Sep 2024


Cited By

• (2024) Troy: Efficient Service Deployment for Windows Systems. Chinese Journal of Electronics 33(1), 313-322. DOI: 10.23919/cje.2022.00.405. Online publication date: Jan-2024.
• (2024) Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. DOI: 10.23919/DATE58400.2024.10546645. Online publication date: 25-Mar-2024.
• (2024) CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads. ACM Transactions on Embedded Computing Systems 23(4), 1-32. DOI: 10.1145/3665868. Online publication date: 29-Jun-2024.
• (2024) AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices. Proceedings of the 2024 Workshop on Adaptive AIoT Systems, 19-20. DOI: 10.1145/3662007.3663884. Online publication date: 3-Jun-2024.
• (2024) Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls. Proceedings of the 2024 Workshop on Adaptive AIoT Systems, 1-6. DOI: 10.1145/3662007.3663881. Online publication date: 3-Jun-2024.
• (2024) Reaching the Edge of the Edge: Image Analysis in Space. Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 29-38. DOI: 10.1145/3650203.3663330. Online publication date: 9-Jun-2024.
• (2024) Practical Optical Camera Communication Behind Unseen and Complex Backgrounds. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 113-126. DOI: 10.1145/3643832.3661866. Online publication date: 3-Jun-2024.
• (2024) Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 243-256. DOI: 10.1145/3627535.3638502. Online publication date: 2-Mar-2024.
• (2024) An Adaptive Android Memory Management Based on a Lightweight PSO-LSTM Model. 2024 IEEE Wireless Communications and Networking Conference (WCNC), 1-6. DOI: 10.1109/WCNC57260.2024.10570952. Online publication date: 21-Apr-2024.
• (2024) Graft: Efficient Inference Serving for Hybrid Deep Learning With SLO Guarantees via DNN Re-Alignment. IEEE Transactions on Parallel and Distributed Systems 35(2), 280-296. DOI: 10.1109/TPDS.2023.3340518. Online publication date: 1-Feb-2024.
