DOI: 10.1145/3427921.3450256
PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference

Published: 09 April 2021

Abstract

Executing deep-learning inference on cloud servers enables the use of high-complexity models for mobile devices with limited resources. However, pre-execution time, the time it takes to prepare and transfer data to the cloud, is variable and can take orders of magnitude longer to complete than inference execution itself. Pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better take advantage of on-device resources and network conditions. In this work, we present PieSlicer, a system that makes dynamic preprocessing decisions to improve cloud inference performance using linear regression models; PieSlicer leverages these models to select the appropriate preprocessing location. We show that for image classification applications, PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2 ms and 217.2 ms respectively when compared to static preprocessing methods.




Published In

cover image ACM Conferences
ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2021
301 pages
ISBN:9781450381949
DOI:10.1145/3427921

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. cloud inference
  2. mobile deep learning
  3. performance modeling

Qualifiers

  • Short-paper

Conference

ICPE '21

Acceptance Rates

ICPE '21 Paper Acceptance Rate: 16 of 61 submissions, 26%
Overall Acceptance Rate: 252 of 851 submissions, 30%


Cited By

  • (2024) PraxiPaaS: A Decomposable Machine Learning System for Efficient Container Package Discovery. 2024 IEEE International Conference on Cloud Engineering (IC2E), pp. 178-188. DOI: 10.1109/IC2E61754.2024.00027
  • (2023) Analysis of Machine Learning Techniques for Information Classification in Mobile Applications. Applied Sciences, 13(9):5438. DOI: 10.3390/app13095438
  • (2023) Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys. DOI: 10.1145/3638757
  • (2023) Differentiate Quality of Experience Scheduling for Deep Learning Inferences With Docker Containers in the Cloud. IEEE Transactions on Cloud Computing, 11(2):1667-1677. DOI: 10.1109/TCC.2022.3154117
  • (2023) Layercake: Efficient Inference Serving with Cloud and Mobile Resources. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 191-202. DOI: 10.1109/CCGrid57682.2023.00027
  • (2022) Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists. ACM Transactions on Knowledge Discovery from Data, 16(5):1-52. DOI: 10.1145/3502736
  • (2021) Many Models at the Edge: Scaling Deep Inference via Model-Level Caching. 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), pp. 51-60. DOI: 10.1109/ACSOS52086.2021.00027
