DOI: 10.1145/3427921.3450256
PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference

Published: 09 April 2021

Abstract

Executing deep-learning inference on cloud servers enables the use of high-complexity models for mobile devices with limited resources. However, pre-execution time, the time it takes to prepare and transfer data to the cloud, is variable and can take orders of magnitude longer to complete than inference execution itself. Pre-execution time can be reduced by dynamically deciding the order of two essential steps, preprocessing and data transfer, to better take advantage of on-device resources and network conditions. In this work, we present PieSlicer, a system that makes dynamic preprocessing decisions to improve cloud inference performance using linear regression models; PieSlicer leverages these models to select the appropriate preprocessing location. We show that for image classification applications, PieSlicer reduces median and 99th percentile pre-execution time by up to 50.2 ms and 217.2 ms respectively when compared to static preprocessing methods.




Published In

cover image ACM Conferences
ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2021
301 pages
ISBN:9781450381949
DOI:10.1145/3427921

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. cloud inference
  2. mobile deep learning
  3. performance modeling

Qualifiers

  • Short-paper

Conference

ICPE '21

Acceptance Rates

ICPE '21 Paper Acceptance Rate: 16 of 61 submissions, 26%
Overall Acceptance Rate: 252 of 851 submissions, 30%


Cited By

  • (2024) PraxiPaaS: A Decomposable Machine Learning System for Efficient Container Package Discovery. 2024 IEEE International Conference on Cloud Engineering (IC2E), pp. 178-188. DOI: 10.1109/IC2E61754.2024.00027
  • (2023) Analysis of Machine Learning Techniques for Information Classification in Mobile Applications. Applied Sciences, 13(9):5438. DOI: 10.3390/app13095438
  • (2023) Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys. DOI: 10.1145/3638757
  • (2023) Differentiate Quality of Experience Scheduling for Deep Learning Inferences With Docker Containers in the Cloud. IEEE Transactions on Cloud Computing, 11(2):1667-1677. DOI: 10.1109/TCC.2022.3154117
  • (2023) Layercake: Efficient Inference Serving with Cloud and Mobile Resources. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 191-202. DOI: 10.1109/CCGrid57682.2023.00027
  • (2022) Computational Estimation by Scientific Data Mining with Classical Methods to Automate Learning Strategies of Scientists. ACM Transactions on Knowledge Discovery from Data, 16(5):1-52. DOI: 10.1145/3502736
  • (2021) Many Models at the Edge: Scaling Deep Inference via Model-Level Caching. 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), pp. 51-60. DOI: 10.1109/ACSOS52086.2021.00027
