Sinan: ML-based and QoS-aware resource management for cloud microservices

Authors:

Zhuangzhuang Zhou, Christina Delimitrou

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 167 - 181

https://doi.org/10.1145/3445814.3446693

Published: 17 April 2021

Abstract

Cloud applications are increasingly shifting from large monolithic services to large numbers of loosely-coupled, specialized microservices. Despite their advantages in facilitating development, deployment, modularity, and isolation, microservices complicate resource management, because dependencies between them introduce backpressure effects and cascading QoS violations.

We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices and to allocate appropriate resources per tier in a way that preserves the end-to-end tail latency target. We evaluate Sinan both on dedicated local clusters and on large-scale deployments on Google Compute Engine (GCE), across representative end-to-end applications built with microservices, such as social networks and hotel reservation sites. We show that Sinan always meets QoS while keeping cluster utilization high, in contrast to prior work, which either leads to unpredictable performance or sacrifices resource efficiency. Furthermore, the techniques in Sinan are explainable, meaning that cloud operators can derive insights from the ML models on how to better deploy and design their applications to reduce unpredictable performance.
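To make the idea of model-guided, per-tier allocation concrete, the following is a minimal illustrative sketch, not Sinan's actual models or scheduler. It assumes a hypothetical latency predictor (`toy_latency_model`, a simple queueing-style formula standing in for a trained ML model), hypothetical tier names, and made-up per-request CPU demands. The selection routine `pick_allocation` exhaustively searches a small set of candidate core counts and returns the cheapest allocation whose predicted end-to-end latency stays within the QoS target.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

Allocation = Dict[str, int]  # tier name -> CPU cores (hypothetical unit)

# Made-up CPU demand per request (core-seconds) for three hypothetical tiers.
DEMAND = {"frontend": 0.002, "logic": 0.004, "db": 0.003}


def toy_latency_model(alloc: Allocation, load_rps: float) -> float:
    """Hypothetical stand-in for a trained latency predictor (returns ms).

    Each tier contributes demand / (1 - utilization); a saturated tier
    makes the end-to-end latency unbounded.
    """
    latency_ms = 0.0
    for tier, cores in alloc.items():
        utilization = load_rps * DEMAND[tier] / cores
        if utilization >= 1.0:
            return float("inf")
        latency_ms += 1000.0 * DEMAND[tier] / (1.0 - utilization)
    return latency_ms


def pick_allocation(
    predict: Callable[[Allocation, float], float],
    tiers: Sequence[str],
    core_options: Sequence[int],
    load_rps: float,
    qos_ms: float,
) -> Optional[Allocation]:
    """Return the allocation with the fewest total cores that is predicted
    to meet the latency target, or None if no candidate qualifies."""
    best: Optional[Allocation] = None
    for cores in product(core_options, repeat=len(tiers)):
        alloc = dict(zip(tiers, cores))
        if predict(alloc, load_rps) > qos_ms:
            continue  # predicted QoS violation: reject this candidate
        if best is None or sum(alloc.values()) < sum(best.values()):
            best = alloc
    return best


if __name__ == "__main__":
    choice = pick_allocation(
        toy_latency_model, ["frontend", "logic", "db"], [1, 2, 4, 8],
        load_rps=400.0, qos_ms=30.0,
    )
    print(choice)
```

Exhaustive search is only workable for a handful of tiers and options; it is used here purely to keep the sketch short, and a real manager would need a scalable search and a validated predictor.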

Index Terms

Sinan: ML-based and QoS-aware resource management for cloud microservices
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Computing methodologies
  1. Artificial intelligence
    1. Planning and scheduling

Information & Contributors

Information

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 2021

1090 pages

ISBN: 9781450383172

DOI: 10.1145/3445814

General Chair: Tim Sherwood, University of California at Santa Barbara, USA

Program Chairs: Emery Berger, University of Massachusetts at Amherst, USA; Christos Kozyrakis, Stanford University, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

ASPLOS '21

Sponsor:

SIGPLAN

ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 19 - 23, 2021

Virtual, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

Total Citations: 83
Total Downloads: 3,091
Downloads (Last 12 months): 1,045
Downloads (Last 6 weeks): 148

Reflects downloads up to 12 Nov 2024
