DOI: 10.1145/3472883.3487003

Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis

Published: 01 November 2021

Abstract

Loosely coupled, lightweight microservices running in containers are gradually replacing monolithic applications. Understanding the characteristics of microservices is critical to making good use of microservice architectures. However, there has so far been no comprehensive study of microservices and their related systems in production environments. In this paper, we present a solid analysis of large-scale microservice deployments in Alibaba clusters. Our study focuses on characterizing microservice dependencies and runtime performance. We conduct an in-depth analysis of microservice call graphs to quantify how they differ from the traditional DAGs of data-parallel jobs. In particular, we observe that microservice call graphs are heavy-tail distributed, that their topology resembles a tree, and that many microservices are hot-spots. We reveal three types of meaningful call dependency that can be used to optimize microservice designs. Our investigation of microservice runtime performance indicates that most microservices are much more sensitive to CPU interference than to memory interference. To synthesize more representative microservice traces, we build a mathematical model to simulate call graphs. Experimental results demonstrate that our model preserves the graph properties observed in the Alibaba traces well.
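
As a rough illustration of two of the properties named in the abstract (tree-like topology and hot-spot microservices), the following Python sketch computes simple proxies from caller/callee pairs. The edge-list input format, the function names, and both metrics (edge count relative to a tree, number of distinct callers per service) are assumptions made for illustration only; they are not the paper's definitions and do not reflect the Alibaba trace schema.

```python
from collections import Counter, defaultdict


def call_graph_stats(edges):
    """Summarize one call graph given as (caller, callee) pairs.

    The pair format is a hypothetical simplification, not the trace schema.
    """
    unique_edges = set(edges)
    nodes = {name for edge in unique_edges for name in edge}
    # In-degree: how many distinct callers invoke each service in this graph.
    in_degree = Counter(callee for _, callee in unique_edges)
    # A tree on n nodes has exactly n - 1 edges, so a ratio close to 1.0
    # is used here as a crude proxy for "tree-like" topology.
    tree_ratio = len(unique_edges) / max(len(nodes) - 1, 1)
    return {
        "nodes": len(nodes),
        "edges": len(unique_edges),
        "in_degree": in_degree,
        "tree_ratio": tree_ratio,
    }


def caller_counts(graphs):
    """Count distinct callers per service across many call graphs.

    Services with many distinct callers are treated as candidate hot-spots;
    this criterion is an assumption of the sketch.
    """
    callers = defaultdict(set)
    for edges in graphs:
        for caller, callee in edges:
            callers[callee].add(caller)
    return Counter({service: len(names) for service, names in callers.items()})
```

For example, `call_graph_stats([("a", "b"), ("a", "c"), ("b", "d")])` reports a `tree_ratio` of 1.0, and `caller_counts(graphs).most_common(k)` lists the `k` services with the most distinct callers.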

Published In

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
November 2021
685 pages
ISBN: 9781450386388
DOI: 10.1145/3472883
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '21: ACM Symposium on Cloud Computing
November 1 - 4, 2021
Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 1,187
  • Downloads (Last 6 weeks): 139
Reflects downloads up to 22 Nov 2024

Cited By

  • (2025) GenesisRM: A state-driven approach to resource management for distributed JVM web applications. Future Generation Computer Systems 163 (107539). DOI: 10.1016/j.future.2024.107539. Online publication date: Feb-2025
  • (2024) Optimizing Microservice Deployment in Edge Computing with Large Language Models: Integrating Retrieval Augmented Generation and Chain of Thought Techniques. Symmetry 16:11 (1470). DOI: 10.3390/sym16111470. Online publication date: 5-Nov-2024
  • (2024) Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs. Electronics 13:22 (4425). DOI: 10.3390/electronics13224425. Online publication date: 12-Nov-2024
  • (2024) Multi-Dimensional Moving Target Defense Method Based on Adaptive Simulated Annealing Genetic Algorithm. Electronics 13:3 (487). DOI: 10.3390/electronics13030487. Online publication date: 24-Jan-2024
  • (2024) Software compliance in various industries using CI/CD, dynamic microservices, and containers. Open Computer Science 14:1. DOI: 10.1515/comp-2024-0013. Online publication date: 12-Jul-2024
  • (2024) SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient. Proceedings of the 2024 ACM Symposium on Cloud Computing (668-688). DOI: 10.1145/3698038.3698558. Online publication date: 20-Nov-2024
  • (2024) TailClipper: Reducing Tail Response Time of Distributed Services Through System-Wide Scheduling. Proceedings of the 2024 ACM Symposium on Cloud Computing (398-414). DOI: 10.1145/3698038.3698554. Online publication date: 20-Nov-2024
  • (2024) Toward Data-Centric Service Composition. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks (360-367). DOI: 10.1145/3696348.3702013. Online publication date: 18-Nov-2024
  • (2024) MRCA: Metric-level Root Cause Analysis for Microservices via Multi-Modal Data. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1057-1068). DOI: 10.1145/3691620.3695485. Online publication date: 27-Oct-2024
  • (2024) FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (957-969). DOI: 10.1145/3691620.3695477. Online publication date: 27-Oct-2024
