research-article

Humas: A Heterogeneity- and Upgrade-Aware Microservice Auto-Scaling Framework in Large-Scale Data Centers

Published: 27 November 2024

Abstract

An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As many prior studies have shown, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationships between performance metrics and workloads, in data-driven schemes. However, characterizing performance patterns for large-scale microservices poses two significant challenges. First, diverse microservices exhibit varying sensitivities to heterogeneous machines, making it difficult to quantify performance differences in a fixed manner. Second, frequent version upgrades of microservices cause uncertain changes in performance patterns, known as pattern drifts, which lead to imprecise resource capacity estimates. To address these challenges, we propose Humas, a heterogeneity- and upgrade-aware auto-scaling framework for large-scale microservices. Humas quantifies the differences in resource efficiency among heterogeneous machines for each microservice online and normalizes their resources into standard units. It also develops a least-squares density-difference (LSDD) based algorithm to identify pattern drifts caused by upgrades. Finally, Humas generates capacity adjustment plans for microservices based on the latest performance patterns and predicted workloads. Experiments conducted on 50 real microservices with over 11,000 containers demonstrate that Humas improves resource efficiency and performance stability by approximately 30.4% and 48.0%, respectively, compared with state-of-the-art approaches.
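The least-squares density-difference idea mentioned in the abstract can be sketched as follows. This is a minimal illustration of LSDD estimation between two windows of metric samples, not Humas's actual implementation; the Gaussian kernel model, the kernel width `sigma`, the regularizer `lam`, and the use of all samples as kernel centres are assumptions made for the sketch. A larger score indicates a larger estimated gap between the two distributions, which a drift detector would compare against a threshold.

```python
import numpy as np

def lsdd_score(X, Y, sigma=1.0, lam=1e-3):
    """Least-squares density-difference score between an old sample window X
    and a new window Y (rows are observations). Approximates the integrated
    squared difference of the two underlying densities, ∫ (p(x) - q(x))^2 dx,
    using a Gaussian kernel model fitted in closed form."""
    C = np.vstack([X, Y])  # kernel centres: pool both windows
    d = C.shape[1]

    def K(A, B):
        # Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    # Closed form for H_{ll'} = ∫ k(x, c_l) k(x, c_l') dx with Gaussian kernels
    sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    H = (np.pi * sigma ** 2) ** (d / 2.0) * np.exp(-sq / (4.0 * sigma ** 2))

    # h_l = E_p[k(x, c_l)] - E_q[k(x, c_l)], estimated from the two windows
    h = K(X, C).mean(axis=0) - K(Y, C).mean(axis=0)

    # Ridge-regularized least-squares fit of the density-difference model
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)

    # Plug-in estimate of ∫ (p - q)^2 dx
    return float(2.0 * h @ theta - theta @ H @ theta)
```

In a drift-detection setting, the score for two windows drawn from the same workload regime stays near zero, while a post-upgrade window whose performance pattern has shifted yields a clearly larger score.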




Published In

IEEE Transactions on Computers, Volume 74, Issue 3, March 2025, 360 pages

Publisher

IEEE Computer Society

United States


