Optimal Service Elasticity in Large-Scale Distributed Systems

Authors:

Debankur Mukherjee,

Souvik Dhara,

Sem C. Borst,

Johan S.H. van Leeuwaarden

SIGMETRICS '17 Abstracts: Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems

Page 3

https://doi.org/10.1145/3078505.3078532

Published: 05 June 2017

Abstract

A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all.

Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations. Extensive simulation experiments corroborate the fluid-limit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.
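The abstract does not spell out the mechanism, so the following Python sketch is only one plausible way to realize a joint auto-scaling and load balancing rule that uses no queue length information. Every rule and name in it is an assumption made for illustration (the token list, the exponential standby and setup timers, the parameters `standby_rate` and `setup_rate`, dropping a task when no server is busy), and it should not be read as the authors' scheme. Idle servers notify the dispatcher, switch off after a standby period, and are set up again one at a time when a task arrives and finds no idle server.

```python
import heapq
import random
from collections import deque


def simulate(n=50, load=0.6, standby_rate=1.0, setup_rate=1.0, horizon=200.0, seed=0):
    """Illustrative token-based auto-scaling and load balancing (all rules are assumptions)."""
    rng = random.Random(seed)
    lam = load * n                         # total arrival rate; each server serves at rate 1
    state = ["idle"] * n                   # one of: "off", "setup", "idle", "busy"
    waiting = [deque() for _ in range(n)]  # arrival times of tasks queued at each server
    epoch = [0] * n                        # bumped to invalidate stale standby timers
    tokens, off = list(range(n)), []       # the dispatcher only tracks idle and off servers
    events, seq = [], 0
    stats = {"arrivals": 0, "dropped": 0, "completed": 0, "messages": 0, "total_wait": 0.0}

    def schedule(t, kind, s=-1):
        nonlocal seq
        seq += 1                           # unique tie-breaker for the heap
        heapq.heappush(events, (t, seq, kind, s, epoch[s] if s >= 0 else 0))

    def go_idle(s, now):
        state[s] = "idle"
        epoch[s] += 1
        tokens.append(s)
        stats["messages"] += 1             # server -> dispatcher: "I am idle"
        schedule(now + rng.expovariate(standby_rate), "standby", s)

    def start_service(s, now):
        state[s] = "busy"
        epoch[s] += 1                      # cancels any pending standby timer
        schedule(now + rng.expovariate(1.0), "departure", s)

    for s in range(n):                     # all servers start idle, with a standby timer
        schedule(rng.expovariate(standby_rate), "standby", s)
    schedule(rng.expovariate(lam), "arrival")

    while events:
        now, _, kind, s, ep = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "arrival":
            schedule(now + rng.expovariate(lam), "arrival")
            stats["arrivals"] += 1
            if tokens:                     # an idle server exists: the task waits zero time
                start_service(tokens.pop(rng.randrange(len(tokens))), now)
            else:
                # Servers that are neither idle, off, nor in setup are busy; the
                # dispatcher can infer this from tokens alone, never queue lengths.
                busy = [i for i in range(n) if state[i] == "busy"]
                if busy:
                    waiting[rng.choice(busy)].append(now)
                else:
                    stats["dropped"] += 1  # assumption: no busy server means the task is lost
                if off:                    # scale up by one server per such arrival
                    s = off.pop()
                    state[s] = "setup"
                    stats["messages"] += 1  # dispatcher -> server: "start setup"
                    schedule(now + rng.expovariate(setup_rate), "setup_done", s)
        elif kind == "departure":
            stats["completed"] += 1
            if waiting[s]:
                stats["total_wait"] += now - waiting[s].popleft()
                schedule(now + rng.expovariate(1.0), "departure", s)
            else:
                go_idle(s, now)
        elif kind == "setup_done":
            go_idle(s, now)
        elif kind == "standby" and ep == epoch[s] and state[s] == "idle":
            tokens.remove(s)               # scale down: an idle server switches itself off
            off.append(s)
            state[s] = "off"
            stats["messages"] += 1         # server -> dispatcher: "I am off"
    return stats
```

Calling `simulate(n=200, load=0.6)` and dividing `stats["messages"]` by `stats["arrivals"]` gives the control messages per task. In this sketch that ratio stays bounded as `n` grows, because each arrival triggers at most one setup request and each idle period produces at most two notifications. This is the kind of constant per-task overhead the abstract refers to, but the sketch does not reproduce the paper's fluid-limit analysis.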

Index Terms

Optimal Service Elasticity in Large-Scale Distributed Systems
1. Mathematics of computing
  1. Probability and statistics
    1. Stochastic processes
      1. Markov processes
2. Networks
  1. Network algorithms
    1. Data path algorithms
      1. Packet scheduling
  2. Network services
    1. Cloud computing

Published In

SIGMETRICS '17 Abstracts: Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems

June 2017

84 pages

ISBN:9781450350327

DOI:10.1145/3078505

General Chairs: Bruce Hajek (University of Illinois), Sewoong Oh (University of Illinois)

Program Chairs: Augustin Chaintreau (Columbia University), Leana Golubchik (University of Southern California), Zhi-Li Zhang (University of Minnesota)

ACM SIGMETRICS Performance Evaluation Review Volume 45, Issue 1
Performance evaluation review
June 2017
70 pages
ISSN:0163-5999
DOI:10.1145/3143314
Editor:
Nidhi Hegde
Bell Labs, Nokia

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2017

Qualifiers

Abstract

Funding Sources

Nederlandse Organisatie voor Wetenschappelijk Onderzoek TOP-GO grant
Nederlandse Organisatie voor Wetenschappelijk Onderzoek Gravitation Networks grant

Conference

SIGMETRICS '17

Sponsor:

SIGMETRICS

SIGMETRICS '17: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems

June 5 - 9, 2017

Urbana-Champaign, Illinois, USA

Acceptance Rates

SIGMETRICS '17 Abstracts Paper Acceptance Rate 27 of 76 submissions, 36%;

Overall Acceptance Rate 459 of 2,691 submissions, 17%


Bibliometrics

Article Metrics

Total Citations: 17
Total Downloads: 260
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Nov 2024
