Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3492866.3549713acmconferencesArticle/Chapter ViewAbstractPublication PagesmobihocConference Proceedingsconference-collections
research-article
Open access

Tackling heterogeneous traffic in multi-access systems via erasure coded servers

Published: 03 October 2022 Publication History

Abstract

Most data generated by modern applications is stored in the cloud, and there is an exponential growth in the volume of jobs to access these data and perform computations using them. The volume of data access or computing jobs can be heterogeneous across different job types and can unpredictably change over time. Cloud service providers cope with this demand heterogeneity and unpredictability by over-provisioning the number of servers hosting each job type. In this paper, we propose the addition of erasure-coded servers that can flexibly serve multiple job types without additional storage cost. We analyze the service capacity region and the response time of such erasure-coded systems and compare them with standard uncoded replication-based systems currently used in the cloud. We show that coding expands the service capacity region, thus enabling the system to handle variability in demand for different data types. Moreover, we characterize the response time of the coded system in various arrival rate regimes. This analysis reveals that adding even a small number of coded servers can significantly reduce the mean response time, with a drastic reduction in regimes where the demand is skewed across different job types.

Supplementary Material

PDF File (p171-choudhury-supp.pdf)
Supplemental material.

References

[1]
Mehmet Aktaş, Sarah E. Anderson, Ann Johnston, Gauri Joshi, Swanand Kadhe, Gretchen L. Matthews, Carolyn Mayer, and Emina Soljanin. 2017. On the service capacity region of accessing erasure coded content. In Proc. Ann. Allerton Conf. Communication, Control and Computing. Monticello, IL, USA, 17--24.
[2]
Mehmet Aktaş, Gauri Joshi, Swanand Kadhe, Fatemeh Kazemi, and Emina Soljanin. 2021. Service Rate Region: A New Aspect of Coded Distributed System Design. IEEE Trans. Inf. Theory 67, 12 (Oct. 2021), 7940--7963.
[3]
Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective Straggler Mitigation: Attack of the Clones. In USENIX Symp. Networked Systems Design and Implem. (NSDI). USENIX Association, Lombard, IL, 185--198.
[4]
Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. 1999. Web caching and Zipf-like distributions: evidence and implications. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), Vol. 1. New York, NY, USA, 126--134.
[5]
Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, and David A. Patterson. 1994. RAID: high-performance, reliable secondary storage. ACM Comput. Surv. 26 (1994), 145--185.
[6]
Shengbo Chen, Yin Sun, Ulaş C. Kozat, Longbo Huang, Prasun Sinha, Guanfeng Liang, Xin Liu, and Ness B. Shroff. 2014. When queueing meets coding: Optimal-latency data retrieving scheme in storage clouds. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM). 1042--1050.
[7]
Tuhinangshu Choudhury, Weina Wang, and Gauri Joshi. 2022. Tackling Heterogeneous Traffic in Multi-access Systems via Erasure Coded Servers. https://arxiv.org/abs/2207.03983
[8]
Walfredo Cirne, Francisco Brasileiro, Daniel Paranhos, Luís Fabrício W. Góes, and William Voorsluys. 2007. On the efficacy, efficiency and emergent behavior of task replication in large distributed systems. Parallel Comput. 33, 3 (2007), 213--234.
[9]
William Dally. 2015. High-performance Hardware for Machine Learning. NIPS Tutorial (July 2015).
[10]
Jeffrey Dean and Luiz André Barroso. 2013. The Tail at Scale. ACM Commun. 56, 2 (Feb. 2013), 74--80.
[11]
Elwyn Berkelamp. 1968. Algebraic coding theory. McGraw-Hill, New York, USA.
[12]
Kristen Gardner, Mor Harchol-Balter, Alan Scheller-Wolf, Mark Velednitsky, and Samuel Zbarsky. 2017. Redundancy-d: The Power of d Choices for Redundancy. Oper. Res. 65, 4 (Aug. 2017), 1078--1094.
[13]
Google. 2022. Google Trends. https://trends.google.com/trends
[14]
Gauri Joshi and Dhruva Kaushal. 2021. Synergy via Redundancy: Adaptive Replication Policies and Fundamental Limits. IEEE/ACM Trans. Netw. 29, 02 (March 2021), 737--749.
[15]
Gauri Joshi, Emina Soljanin, and Gregory Wornell. 2017. Efficient Redundancy Techniques for Latency Reduction in Cloud Systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2, 2 (April 2017), 30.
[16]
Yasaman Keshtkarjahromi, Yuxuan Xing, and Hulya Seferoglu. 2018. Dynamic Heterogeneity-Aware Coded Cooperative Computation at the Edge. In Proc. IEEE Int. Conf. Network Protocols (ICNP). 23--33.
[17]
Jack Kosaian, K. V. Rashmi, and Shivaram Venkataraman. 2019. Parity Models: Erasure-Coded Resilience for Prediction Serving Systems. In Proc. ACM Symp. Operating Systems Principles (SOSP) (Huntsville, Ontario, Canada). 30--46.
[18]
Bin Li, Aditya Ramamoorthy, and R. Srikant. 2018. Mean-Field Analysis of Coding Versus Replication in Large Data Storage Systems. ACM SIGMETRICS Perform. Evaluation Rev. 3, 1, Article 3 (Feb. 2018), 28 pages.
[19]
Mohammad Ali Maddah-Ali and Urs Niesen. 2016. Coding for caching: fundamental limits and practical challenges. IEEE Communications Magazine 54, 8 (Aug. 2016), 23--29.
[20]
Ankur Mallick, Sophie Smith, and Gauri Joshi. 2021. Rateless Codes for Distributed Non-linear Computations. In IEEE Int. Symp. Topics in Coding (ISTC). 1--5.
[21]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing order to the Web. Technical Report. Stanford InfoLab.
[22]
Michael Rabinovich and Oliver Spatscheck. 2002. Web caching and replication. Vol. 67. Addison-Wesley Boston, USA.
[23]
Nihar B. Shah, Kangwook Lee, and Kannan Ramchandran. 2016. When Do Redundant Requests Reduce Latency? IEEE Trans. Commun. 64, 2 (2016), 715--722.
[24]
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop Distributed File System. In IEEE Symp. Mass Storage Systems and Technologies (MSST). IEEE Computer Society, USA, 1--10.
[25]
R. Srikant and Lei Ying. 2014. Communication Networks: An Optimization, Control and Stochastic Networks Perspective. Cambridge University Press, USA.
[26]
Yin Sun, C. Emre Koksal, and Ness B. Shroff. 2017. On Delay-Optimal Scheduling in Queueing Systems with Replications. arXiv:1603.07322 [cs.PF] (March 2017).
[27]
John N. Tsitsiklis and Kuang Xu. 2017. Flexible Queueing Architectures. Operations Research 65, 5 (July 2017), 1398--1413.
[28]
Da Wang, Gauri Joshi, and Gregory W. Wornell. 2019. Efficient Straggler Replication in Large-Scale Parallel Computing. ACM Trans. Model. Perform. Eval. Comput. Syst. 4, 2, Article 7 (April 2019), 23 pages.
[29]
Weina Wang, Mor Harchol-Balter, Haotian Jiang, Alan Scheller-Wolf, and R. Srikant. 2019. Delay asymptotics and bounds for multitask parallel jobs. Queueing Syst. 91, 3--4 (April 2019), 207--239.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
MobiHoc '22: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
October 2022
442 pages
ISBN:9781450391658
DOI:10.1145/3492866
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 October 2022

Check for updates

Author Tags

  1. coded computation
  2. erasure coding
  3. latency
  4. service capacity region

Qualifiers

  • Research-article

Funding Sources

  • Carnegie Bosch Institute Research Award
  • NSF ECCS
  • NSF CAREER CCF
  • NSF AI Institute CNS
  • NSF CNS
  • NSF CCF

Conference

MobiHoc '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 296 of 1,843 submissions, 16%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 295
    Total Downloads
  • Downloads (Last 12 months)84
  • Downloads (Last 6 weeks)7
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media