DOI: 10.1145/3485447.3512060
research-article

Truthful Online Scheduling of Cloud Workloads under Uncertainty

Published: 25 April 2022

Abstract

Cloud computing customers often submit repeating jobs and computation pipelines on approximately regular schedules, with arrival and running times that exhibit variance. This pattern, typical of training tasks in machine learning, allows customers to partially predict future job requirements. We develop a model of cloud computing platforms that receive statements of work (SoWs) in an online fashion. The SoWs describe future jobs whose arrival times and durations are probabilistic, and whose utility to the submitting agents declines with completion time. The arrival and duration distributions, as well as the utility functions, are considered private customer information and are reported by strategic agents to a scheduler that aims to maximize social welfare.
We design pricing, scheduling, and eviction mechanisms that incentivize truthful reporting of SoWs. An important challenge is maintaining incentives despite the possibility that the platform becomes saturated. We introduce a framework that reduces scheduling under uncertainty to a relaxed scheduling problem without uncertainty. Using this framework, we tackle both adversarial and stochastic submissions of statements of work, and obtain logarithmic-competitive and constant-competitive mechanisms, respectively.
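To make the SoW model concrete, here is a minimal Python sketch of a single statement of work. It assumes discrete time steps, finite arrival and duration distributions that are independent of each other, and a job that runs without preemption once started. The class `StatementOfWork`, its fields, and the method `expected_utility` are hypothetical names chosen for illustration, not the paper's formulation. The sketch only computes the expected utility an agent would obtain if the scheduler started its job a fixed delay after arrival.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StatementOfWork:
    """Hypothetical model of one SoW (illustrative only).

    Assumes discrete time, finite distributions, and that arrival time and
    duration are independent of each other.
    """
    arrival_pmf: Dict[int, float]     # P(job arrives at time step t)
    duration_pmf: Dict[int, float]    # P(job needs d time steps)
    utility: Callable[[int], float]   # agent's value, non-increasing in completion time

    def __post_init__(self) -> None:
        # Reject inputs that are not probability distributions.
        for name, pmf in (("arrival_pmf", self.arrival_pmf),
                          ("duration_pmf", self.duration_pmf)):
            if any(p < 0 for p in pmf.values()) or not math.isclose(sum(pmf.values()), 1.0):
                raise ValueError(f"{name} must be a probability distribution")

    def expected_utility(self, delay: int = 0) -> float:
        """Expected utility if the job starts `delay` steps after it arrives
        and then runs to completion without preemption (assumption)."""
        total = 0.0
        for arrival, p_a in self.arrival_pmf.items():
            for duration, p_d in self.duration_pmf.items():
                completion = arrival + delay + duration
                total += p_a * p_d * self.utility(completion)
        return total


if __name__ == "__main__":
    # Arrives at t=0 or t=2 with equal probability and always runs for 3 steps;
    # utility declines linearly from 10 and is floored at 0.
    sow = StatementOfWork({0: 0.5, 2: 0.5}, {3: 1.0}, lambda t: max(0.0, 10.0 - t))
    print(sow.expected_utility())         # 6.0
    print(sow.expected_utility(delay=1))  # 5.0
```

Under the linear utility assumed here, delaying the job by one step lowers its expected utility from 6.0 to 5.0. This illustrates why, when utilities decline with completion time, the scheduler's timing decisions directly affect social welfare.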




Published In

WWW '22: Proceedings of the ACM Web Conference 2022
April 2022
3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cloud computing
  2. mechanism design
  3. online algorithms
  4. scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '22
WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 287
  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 12
Reflects downloads up to 10 Nov 2024.
