DOI: 10.1109/CCGRID.2017.50

PCSsampler: Sample-based, Private-state Cluster Scheduling

Published: 14 May 2017

Abstract

As a promising alternative to centralized scheduling, sample-based scheduling is especially suitable for high fan-out workloads that contain a large number of interactive jobs. Unlike centralized schedulers, existing sample-based schedulers do not hold a global view of the cluster's resource status; instead, scheduling decisions are made solely from the status of a small set of randomly sampled workers. Although this simple approach is highly efficient in large clusters, the lack of global knowledge can lead to sub-optimal task placement and difficulty in enforcing global scheduling policies. In this paper, we address these challenges by allowing the scheduler to maintain an approximate view of the global resource status, built by caching the worker status extracted from probe reply messages. More specifically, we introduce the private-cluster-state (PCS) technique for the scheduler to obtain this global information. We show that with PCS the scheduler makes better placement decisions and becomes better able to enforce global scheduling policies. PCS is also cheap: it initiates no communication beyond what sample-based scheduling already performs. We implement our approach in PCSsampler, a fully distributed sample-based scheduler that gains global knowledge from PCS. Experimental results from both simulation runs and Amazon EC2 cluster runs show that, compared to Sparrow, PCSsampler significantly reduces both 50th-percentile and 90th-percentile runtimes. In gang scheduling, PCSsampler's first-time success rate is closer to that of an omniscient centralized scheduler than to that of the baseline sample-based scheduler.
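The core idea can be made concrete with a short sketch. The following Python sketch is an illustration only, not the authors' implementation: the `PCSScheduler` class, the staleness-based cache expiry, and the policy of biasing probes toward workers the cache believes are lightly loaded are all assumptions made for the example. The abstract specifies only that worker status carried in probe replies is cached to form an approximate global view, at no extra communication cost.

```python
import random
import time


class PCSScheduler:
    """Sketch of sample-based scheduling with a private cluster state (PCS).

    Worker loads returned in probe replies are cached locally, giving the
    scheduler an approximate global view as a by-product of scheduling.
    All names and policies here are illustrative, not the paper's code.
    """

    def __init__(self, workers, sample_size=2, staleness=5.0):
        self.workers = workers          # worker_id -> probe stub returning queue length
        self.sample_size = sample_size  # probes per task ("power of d choices")
        self.staleness = staleness      # seconds before a cached entry is ignored
        self.pcs = {}                   # private cluster state: worker_id -> (load, timestamp)

    def _fresh_entries(self):
        now = time.time()
        return {w: q for w, (q, t) in self.pcs.items() if now - t < self.staleness}

    def schedule(self, task):
        # Assumed policy: bias the sample toward workers the private state
        # believes are lightly loaded, fill the rest uniformly at random.
        fresh = self._fresh_entries()
        ranked = sorted(fresh, key=fresh.get)[: self.sample_size]
        pool = [w for w in self.workers if w not in ranked]
        sample = ranked + random.sample(pool, max(0, self.sample_size - len(ranked)))

        # Probe the sampled workers; every reply refreshes the private state,
        # so no communication beyond ordinary sample-based probing occurs.
        replies = {w: self.workers[w]() for w in sample}
        now = time.time()
        for w, q in replies.items():
            self.pcs[w] = (q, now)

        # Place the task on the least-loaded probed worker.
        return min(replies, key=replies.get)


# Toy usage: 100 simulated workers whose probe replies are random queue lengths.
cluster = {f"w{i}": (lambda: random.randint(0, 10)) for i in range(100)}
sched = PCSScheduler(cluster)
print(sched.schedule("task-0"))
```

Under this reading, caching is free because the scheduler was going to receive the probe replies anyway; the private state simply retains them, which is what lets a sample-based scheduler approximate the global view a centralized scheduler gets for free.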


Cited By

  • (2017) Improving the robustness and performance of parallel joins over distributed systems. Journal of Parallel and Distributed Computing 109:C, 310–323. Online publication date: 1 Nov 2017. DOI: 10.1016/j.jpdc.2017.06.016



Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017
1167 pages
ISBN: 9781509066100

Publisher

IEEE Press


Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CCGrid '17
