extended-abstract

Minimizing Interference and Maximizing Progress for Hadoop Virtual Machines

Published: 02 June 2015

Abstract

Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work we propose two schedulers: one in the virtualization layer designed to minimize interference with high-priority interactive services, and one in the Hadoop framework that helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. We use admission control to meet deadlines even when resources are overloaded. The combination of these schedulers allows data center administrators to safely mix resource-intensive Hadoop jobs with latency-sensitive web applications, and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times by more than tenfold compared to the existing Xen Credit Scheduler, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
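To make the idea of deadline-aware ordering with admission control concrete, the following is a minimal sketch and not the scheduler described in this work. It assumes a generic earliest-deadline-first (EDF) policy, a constant cluster capacity, and a known work estimate per job; the names Job, admit, and schedule, and the "slot-seconds" unit, are hypothetical and chosen only for illustration. A job is admitted only if, once it is added, every admitted job would still finish by its deadline.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float      # estimated remaining work in slot-seconds (assumed known)
    deadline: float  # seconds from now by which the job must finish


def admit(queue, candidate, capacity):
    """Return True if every job still meets its deadline under EDF once
    `candidate` is added. `capacity` is the work completed per second and
    is assumed constant, which real spare capacity is not."""
    trial = sorted(queue + [candidate], key=lambda j: j.deadline)
    elapsed = 0.0
    for job in trial:
        elapsed += job.work / capacity  # jobs run back to back in EDF order
        if elapsed > job.deadline:
            return False
    return True


def schedule(jobs, capacity):
    """Apply admission control in arrival order, then return the admitted
    jobs in EDF order together with the rejected jobs."""
    admitted, rejected = [], []
    for job in jobs:
        if admit(admitted, job, capacity):
            admitted.append(job)
        else:
            rejected.append(job)
    admitted.sort(key=lambda j: j.deadline)
    return admitted, rejected


if __name__ == "__main__":
    arrivals = [Job("a", 100, 60), Job("b", 100, 50), Job("c", 300, 120)]
    admitted, rejected = schedule(arrivals, capacity=4)
    print([j.name for j in admitted], [j.name for j in rejected])
```

In this example, job c is rejected because running it after a and b would take it past its deadline. A system like the one described above would replace the constant capacity with a performance model of the capacity that is actually left over on each server.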

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 42, Issue 4
March 2015
70 pages
ISSN: 0163-5999
DOI: 10.1145/2788402

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Map Reduce
  2. admission control
  3. deadlines
  4. interference
  5. scheduling
  6. virtualization

Qualifiers

  • Extended-abstract
