extended-abstract

Minimizing Interference and Maximizing Progress for Hadoop Virtual Machines

Authors:

Sundaresan Rajasekaran,

Mingfa Zhu

ACM SIGMETRICS Performance Evaluation Review, Volume 42, Issue 4

Pages 62 - 71

https://doi.org/10.1145/2788402.2788411

Published: 02 June 2015

Abstract

Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work, we propose two schedulers: one in the virtualization layer, designed to minimize interference on high-priority interactive services, and one in the Hadoop framework, which helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. We use admission control to meet deadlines even when resources are overloaded. The combination of these schedulers allows data center administrators to safely mix resource-intensive Hadoop jobs with latency-sensitive web applications and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times more than tenfold compared to the existing Xen Credit Scheduler, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
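The abstract does not spell out how deadline-aware ordering and admission control interact, so the following is only an illustrative sketch and not the authors' scheduler. It assumes a textbook earliest-deadline-first (EDF) ordering and a simple feasibility check; the `Job` fields, the `capacity` parameter (work completed per second of wall-clock time), and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float      # assumed estimate of remaining work (hypothetical unit: slot-seconds)
    deadline: float  # seconds from now by which the job should finish


def feasible(jobs, capacity):
    """Return True if running jobs in EDF order meets every deadline.

    capacity is an assumed constant: slot-seconds of work completed
    per second of wall-clock time.
    """
    elapsed = 0.0
    for job in sorted(jobs, key=lambda j: j.deadline):
        elapsed += job.work / capacity
        if elapsed > job.deadline:
            return False
    return True


def admit(queue, new_job, capacity):
    """Admit new_job only if all queued jobs would still meet their deadlines.

    Returns (queue, admitted). On admission the queue is EDF-ordered;
    on rejection the original queue is returned unchanged.
    """
    candidate = queue + [new_job]
    if feasible(candidate, capacity):
        return sorted(candidate, key=lambda j: j.deadline), True
    return queue, False


if __name__ == "__main__":
    queue = []
    for job in [Job("sort", 100, 60), Job("grep", 20, 30), Job("join", 200, 90)]:
        queue, ok = admit(queue, job, capacity=2.0)
        print(job.name, "admitted" if ok else "rejected")
```

In a setting like the one the abstract describes, the spare capacity would vary over time and would come from a performance model rather than a fixed constant; the sketch holds it constant only to keep the feasibility check easy to follow.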

Published In

ACM SIGMETRICS Performance Evaluation Review Volume 42, Issue 4

March 2015

70 pages

ISSN:0163-5999

DOI:10.1145/2788402

Editor:
Giuliano Casale
Imperial College London, London

Copyright © 2015 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
