Minimal-overhead virtualization of a large scale supercomputer

Authors:

Kevin Pedretti,

Patrick G. Bridges,

Philip Soltero,

Alexander Merritt

ACM SIGPLAN Notices, Volume 46, Issue 7

Pages 169 - 180

https://doi.org/10.1145/2007477.1952705

Published: 09 March 2011

Abstract

Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large-scale machines that run carefully crafted HPC OSes supporting tightly-coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with ≤ 5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes. We describe three techniques essential for achieving such low overhead: passthrough I/O, workload-sensitive selection of paging mechanisms, and carefully controlled preemption. These techniques are forms of symbiotic virtualization, an approach on which we elaborate.

Index Terms

Minimal-overhead virtualization of a large scale supercomputer
1. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Published In

ACM SIGPLAN Notices Volume 46, Issue 7

VEE '11

July 2011

231 pages

ISSN: 0362-1340

EISSN: 1558-1160

DOI: 10.1145/2007477

VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
March 2011
250 pages
ISBN: 9781450306874
DOI: 10.1145/1952682
General Chair: Erez Petrank, The Technion, Israel
Program Chair: Doug Lea, SUNY Oswego, USA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
