research-article

Minimal-overhead virtualization of a large scale supercomputer

Published: 09 March 2011

Abstract

Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large-scale machines that run carefully crafted HPC OSes supporting tightly coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with ≤ 5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes. We describe three techniques essential for achieving such low overhead: passthrough I/O, workload-sensitive selection of paging mechanisms, and carefully controlled preemption. These techniques are forms of symbiotic virtualization, an approach on which we elaborate.

Published In

ACM SIGPLAN Notices, Volume 46, Issue 7
VEE '11
July 2011
231 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2007477
  • VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
    March 2011
    250 pages
    ISBN: 9781450306874
    DOI: 10.1145/1952682

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high performance computing
  2. parallel computing
  3. virtual machine monitors

Qualifiers

  • Research-article
