OS Virtualization: CSC 456 Final Presentation Brandon D. Shroyer

  Virtualization: Providing an interface to software
that maps to some underlying system.
  A one-to-one mapping between a guest and the host
on which it runs [9, 10].
  Virtualized system should be an “efficient, isolated
duplicate” [8] of the real one.
  Process virtual machine just supports a process;
system virtual machine supports an entire system.
Why Virtualize?
  Reasons for Virtualization
  Hardware Economy
  Versatility
  Environment Specialization
  Security
  Safe Kernel Development
  OS Research [12]
Process Virtualization
  VM interfaces with a single process.
  Application sees “virtual machine” as address
space, registers, and instruction set [10].
  Examples:
  Multiprogramming
  Emulation for binaries
  High-level language VMMs (e.g., JVM)
[Figure: stack of Application, Virtualization Layer, and OS.]
System Virtualization
[Figure: two stacks, each containing a Virtualization Layer, labeled Classical Virtualization and Hosted Virtualization.]
  Interfaces with operating system
  OS sees VM as an actual machine—memory, I/O,
CPU, etc [10].
  Classic virtualization: virtualization layer runs atop
the hardware.
  Usually found on servers (Xen, VMWare ESX)
  Hosted or whole-system virtualization: virtualization
layer runs on an operating system.
  Popular for desktops (VMWare Workstation, Virtual
  Providing an interface to a system so that it can run
on a system with a different interface [10].
  Lets compiled binaries, OSes run on architectures
with different ISA (binary translation)
  Performance usually worse than classic virtualization.

  Example: QEMU [11]

  Breaks CPU instructions into small ops, coded in C.
  C code is compiled into small objects on native ISA.
  dyngen utility runs code by dynamically stitching
objects together (dynamic code generation).
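
The QEMU bullets above can be illustrated with a toy translator. This is a minimal sketch under stated assumptions, not QEMU's code: the guest instruction names, the micro-op functions, and the `translate` helper are all hypothetical, and Python closures stand in for the small precompiled native objects that dyngen would stitch together.

```python
# Toy model of dynamic code generation by stitching micro-ops.
# All names are hypothetical; Python closures stand in for the small
# native-code objects that a real translator would concatenate.

def op_load_imm(reg, value):
    def op(state):
        state[reg] = value
    return op

def op_add(dst, src):
    def op(state):
        state[dst] = state[dst] + state[src]
    return op

def expand(instr):
    """Break one guest instruction into one or more micro-ops."""
    kind = instr[0]
    if kind == "MOVI":                      # MOVI reg, imm
        return [op_load_imm(instr[1], instr[2])]
    if kind == "ADD":                       # ADD dst, src
        return [op_add(instr[1], instr[2])]
    if kind == "ADDI":                      # ADDI dst, imm: two micro-ops
        return [op_load_imm("tmp", instr[2]), op_add(instr[1], "tmp")]
    raise ValueError(f"unknown guest instruction: {kind}")

def translate(block):
    """Stitch the micro-ops of a guest block into one callable."""
    ops = [op for instr in block for op in expand(instr)]
    def translated(state):
        for op in ops:
            op(state)
        return state
    return translated

guest_block = [("MOVI", "r0", 5), ("MOVI", "r1", 7),
               ("ADD", "r0", "r1"), ("ADDI", "r0", 10)]
run_block = translate(guest_block)
result = run_block({})
print(result["r0"])
```

The expansion happens once per block; afterwards `run_block` can be called repeatedly without re-decoding the guest instructions, which is the point of generating code rather than interpreting.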
Some Important Terms
  Virtual Machine (VM): An instance of an
operating system running on a virtualized system.
Also known as a virtual or guest OS.
  Hypervisor: The underlying virtualization system
sitting between the guest OSes and the hardware.
Also known as a Virtual Machine Monitor (VMM).
Requirements of a VMM
Developed by Popek & Goldberg in 1974 [8]:

1.  Provides environment identical to underlying
hardware.
2.  Most of the instructions coming from the guest OS
are executed by the hardware without being
modified by the VMM.
3.  Resource management is handled by the VMM
(this includes all non-CPU hardware such as memory and
I/O devices).
Guest OS Model
  Hypervisor exists as a layer
between the operating
systems and the hardware.
  Performs memory
management and
scheduling required to
coordinate multiple
operating systems.
  May also have a separate
controlling interface.
[Figure: three guest OSes, each running its own apps, on top of the Hypervisor (Host).]
Virtualization Challenges
  Privileged Instructions
  Handling architecture-imposed instruction privilege
levels.
  Performance Requirements
  Holding down the cost of VMM activities.
  Memory Management
  Managing multiple address spaces efficiently.
  I/O Virtualization
  Handling I/O requests from multiple operating
systems.

Virtualizing Privileged Instructions
  x86 architecture has
privilege levels (rings).
  The OS assumes
it will be
executing in Ring 0.
  System calls require
0-level privileges to
execute.
  Any virtualization strategy
must find a way to
circumvent this.

Image Source: VMWare White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, 2007.
Full Virtualization
  “Hardware is functionally
identical to underlying
architecture.” [3]
  Typically accomplished
through interpretation or
binary translation.
  Advantage: Guest OS will
run without any changes to
source code.
  Disadvantage: Complex,
usually slower than
paravirtualization.

Image Source: VMWare White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, 2007.
Paravirtualization
  Replace certain
unvirtualizable sections of
OS code with
virtualization-friendly code.
  Virtual architecture
“similar but not identical
to the underlying
architecture.” [3]
  Advantages: easier, lower
virtualization overhead
  Disadvantages: requires
modifications to guest OS

Image Source: VMWare White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, 2007.
Trap-and-Emulate
  Modern VMMs based around
trap-and-emulate [8].
  When a guest OS executes a
privileged instruction,
control is passed to VMM
(VMM “traps” on
instruction), which decides
how to handle instruction
[8].
  VMM generates instructions
to handle trapped
instruction (emulation).
  Non-privileged instructions
do not trap (system stays in
guest context).
[Figure: trap-and-emulate flow between Guest OS and VMM, with labels CPU_INST, TRAP, CPU_INST1, and EXEC.]
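
The control flow described above can be sketched in a few lines. This is an illustrative model only: the opcode names, the `GuestCPU` and `VMM` classes, and the choice of `CLI`/`STI` as the privileged instructions are assumptions made for the example, and a Python exception stands in for the hardware trap.

```python
# Toy trap-and-emulate loop. Instruction names and the VMM interface are
# hypothetical; a Python exception stands in for the hardware trap.

class PrivilegedTrap(Exception):
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr

PRIVILEGED = {"CLI", "STI"}   # assumed privileged opcodes for this sketch

class GuestCPU:
    """Deprivileged guest: runs normal instructions, traps on privileged ones."""
    def __init__(self):
        self.acc = 0

    def execute(self, instr):
        op, *args = instr
        if op in PRIVILEGED:
            raise PrivilegedTrap(instr)     # hardware would trap to the VMM
        if op == "ADD":
            self.acc += args[0]

class VMM:
    def __init__(self):
        self.virtual_interrupts_enabled = True
        self.traps = 0

    def emulate(self, instr):
        """Apply the trapped instruction's effect to the guest's virtual state."""
        self.traps += 1
        if instr[0] == "CLI":
            self.virtual_interrupts_enabled = False
        elif instr[0] == "STI":
            self.virtual_interrupts_enabled = True

def run(program, cpu, vmm):
    for instr in program:
        try:
            cpu.execute(instr)              # stays in guest context
        except PrivilegedTrap as trap:
            vmm.emulate(trap.instr)         # control passes to the VMM

cpu, vmm = GuestCPU(), VMM()
run([("ADD", 2), ("CLI",), ("ADD", 3), ("STI",), ("CLI",)], cpu, vmm)
print(cpu.acc, vmm.traps, vmm.virtual_interrupts_enabled)
```

Only the three privileged instructions reach the VMM; the two `ADD` instructions run without any VMM involvement.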
Trap-and-Emulate Problems
  Trap-and-emulate is expensive
  Requires context-switch from guest OS mode to VMM.
  x86 is not trap-friendly
  Guest’s CPL privilege level is visible in hardware
registers; cannot change it in a way that the guest OS
cannot detect [5].
  Some instructions are not privileged, but access
privileged systems (page tables, for example) [5].
VMWare Virtualization
  Full virtualization implemented through dynamic
binary translation [5].
  Translated code is grouped and stored in translation
caches (TCs).
  Callout method replaces traps with stored emulation
  In-TC emulation blocks are even more efficient.
  Adaptive binary translation rewrites translated blocks
to minimize PTE traps [5].

  Direct execution of user-space code further reduces
overhead [5].
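
To make the translation-cache idea concrete, here is a minimal sketch. It is not VMWare's implementation: the guest program, the opcode names, and the `Translator` class are hypothetical, and Python closures stand in for translated machine code. It shows two of the points above: each block is translated once and then reused from the cache, and a privileged instruction is replaced at translation time by a callout to emulation code rather than being left to trap.

```python
# Toy dynamic binary translator with a translation cache (TC).
# The guest program, opcode names, and helper names are hypothetical.

GUEST_CODE = {
    # block address: (straight-line body, terminator)
    0x100: ([("ADD", 1), ("CLI",)], ("JMP", 0x200)),
    0x200: ([("ADD", 2)], ("LOOP_WHILE_LT", 10, 0x100)),
}

def translate_instr(instr):
    op, *args = instr
    if op == "ADD":
        amount = args[0]
        def add(state):
            state["acc"] += amount
        return add
    if op == "CLI":
        # Privileged instruction: emit a callout to stored emulation code
        # that updates virtual state, instead of letting it trap.
        def cli_callout(state):
            state["virt_if"] = False
        return cli_callout
    raise ValueError(f"unknown opcode: {op}")

def translate_terminator(term):
    op, *args = term
    if op == "JMP":
        target = args[0]
        return lambda state: target
    if op == "LOOP_WHILE_LT":
        limit, target = args
        return lambda state: target if state["acc"] < limit else None
    raise ValueError(f"unknown terminator: {op}")

class Translator:
    def __init__(self):
        self.cache = {}           # translation cache: guest address -> code
        self.translations = 0
        self.executions = 0

    def translate(self, addr):
        self.translations += 1
        body, term = GUEST_CODE[addr]
        ops = [translate_instr(i) for i in body]
        next_pc = translate_terminator(term)
        def translated(state):
            for op in ops:
                op(state)
            return next_pc(state)
        return translated

    def run(self, entry, state):
        pc = entry
        while pc is not None:
            if pc not in self.cache:          # translate only on a TC miss
                self.cache[pc] = self.translate(pc)
            self.executions += 1
            pc = self.cache[pc](state)
        return state

translator = Translator()
final = translator.run(0x100, {"acc": 0, "virt_if": True})
print(final["acc"], translator.translations, translator.executions)
```

In this run the two blocks execute eight times in total but are translated only twice, which is where the cache pays for the up-front translation cost.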
Xen Virtualization
  Xen occupies privilege level 0; guest OS occupies
privilege level 1.
  OS code is modified so that high-privilege calls
(hypercalls) are made to and trapped by Xen [3].
  Xen traps guest OS instructions using table of
exception handlers.
  Frequently used handlers (e.g., system calls) have
special handlers that allow guest OS to bypass
privilege level 0 [3].
  Approach does not work with page faults.
  Handlers are vetted by Xen before being stored.
Hardware-Assisted Virtualization
  Hardware virtualization-assist released in 2006 [5].
  Intel, AMD both have technologies of this type.

  Introduces new VMX runtime mode.

  Two modes: guest (for OS) and root (for VMM).
  Each mode has all four CPL privilege levels available [8].
  Switching from guest to VMM does not require changes in
privilege level.
  Root mode supports special VMX instructions.
  Virtual machine control block [5] contains control flags and
state information for active guest OS.
  New CPU instructions for entering and exiting VMM mode.

  Does not support I/O virtualization.

Intel VT-x

  Both modes have no restrictions on privilege

  No need for software-based deprivileging

Image Source: Smith, J. and Nair, R. Virtual Machines, Morgan Kaufmann, 2005.
Applications of VT-x
  Xen uses Intel VT-x to host fully-virtualized guests
alongside paravirtualized guests [6].
  System has root (VMM) and non-root (guest) modes,
each with privilege levels 0-3.
  QEMU/Bochs projects provide emulations

  VMWare does not make use of VT technology [5].

  VMWare’s software-based VMMs significantly
outperformed VT-x-based VMMs [5].
  VT-x virtualization is trap-based, and DBT tries to
eliminate traps wherever possible.
Virtualizing Memory
  Virtualization software must find a way to handle paging requests
of operating systems, keeping each set of pages separate.
  Memory virtualization must not impose too much overhead, or
performance and scalability will be impaired.
  Each guest OS must have its own address space and be convinced
that it has access to the entire address space.
  SOLUTION: most modern VMMs add an additional layer of
abstraction in address space [4].
  Machine Address: bare hardware address.
  Physical Address: VMM abstraction of machine address, used by
guest OSes.
  Guest maintains virtual-to-physical page tables.
  VMM maintains pmap structure containing physical-to-machine page
mappings.
Memory Problem
virtual a -> physical b: page table for program m on VM n.
physical b -> machine c: pmap structure in VMM.

That’s a lot of lookups!
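
The two lookups in this picture can be written out directly. A minimal sketch, assuming made-up page numbers and using plain dictionaries in place of real page tables; the names `guest_page_table`, `pmap`, and `translate` are illustrative only.

```python
# Two-level address translation without shadow page tables.
# Page numbers and table contents are made-up illustration values.

guest_page_table = {0xA: 0xB}   # guest: virtual page -> "physical" page
pmap = {0xB: 0xC}               # VMM: physical page -> machine page

def translate(virtual_page):
    physical_page = guest_page_table[virtual_page]  # lookup 1 (guest table)
    machine_page = pmap[physical_page]              # lookup 2 (VMM pmap)
    return machine_page

machine = translate(0xA)
print(hex(machine))
```

Every memory access that misses the TLB would have to walk both structures, which is the cost the slide is pointing at.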

Shadow Page Tables
  Shadow page tables map virtual memory to machine
memory [4].
  One page table maintained per guest OS.
  TLB caches results from shadow page tables.
  Shadow page tables must be kept consistent with
guest pages.
  VMM updates shadow page tables when pmap
(physical-to-machine) records are updated.

  VMM now has access to virtual addresses,
eliminating two page table lookups.
Shadow Page Tables
virtual a -> physical b: page table for program m on VM n (Guest).
physical b -> machine c: pmap structure in VMM (VMM).
virtual a -> machine c: shadow page table in VMM (VMM).
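
A shadow page table is the composition of the guest's table and the pmap, cached so the composed mapping can be used directly. The sketch below is one possible way to model that, not any particular VMM's design: the `ShadowPageTable` class and its method names are hypothetical, dictionaries stand in for page tables, and entries are filled lazily and invalidated on change, which is a simplifying assumption of this example.

```python
# Toy shadow page table: caches virtual -> machine mappings and keeps them
# consistent with the guest page table and the VMM's pmap.
# Class and method names are hypothetical; entries are filled lazily.

class ShadowPageTable:
    def __init__(self, guest_page_table, pmap):
        self.guest_page_table = guest_page_table   # virtual -> physical
        self.pmap = pmap                           # physical -> machine
        self.shadow = {}                           # virtual -> machine
        self.fills = 0                             # two-step walks performed

    def lookup(self, vpage):
        if vpage not in self.shadow:
            ppage = self.guest_page_table[vpage]
            self.shadow[vpage] = self.pmap[ppage]
            self.fills += 1
        return self.shadow[vpage]

    def guest_write(self, vpage, ppage):
        """Guest edits its page table; the stale shadow entry is dropped."""
        self.guest_page_table[vpage] = ppage
        self.shadow.pop(vpage, None)

    def remap(self, ppage, mpage):
        """VMM changes a pmap record; dependent shadow entries are dropped."""
        self.pmap[ppage] = mpage
        for vpage, mapped in self.guest_page_table.items():
            if mapped == ppage:
                self.shadow.pop(vpage, None)

spt = ShadowPageTable({0xA: 0xB}, {0xB: 0xC})
first = spt.lookup(0xA)
second = spt.lookup(0xA)        # served from the shadow table, no new walk
spt.remap(0xB, 0xD)             # pmap update invalidates the shadow entry
third = spt.lookup(0xA)
print(hex(first), hex(second), hex(third), spt.fills)
```

The second lookup costs a single dictionary access, while the remap shows the consistency work that the shadow table adds on every update.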
Shadow Page Table
  Updates are expensive
  On a write, the VMM must update the VM and the
shadow page table.

  TLB must be flushed on world switch.

  TLB from other guest will be full of machine addresses
that would be invalid in the new context.
Direct Access
  Direct access to hardware is not permitted by the Popek
and Goldberg model [8].
  VMWare and Xen both bend this rule, allowing guests to
access hardware directly in certain cases.

  Xen uses validated access model [3].

  Fine-grained control over direct access.
  VMWare allows user-mode instructions to bypass BT, go
straight to CPU [5].

  Memory accesses are sometimes batched to minimize
context switches.
Load Balancing Problem
  Assume VMM divides
address space evenly
among guests.
  If guest workload is not
balanced, one guest could
be routinely starved for
memory.
  Other guests have far
more than they need.
  Solution: memory
overcommitment.
[Figure: guest memory labeled 2/n, 1/n, (n–2)/n, and 4/n.]
Memory Overcommitment
  Overcommitment: committing more total memory to
guest OSes than actually exists on the system [4].
  Guest memory can be adjusted according to workload.
  Higher-workload servers get better performance than with
a simple even allocation.

  Requires some mechanism to reclaim memory from
other guests [4].
  Poor page replacement schemes can result in double
paging [4].
  VMM marks page for reclamation, OS immediately moves
reclaimed page out of memory
  Most common in high memory-usage situations.
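
As a rough illustration of adjusting guest memory to workload, the sketch below hands out machine pages by current demand rather than by an even split. The policy, the `allocate` function, and all of the numbers are assumptions made for this example; they are not the allocation algorithm of any VMM discussed here.

```python
# Toy demand-based allocation under overcommitment.
# The policy and the numbers are hypothetical illustration values.

def allocate(machine_pages, configured, demand):
    """Give each guest what it currently needs, capped at its configured
    size; if total need exceeds the machine, scale everyone down."""
    wanted = {g: min(demand[g], configured[g]) for g in configured}
    total = sum(wanted.values())
    if total <= machine_pages:
        return wanted
    return {g: w * machine_pages // total for g, w in wanted.items()}

MACHINE_PAGES = 100
configured = {"a": 60, "b": 60, "c": 60}      # 180 pages promised in total

light = allocate(MACHINE_PAGES, configured, {"a": 55, "b": 10, "c": 20})
heavy = allocate(MACHINE_PAGES, configured, {"a": 60, "b": 60, "c": 80})
print(light, heavy)
```

With an even split each guest would get 33 pages, so guest `a` would be short in the light case; under the demand-based policy it gets its 55. In the heavy case demand exceeds the machine, so pages must be reclaimed from every guest.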
Ballooning
  Mechanism for page
reclamation.
  Technique to induce page-ins,
page-outs in a guest OS.
  “Balloon module” [4] loaded
on guest OS reserves
physical pages; can be
expanded or contracted.
  Balloon inflates, guest starts
releasing memory.
  Balloon deflates, guest may
start allocating pages.
  VMWare and Xen both
support ballooning.

Image Source: Waldspurger, C. “Memory Resource Management in VMware ESX Server”, OSDI 2002.
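
The inflate and deflate behavior can be modeled with simple page counters. This is a sketch under stated assumptions, not the ESX balloon driver: the `Guest` class, its fields, and the page counts are hypothetical, and "paging out" is reduced to moving a count from one field to another.

```python
# Toy balloon model. Class, fields, and numbers are hypothetical; real
# balloon drivers pin actual guest pages and report them to the VMM.

class Guest:
    def __init__(self, total_pages, used_pages):
        self.total = total_pages
        self.used = used_pages      # pages holding guest data
        self.balloon = 0            # pages reserved by the balloon module
        self.paged_out = 0          # pages the guest pushed to its own swap

    def free_pages(self):
        return self.total - self.used - self.balloon

    def inflate(self, pages):
        """Balloon reserves pages; the guest pages out data if it runs short.
        Returns the number of pages the VMM can now reclaim."""
        pages = min(pages, self.total - self.balloon)
        shortfall = max(0, pages - self.free_pages())
        self.used -= shortfall
        self.paged_out += shortfall
        self.balloon += pages
        return pages

    def deflate(self, pages):
        """Balloon releases pages back to the guest for allocation."""
        pages = min(pages, self.balloon)
        self.balloon -= pages
        return pages

guest = Guest(total_pages=100, used_pages=70)
reclaimed = guest.inflate(40)     # only 30 pages are free, so 10 get paged out
returned = guest.deflate(25)
print(reclaimed, guest.paged_out, returned, guest.free_pages())
```

The key design point is that the guest's own page replacement policy picks which pages to give up; the VMM only decides how many.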
I/O Virtualization
  Performance is critical for
virtualized I/O
  Many I/O devices are time-sensitive
or require low
latency [7].
  Most common method:
device emulation
  VMM presents guest OS
with a virtual device [7].
  Preserves security, handles
concurrency, but imposes
more overhead.
[Figure: Guest OS, Virtual Device, VMM, Physical Device.]
I/O Virtualization Problems
  Multiplexing
  How to share hardware access among multiple OSes.

  Switching Expense
  Low-level I/O functionality happens at the VMM level,
requiring a context switch.
Packet Queuing
  Both major VMMs use an
asynchronous ring buffer
to store I/O descriptors.
  … switches [7].
  Sends and receives exist in
same buffer.
  If buffer fills up, an exit is …

[Figure 2 of the source paper: the structure of asynchronous I/O rings, which are used for data transfer between Xen and guest OSes.
  Request Consumer: private pointer in Xen.
  Request Producer: shared pointer updated by guest OS.
  Response Producer: shared pointer updated by Xen.
  Response Consumer: private pointer in guest OS.
  Request queue: descriptors queued by the VM but not yet accepted by Xen.
  Outstanding descriptors: descriptor slots awaiting a response from Xen.
  Response queue: descriptors returned by Xen in response to serviced requests.
  Unused descriptors.]

Image Source: Barham, P. et al. “Xen and the Art of Virtualization”, SOSP 2003.
I/O Rings, continued
Xen
  Rings contain memory
descriptors pointing to I/O
buffer regions declared in
guest address space.
  Guest and VMM deposit and
remove messages using a
producer-consumer model
[2].
  Xen 3.0 places device
drivers on their own virtual
domains, minimizing the
effect of driver crashes.
VMWare
  Ring buffer is constructed
in and managed by VMM.
  If VMM detects a great
deal of entries and exits, it
starts queuing I/O
requests in ring buffer [7].
  Next interrupt triggers
transmission of
accumulated messages.
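
A producer-consumer descriptor ring of the kind described for Xen can be sketched as follows. This is a simplified model, not Xen's ring implementation: the `IORing` class and its method names are hypothetical, a Python list stands in for shared memory, and the VMM answers each request immediately, so a response reuses the slot of the request it serves.

```python
# Toy asynchronous I/O ring with two producer-consumer pointer pairs.
# Class and method names are hypothetical; a list stands in for the
# shared-memory ring, and requests are answered in order.

class IORing:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # shared pointer, advanced by the guest
        self.req_cons = 0   # private pointer, advanced by the VMM
        self.rsp_prod = 0   # shared pointer, advanced by the VMM
        self.rsp_cons = 0   # private pointer, advanced by the guest

    def guest_post_request(self, descriptor):
        if self.req_prod - self.rsp_cons >= self.size:
            return False                    # no unused descriptor slots
        self.slots[self.req_prod % self.size] = descriptor
        self.req_prod += 1
        return True

    def vmm_service_requests(self, handler):
        while self.req_cons < self.req_prod:
            descriptor = self.slots[self.req_cons % self.size]
            self.req_cons += 1
            # The response is written into the same ring as the requests.
            self.slots[self.rsp_prod % self.size] = handler(descriptor)
            self.rsp_prod += 1

    def guest_collect_responses(self):
        responses = []
        while self.rsp_cons < self.rsp_prod:
            responses.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return responses

ring = IORing(size=2)
posted = [ring.guest_post_request(d) for d in ("read-a", "read-b", "read-c")]
ring.vmm_service_requests(str.upper)        # one pass handles the whole batch
responses = ring.guest_collect_responses()
retry = ring.guest_post_request("read-c")   # slots are free again
print(posted, responses, retry)
```

Because several requests are serviced in one pass, the guest and the VMM can exchange a batch of descriptors per switch instead of one.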
  Current VMM implementations provide safe, relatively
efficient virtualization, albeit often at the expense of
theoretical soundness [8].
  The x86 architecture requires a) binary translation, b)
paravirtualization, or c) hardware support to virtualize.
  Binary translation and instruction trapping costs are
currently the largest drains on efficiency [5].
  Management of memory and other resources remains a
complex and expensive task in modern virtualization.

1.  Singh, A. “An Introduction To Virtualization”, 2004.
2.  VMware White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, 2007.
3.  Barham, P. et al. “Xen and the Art of Virtualization”, SOSP 2003.
4.  Waldspurger, C. “Memory Resource Management in VMware ESX Server”, OSDI 2002.
5.  Adams, K. and Agesen, O. “A Comparison of Software and Hardware Techniques for x86 Virtualization”, ASPLOS 2006.
6.  Pratt, I. et al. “Xen 3.0 and the Art of Virtualization”, Linux Symposium 2005.
7.  Sugerman, J. et al. “Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual Machine Monitor”, USENIX, 2001.
8.  Popek, G. and Goldberg, R. “Formal Requirements for Virtualizable Third-Generation Architectures”, Communications of the ACM, 1974.
9.  Mahalingam, M. “I/O Architectures for Virtualization”, VMWorld, 2006.
10.  Smith, J. and Nair, R. Virtual Machines, Morgan Kaufmann, 2005.
11.  Bellard, F. “QEMU, a Fast and Portable Translator”, USENIX 2005.
12.  Silberschatz, A., Galvin, P., Gagne, G. Operating System Concepts, Eighth Edition. Wiley & Sons, 2009.

