OS Virtualization: CSC 456 Final Presentation Brandon D. Shroyer
Examples:
Multiprogramming
Emulation for binaries
High-level language VMMs (e.g., JVM)
System Virtualization
May also have a separate controlling interface.
[Figure: layered stacks of Application, OS, Virtualization Layer, and Hardware for each example.]
Virtualization Challenges
Privileged Instructions
Handling architecture-imposed instruction privilege
levels.
Performance Requirements
Holding down the cost of VMM activities.
Memory Management
Managing multiple address spaces efficiently.
I/O Virtualization
Handling I/O requests from multiple operating
systems.
Virtualizing Privileged Instructions
x86 architecture has four privilege levels (rings).
The OS assumes it will be executing in Ring 0.
Many system calls require 0-level privileges to execute.
Any virtualization strategy must find a way to circumvent this.
[Figure 4: x86 privilege level architecture without virtualization]
Image Source: VMware White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, 2007.
Full Virtualization
“Hardware is functionally
identical to underlying
architecture.” [3]
Typically accomplished
through interpretation or
binary translation.
Advantage: Guest OS will
run without any changes to
source code.
Disadvantage: Complex,
usually slower than
paravirtualization.
Image Source: VMware White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware
Assist”, 2007.
Paravirtualization
Replace certain
non-virtualizable sections of
OS code with
virtualization-friendly code.
Virtual architecture
“similar but not identical
to the underlying
architecture.” [3]
Advantages: easier, lower
virtualization overhead
Disadvantages: requires
modifications to guest OS
Image Source: VMware White Paper, “Understanding Full Virtualization, Paravirtualization, and Hardware
Assist”, 2007.
Performance
Modern VMMs based around trap-and-emulate [8].
When a guest OS executes a privileged instruction, control is passed to VMM (VMM “traps” on instruction), which decides how to handle instruction [8].
VMM generates instructions to handle trapped instruction (emulation).
Non-privileged instructions do not trap (system stays in guest context).
[Figure: guest OS instruction stream (CPU_INST) trapping (TRAP) to the VMM, which executes (EXEC) emulation instructions (CPU_INST1) before the guest resumes.]
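The trap-and-emulate flow on this slide can be sketched in a few lines of Python. This is a toy model, not how any real VMM is written: the opcode names, the `VirtualCPU` fields, and the use of an exception to stand in for a hardware trap are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical opcodes treated as privileged in this toy model.
PRIVILEGED = {"cli", "sti", "load_cr3"}


@dataclass
class VirtualCPU:
    """Guest-visible CPU state that the VMM maintains in software."""
    interrupts_enabled: bool = True
    cr3: int = 0
    acc: int = 0
    trap_count: int = 0


class TrapToVMM(Exception):
    """Stands in for the hardware trap raised by a privileged instruction."""

    def __init__(self, op, arg):
        super().__init__(op)
        self.op = op
        self.arg = arg


def execute_direct(vcpu, op, arg):
    """Run one instruction in guest context; privileged ones trap."""
    if op in PRIVILEGED:
        raise TrapToVMM(op, arg)
    if op == "add":
        vcpu.acc += arg
    elif op != "nop":
        raise ValueError(f"unknown opcode: {op}")


def emulate(vcpu, op, arg):
    """VMM side: apply the instruction's effect to the virtual CPU state."""
    vcpu.trap_count += 1
    if op == "cli":
        vcpu.interrupts_enabled = False
    elif op == "sti":
        vcpu.interrupts_enabled = True
    elif op == "load_cr3":
        vcpu.cr3 = arg


def run_guest(vcpu, program):
    """Execute a list of (opcode, argument) pairs with trap-and-emulate."""
    for op, arg in program:
        try:
            execute_direct(vcpu, op, arg)      # stays in guest context
        except TrapToVMM as trap:
            emulate(vcpu, trap.op, trap.arg)   # control passes to the VMM
```

Only the privileged instructions increment `trap_count`; the `add` instructions never leave guest context, which is where the approach gets its efficiency.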
Trap-and-Emulate Problems
Trap-and-emulate is expensive
Requires context-switch from guest OS mode to VMM.
x86 is not trap-friendly
Guest’s current privilege level (CPL) is visible in hardware
registers; the VMM cannot change it in a way that the guest OS
cannot detect [5].
Some instructions are not privileged, but access
privileged state (page tables, for example) [5].
VMware Virtualization
Full virtualization implemented through dynamic
binary translation [5].
Translated code is grouped and stored in translation
caches (TCs).
Callout method replaces traps with stored emulation
functions.
In-TC emulation blocks are even more efficient.
Adaptive binary translation rewrites translated blocks
to minimize PTE traps [5].
Image Source: Smith, J. and Nair, R. Virtual Machines, Morgan Kaufmann, 2005.
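The translation-cache idea can be illustrated with a small Python sketch. It is not VMware's translator: the `Translator` class, the string "instructions", the `SENSITIVE` set, and the `callout:` prefix are hypothetical stand-ins. The sketch shows only two points from the slide: a block is translated once and then reused from the cache, and sensitive instructions are replaced by calls into emulation code rather than left to trap.

```python
# Hypothetical instructions that must not run unmodified in the guest.
SENSITIVE = {"popf", "load_cr3"}


class Translator:
    """Toy dynamic binary translator with a translation cache (TC)."""

    def __init__(self, guest_code):
        # guest_code maps a block's start address to its instruction list.
        self.guest_code = guest_code
        self.tc = {}            # translation cache: address -> translated block
        self.translations = 0   # how many times translate() actually ran

    def translate(self, addr):
        """Translate one guest block, rewriting sensitive instructions."""
        self.translations += 1
        translated = []
        for insn in self.guest_code[addr]:
            if insn in SENSITIVE:
                # Replace the instruction with a callout to emulation code.
                translated.append("callout:" + insn)
            else:
                # Harmless instructions are copied through unchanged.
                translated.append(insn)
        return translated

    def fetch(self, addr):
        """Return the translated block, translating only on a cache miss."""
        if addr not in self.tc:
            self.tc[addr] = self.translate(addr)
        return self.tc[addr]
```

Fetching the same address twice returns the cached block, so the translation cost is paid once per block rather than once per execution.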
Applications of VT-x
Xen uses Intel VT-x to host fully-virtualized guests
alongside paravirtualized guests [6].
System has root (VMM) and non-root (guest) modes,
each with privilege levels 0-3.
QEMU/Bochs projects provide emulations
Shadow Page Table Drawbacks
Updates are expensive
On a write, the VMM must update the VM and the
shadow page table.
Solution: memory
overcommitment
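A minimal sketch of why shadow page table updates are expensive, assuming a single-level table and page-granularity mappings. The `ShadowPager` class and its field names are hypothetical. In a real VMM the guest's write would be intercepted, for example by write-protecting the guest's page tables; here it is modeled as a direct method call.

```python
class ShadowPager:
    """Toy model of a guest page table and its VMM-maintained shadow."""

    def __init__(self, pmap):
        self.pmap = pmap       # guest-physical page -> machine page (VMM-owned)
        self.guest_pt = {}     # guest-virtual page -> guest-physical page
        self.shadow_pt = {}    # guest-virtual page -> machine page
        self.vmm_updates = 0   # number of times the VMM had to intervene

    def guest_write_pte(self, vpage, gppage):
        """Guest updates a mapping; the VMM must mirror it in the shadow."""
        self.guest_pt[vpage] = gppage                # the guest's own view
        self.shadow_pt[vpage] = self.pmap[gppage]    # the view hardware uses
        self.vmm_updates += 1

    def translate(self, vpage):
        """Translation as the hardware would perform it, via the shadow."""
        return self.shadow_pt[vpage]
```

Every guest page table write costs one VMM intervention in this model, which is the overhead the slide refers to.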
Memory Overcommitment
Overcommitment: committing more total memory to
guest OSes than actually exists on the system [4].
Guest memory can be adjusted according to workload.
Higher-workload servers get better performance than with
a simple even allocation.
Image Source: Waldspurger, C. “Memory Resource Management in VMware ESX Server”, OSDI 2002.
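The arithmetic behind overcommitment can be shown with a short example. The two functions below are illustrative assumptions, not the ESX Server policy described in [4]: they compute how far the configured guest memory exceeds physical memory, and split physical memory in proportion to each guest's current demand.

```python
def overcommit_ratio(configured_mb, physical_mb):
    """Total memory promised to guests divided by memory that exists."""
    return sum(configured_mb.values()) / physical_mb


def allocate_by_demand(demand_mb, physical_mb):
    """Give each guest a share of physical memory proportional to demand.

    If total demand fits in physical memory, every guest gets what it asks
    for; otherwise shares are scaled down (integer MB, rounded down).
    """
    total = sum(demand_mb.values())
    if total <= physical_mb:
        return dict(demand_mb)
    return {guest: physical_mb * d // total for guest, d in demand_mb.items()}
```

For example, three guests configured with 1024 MB each on a 2048 MB host give a ratio of 1.5, and a guest with higher demand receives a larger share than it would under an even split.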
I/O Virtualization
Performance is critical for virtualized I/O
Many I/O devices are time-sensitive or require low latency [7].
[Figure: I/O path from the guest OS and guest driver to the physical device.]
I/O Virtualization Problems
Multiplexing
How to share hardware access among multiple OSes.
Switching Expense
Low-level I/O functionality happens at the VMM level,
requiring a context switch.
Packet Queuing
Both major VMMs use an asynchronous ring buffer to store I/O descriptors.
Batches I/O operations to minimize cost of world switches [7].
Sends and receives exist in same buffer.
If buffer fills up, an exit is triggered [7].
[Figure 2: The structure of asynchronous I/O rings, which are used for data transfer between Xen and guest OSes. Pointers: request consumer (private pointer in Xen), request producer (shared pointer updated by guest OS), response producer (shared pointer updated by Xen), response consumer (private pointer in guest OS). Regions: request queue (descriptors queued by the VM but not yet accepted by Xen), outstanding descriptors (descriptor slots awaiting a response from Xen), response queue (descriptors returned by Xen in response to serviced requests), unused descriptors.]
Image Source: Barham, P. et al. “Xen and the Art of Virtualization”, SOSP 2003.
I/O Rings, continued
Xen
Rings contain memory descriptors pointing to I/O buffer regions declared in guest address space.
Guest and VMM deposit and remove messages using a producer-consumer model [2].
Xen 3.0 places device drivers on their own virtual domains, minimizing the effect of driver crashes.
VMware
Ring buffer is constructed in and managed by VMM.
If VMM detects a great deal of entries and exits, it starts queuing I/O requests in ring buffer [7].
Next interrupt triggers transmission of accumulated messages.
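The producer-consumer ring described on these slides can be sketched as follows. This is a single-threaded toy, not Xen's or VMware's ring code: the `IORing` class, its method names, and the choice to answer each request immediately are assumptions. It does follow the four-pointer layout of the Xen figure, with requests and responses sharing the same slots.

```python
class IORing:
    """Toy I/O descriptor ring with request and response pointer pairs."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # shared pointer, advanced by the guest
        self.req_cons = 0   # private pointer, advanced by the VMM
        self.rsp_prod = 0   # shared pointer, advanced by the VMM
        self.rsp_cons = 0   # private pointer, advanced by the guest

    def guest_put_request(self, desc):
        """Guest queues a descriptor; returns False if the ring is full."""
        if self.req_prod - self.rsp_cons >= self.size:
            return False    # every slot holds a request or unread response
        self.slots[self.req_prod % self.size] = ("req", desc)
        self.req_prod += 1
        return True

    def vmm_service_all(self, handler):
        """VMM handles every pending request in one batch.

        Each response overwrites the slot of the request it answers.
        Returns the number of requests serviced.
        """
        serviced = 0
        while self.req_cons < self.req_prod:
            _, desc = self.slots[self.req_cons % self.size]
            self.req_cons += 1
            self.slots[self.rsp_prod % self.size] = ("rsp", handler(desc))
            self.rsp_prod += 1
            serviced += 1
        return serviced

    def guest_get_response(self):
        """Guest collects the next response, or None if none is waiting."""
        if self.rsp_cons == self.rsp_prod:
            return None
        _, result = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return result
```

Because `vmm_service_all` drains the whole request queue in one call, several guest requests are handled per switch into the VMM, which is the batching benefit the slides describe. A slot becomes reusable only after the guest has consumed its response.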
Summary
Current VMM implementations provide safe, relatively
efficient virtualization, albeit often at the expense of
theoretical soundness [8].
3. Barham, P. et al. “Xen and the Art of Virtualization”, SOSP 2003.
4. Waldspurger, C. “Memory Resource Management in VMware ESX Server”, OSDI 2002.
5. Adams, K. and Agesen, O. “A Comparison of Software and Hardware Techniques for
x86 Virtualization”, ASPLOS 2006.
6. Pratt, I. et al. “Xen 3.0 and the Art of Virtualization”, Linux Symposium 2005.
7. Sugerman, J. et al. “Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual
Machine Monitor”, USENIX, 2001.
12. Silberschatz, A., Galvin, P., Gagne, G. Operating System Concepts, Eighth Edition. Wiley
& Sons, 2009.