Can We Make Operating Systems Reliable and Secure?
Andrew S. Tanenbaum, Jorrit N. Herder, and Herbert Bos
When was the last time your TV set crashed or
implored you to download some emergency
software update from the Web? After all, unless it
is an ancient set, it is just a computer with a CPU, a
big monitor, some analog electronics for decoding
radio signals, a couple of peculiar I/O devices (e.g.,
remote control, built-in VCR or DVD drive), and a
boatload of software in ROM.
This rhetorical question points out a nasty little
secret that we in the computer industry do not like
to discuss: why are TV sets, DVD recorders, MP3
players, cell phones, and other software-laden electronic devices reliable and secure and computers
not? Of course there are many reasons (computers are flexible, users can change the software, the
IT industry is immature, etc.), but as we move to an
era in which the vast majority of computer users
are nontechnical people, increasingly these seem
like lame excuses to them. What they expect from a
computer is what they expect from a TV set: you
buy it, you plug it in, and it works perfectly for the
next 10 years. As IT professionals, we need to take
up this challenge and make computers as reliable
and secure as TV sets.
The worst offender when it comes to reliability
and security is the operating system, so we will
focus on that. However, before getting into the details, a few words about the relationship between
reliability and security are in order. Problems with
each of them have the same root cause: bugs in the
software. A buffer overrun error can cause a system crash (reliability problem), but it can also
allow a cleverly written virus or worm to take
over the computer (security problem). Although we
focus primarily on reliability below, improving
reliability also improves security.
[Figure 1: a monolithic kernel with Nooks interposition. User programs (shell, make) run in user mode; the file system, memory management, scheduling, process management, and the disk, LAN, and other drivers run in kernel mode, with each driver enclosed in a wrapper whose stubs interpose on calls between the driver and the kernel.]
Each driver class exports a set of functions that
the kernel can call. For example, sound drivers
might offer a call to write a block of audio samples
to the card, another one to adjust the volume, and
so on. When the driver is loaded, an array of
pointers to the driver's functions is filled in, so the
kernel can find each one. In addition, the driver
imports a set of functions provided by the kernel,
for example, for allocating a data buffer.
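To make this export/import structure concrete, here is a minimal C sketch; the structure and function names (sound_driver_ops, kernel_alloc_buffer, sb16_*) are invented for illustration and are not the actual Linux or Nooks interfaces:

    #include <stdlib.h>

    /* Functions the driver exports to the kernel (invented names). */
    struct sound_driver_ops {
        int (*write_samples)(const void *buf, size_t nbytes); /* write a block of audio samples */
        int (*set_volume)(int level);                          /* adjust the volume */
    };

    /* A function the driver imports from the kernel; here a stand-in built on malloc. */
    static void *kernel_alloc_buffer(size_t nbytes) { return malloc(nbytes); }

    static int sb16_write_samples(const void *buf, size_t nbytes)
    {
        void *dma = kernel_alloc_buffer(nbytes);   /* use the imported kernel service */
        if (dma == NULL)
            return -1;
        /* ... copy the samples and start the transfer (omitted) ... */
        (void)buf;
        free(dma);
        return 0;
    }

    static int sb16_set_volume(int level) { (void)level; return 0; }

    /* Filled in when the driver is loaded, so the kernel can find each entry point. */
    static struct sound_driver_ops sb16_ops;

    void sb16_init(void)
    {
        sb16_ops.write_samples = sb16_write_samples;
        sb16_ops.set_volume    = sb16_set_volume;
    }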
Nooks provides wrappers for both the exported
and imported functions. When the kernel now calls
a driver function or a driver calls a kernel function,
the call actually goes to a wrapper that checks the
parameters for validity and manages the call as
described below. While the wrapper stubs (shown
as lines sticking into and out of the drivers in Fig.
1) are generated automatically from their function
prototypes, the wrapper bodies must be hand-written. In all, 455 wrappers were written: 329 for
functions exported by the kernel and 126 for functions exported by device drivers.
When a driver tries to modify a kernel object, its
wrapper copies the object into the driver's protection domain (i.e., onto its private read-write pages).
The driver then modifies the copy. Upon successful completion of the request, modified kernel
objects are copied back to the kernel. In this way,
a driver crash or failure during a call always leaves
kernel objects in a valid state. Keeping track of
imported objects is object-specific; code has been hand-written to track 43 classes of objects.
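As a rough illustration of the interposition and copy-in/copy-out idea, consider the C sketch below; the object type, the validity check, and the function names are invented for illustration, whereas the real wrappers consist of generated stubs plus hand-written bodies:

    #include <stddef.h>
    #include <string.h>

    struct kobj { int volume; int muted; };          /* an invented kernel object */

    /* The driver's real entry point; it runs in the driver's protection domain. */
    static int driver_set_volume(struct kobj *obj, int level)
    {
        obj->volume = level;
        return 0;                                    /* 0 = success */
    }

    /* Wrapper interposed on the kernel-to-driver call. */
    int wrap_set_volume(struct kobj *kernel_obj, int level)
    {
        struct kobj copy;
        int rc;

        if (kernel_obj == NULL || level < 0 || level > 100)
            return -1;                               /* reject invalid parameters */

        memcpy(&copy, kernel_obj, sizeof(copy));     /* copy into the driver's domain */
        rc = driver_set_volume(&copy, level);        /* the driver modifies the copy  */
        if (rc == 0)
            memcpy(kernel_obj, &copy, sizeof(copy)); /* copy back only on success     */
        return rc;                                   /* a failure leaves the kernel
                                                        object untouched */
    }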
Recovery
After a failure, the user-mode recovery agent runs
and consults a configuration database to see what
to do. In many cases, releasing the resources held
and restarting the driver is enough because most
errors are caused by unusual timing conditions
(algorithmic bugs are usually found in testing, but
timing bugs are not).
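Such a recovery policy could be as simple as the lookup table sketched below in C; the driver names and actions are invented, and real configurations are of course richer:

    #include <stddef.h>
    #include <string.h>

    enum action { RESTART, RESTART_AND_NOTIFY, LEAVE_DOWN };   /* invented actions */

    struct policy { const char *driver; enum action act; };

    /* Stand-in for the recovery agent's configuration database. */
    static const struct policy policies[] = {
        { "sound",    RESTART },             /* release resources and restart        */
        { "ethernet", RESTART_AND_NOTIFY },  /* restart, then notify interested code */
        { "disk",     LEAVE_DOWN },          /* restarting blindly may be unsafe     */
    };

    enum action lookup_policy(const char *driver)
    {
        for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
            if (strcmp(policies[i].driver, driver) == 0)
                return policies[i].act;
        return RESTART;                      /* default action */
    }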
This technique can recover the system, but running applications may fail. In additional work [9],
the Nooks team added the concept of shadow
drivers to allow applications to continue after a
driver failure. In short, during normal operation,
communication between each driver and the kernel
is logged by a shadow driver if it will be needed for
recovery. After a driver restart, the shadow driver
feeds the newly restarted driver from the log, for
example, repeating the IOCTL system calls that set
parameters such as audio volume. The kernel is
unaware of the process of getting the new driver
back into the same state the old one was in. Once
this is accomplished, the driver begins processing
new requests.
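The logging-and-replay idea can be sketched in a few lines of C; the log layout and function names below are invented and much simpler than real shadow drivers:

    #include <stddef.h>

    struct log_entry { int request; int arg; };  /* one state-setting call (invented layout) */

    #define LOG_MAX 64
    static struct log_entry shadow_log[LOG_MAX];
    static size_t shadow_log_len;

    /* Normal operation: record a state-setting request before it reaches the driver. */
    void shadow_record(int request, int arg)
    {
        if (shadow_log_len < LOG_MAX) {
            shadow_log[shadow_log_len].request = request;
            shadow_log[shadow_log_len].arg = arg;
            shadow_log_len++;
        }
    }

    /* After a restart: replay the log so the new driver instance reaches the same
       state as the old one, without the kernel or applications noticing. */
    void shadow_replay(int (*driver_ioctl)(int request, int arg))
    {
        for (size_t i = 0; i < shadow_log_len; i++)
            driver_ioctl(shadow_log[i].request, shadow_log[i].arg);
    }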
Limitations
While experiments show that Nooks can catch
99% of the fatal driver errors and 55% of the nonfatal ones, it is not perfect. For example, drivers
can execute privileged instructions they should not
execute; they can write to incorrect I/O ports; and
they can get into infinite loops. Furthermore, large
numbers of wrappers had to be written manually
and may contain faults. Finally, drivers are not
prevented from reenabling write access to all of
memory. Nevertheless, it is potentially a useful
step towards improving the reliability of legacy
kernels.
PARAVIRTUAL MACHINES
A second approach has its roots in the concept of
a virtual machine, which goes back to the late
1960s [3]. In short, this idea is to run a special control program, called a virtual machine monitor, on
the bare hardware instead of an operating system.
Its job is to create multiple instances of the true
machine. Each instance can run any software the
bare machine can. The technique is commonly used
to allow two or more operating systems, say Linux
and Windows, to run on the same hardware at the
same time, with each one thinking it has the entire
machine to itself. The use of virtual machines has
a well-deserved reputation for extremely good fault isolation: after all, if none of the virtual machines even knows about the others, problems in one of them cannot spread to the others.
The research here is to adapt this concept to protection within a single operating system, rather than
between different operating systems [5]. Furthermore, because the Pentium is not fully virtualizable, a concession was made to the idea of running
an unmodified operating system in the virtual
machine. This concession allows modifications to
be made to the operating system to make sure it
does not do anything that cannot be virtualized. To distinguish this technique from true virtualization, this
one is called paravirtualization.
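The difference can be sketched in C as follows; the hypercall numbers and entry point below are invented and merely stand for whatever interface the monitor actually offers:

    /* Invented hypercall interface, for illustration only. */
    enum { HC_DISABLE_INTERRUPTS = 1, HC_LOAD_PAGE_TABLE = 2 };

    /* In a real system this call would transfer control to the virtual machine
       monitor; here it is a stub so the sketch is self-contained. */
    static long hypercall(int number, unsigned long arg)
    {
        (void)number; (void)arg;
        return 0;
    }

    /* A native kernel would execute a privileged instruction directly at these
       points; a paravirtualized guest is modified to ask the monitor instead,
       so it never does anything that cannot be virtualized. */
    void guest_disable_interrupts(void)
    {
        hypercall(HC_DISABLE_INTERRUPTS, 0);
    }

    void guest_load_page_table(unsigned long page_table_phys)
    {
        hypercall(HC_LOAD_PAGE_TABLE, page_table_phys);
    }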
Specifically, in the 1990s, a research group at the
University of Karlsruhe built a microkernel called
L4 [6]. They were able to run a slightly modified
version of Linux (L4Linux [4]) on top of L4 in
what might be described as a kind of virtual
machine. The researchers later realized that
instead of running only one copy of Linux on L4,
they could run multiple copies. This insight led to
the idea of having one of the virtual Linux
machines run the application programs and one or
more other ones run the device drivers, as illustrated in Fig. 2.
[Figure 2: two Linux virtual machines on the L4 microkernel. Linux VM #1 runs the user programs (shell, make); Linux VM #2 runs the file system, memory management, scheduling, process management, and the disk, LAN, and other drivers. Both virtual machines run in user mode; the L4 microkernel runs in kernel mode and handles interrupts.]
[Figure: the structure of a multiserver operating system. User programs (make, ...), servers (file, process, reincarnation, other, ...), and drivers (disk, TTY, Ethernet, other, ...) each run as a separate process in user mode; only a small kernel, containing the clock and system (Sys) tasks, runs in kernel mode.]
Performance Considerations
Multiserver architectures based on microkernels
have been criticized for decades because of alleged
performance problems. However, various projects
have proven that modular designs can actually have

device drivers and browser plug-ins are not permitted because they would introduce unverified
foreign code that could corrupt the mother process.
Instead, such extensions must run as separate
processes, completely walled off and communicating by the standard interprocess communication
mechanism (described below).
The Microkernel
The Singularity operating system consists of a
microkernel process and a set of user processes, all
typically running in a common virtual address
space. The microkernel controls access to hardware, allocates and deallocates memory, creates,
destroys, and schedules threads, handles thread
synchronization with mutexes, handles interprocess
communication with channels, and supervises I/O.
Each device driver runs as a separate process.
Although most of the microkernel is written in
Sing#, a small portion is written in C#, C++, or
assembler and must be trusted since it cannot be
verified. The trusted code includes the HAL
(Hardware Abstraction Layer) and the garbage collector. The HAL hides the low-level hardware
from the system by abstracting out concepts such as
I/O ports, IRQ lines, DMA channels, and timers to
present machine-independent abstractions to the
rest of the operating system.
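The flavor of such an interface is sketched below in C; this is an invented example of a machine-independent, HAL-style interface, not Singularity's actual (Sing#/C++) one:

    #include <stdint.h>

    /* An invented HAL-style interface: the rest of the system programs against
       these operations instead of touching ports, IRQ lines, or timers directly. */
    struct hal_ops {
        uint8_t  (*port_read8)(uint16_t port);             /* read an I/O port          */
        void     (*port_write8)(uint16_t port, uint8_t v); /* write an I/O port         */
        void     (*irq_enable)(int irq);                    /* unmask an interrupt line  */
        void     (*irq_disable)(int irq);                   /* mask an interrupt line    */
        uint64_t (*timer_ticks)(void);                      /* current timer count       */
    };

    /* A driver written against the HAL never names a chipset-specific register. */
    void wait_one_tick(const struct hal_ops *hal)
    {
        uint64_t start = hal->timer_ticks();
        while (hal->timer_ticks() == start)
            ;                                               /* spin until the timer advances */
    }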
Interprocess Communication
User processes obtain system services by sending
strongly typed messages to the microkernel over
point-to-point bidirectional channels. In fact, all
process-to-process communication uses these channels. Unlike other message-passing systems, which
have SEND and RECEIVE functions in some
library, Sing# fully supports channels in the
language, including formal typing and protocol
specifications. To make this point clear, consider
this channel specification:
contract C1 {
    in message Request(int x) requires x > 0;
    out message Reply(int y);
    out message Error();

    state Start:
        Request? -> Pending;
    state Pending: one {
        Reply! -> Start;
        Error! -> Stopped;
    }
    state Stopped: ;
}

Verification
Each system component has metadata describing its dependencies, exports, resources, and behavior. This metadata is used for verification. The system image consists of the microkernel, drivers, and applications needed to run the system, along with their metadata. External verifiers can perform many checks on the image before it is executed, such as making sure that drivers do not have resource conflicts.
Verification is a three-step process:
1. The compiler checks type safety, object ownership, channel protocols, etc.
2. The compiler generates MSIL (Microsoft Intermediate Language), which is a portable JVM-like byte code that can be verified.
ACKNOWLEDGEMENTS
We would like to thank Brian Bershad, Galen
Hunt, and Michael Swift for their comments and
suggestions. This work was supported in part by
the Netherlands Organization for Scientific
Research (NWO) under grant 612-060-420.

REFERENCES
1. V.R. Basili and B.T. Perricone, "Software Errors and Complexity: An Empirical Investigation," Commun. of the ACM, vol. 27, Jan. 1984, pp. 42-52.