Slides 03
(3rd Edition)
Introduction to threads
Basic idea
We build virtual processors in software, on top of physical processors:
Processor: Provides a set of instructions along with the capability of
automatically executing a series of those instructions.
Thread: A minimal software processor in whose context a series of
instructions can be executed. Saving a thread context implies
stopping the current execution and saving all the data needed to
continue the execution at a later stage.
Process: A software processor in whose context one or more threads may
be executed. Executing a thread means executing a series of
instructions in the context of that thread.
Processes: Threads Introduction to threads
Context switching
Contexts
Processor context: The minimal collection of values stored in the registers
of a processor used for the execution of a series of instructions (e.g.,
stack pointer, addressing registers, program counter).
Thread context: The minimal collection of values stored in registers and
memory, used for the execution of a series of instructions (i.e., processor
context, state).
Process context: The minimal collection of values stored in registers and
memory, used for the execution of a thread (i.e., thread context, but now
also at least MMU register values).
Observations
1. Threads share the same address space. Thread context switching can be
done entirely independently of the operating system.
2. Process switching is generally (somewhat) more expensive, as it involves
getting the OS in the loop, i.e., trapping to the kernel.
3. Creating and destroying threads is much cheaper than doing so for
processes.
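The first observation can be illustrated with a minimal sketch of user-level threads built from Python generators. The scheduler name `run_round_robin` and the `worker` function are hypothetical; the point is only that saving and resuming a thread context happens entirely in user space, without a system call.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary switch point; the generator object keeps the context

def run_round_robin(threads):
    """Hypothetical user-space scheduler: switching needs no trap to the kernel."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the saved context
            ready.append(current)  # still runnable: back of the queue
        except StopIteration:
            pass                   # thread finished, drop it

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

The two workers interleave in the order the scheduler chooses, which is exactly the freedom (and the responsibility) a user-level thread package has.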
[Figure: Process A and Process B on top of the operating system]
Trade-offs
Threads use the same address space: more prone to errors
No support from OS/HW to protect threads from using each other’s memory
Thread context switching may be faster than process context switching
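As a small illustration of the first two trade-offs, the sketch below lets several threads update one shared object. The names `counter`, `add`, and `run` are assumptions made for this example. Because nothing outside the program protects the shared memory, the lock is the program's own responsibility.

```python
import threading

counter = {"value": 0}     # lives in the single address space shared by all threads
lock = threading.Lock()    # protection has to come from the program itself

def add(n):
    for _ in range(n):
        with lock:         # without the lock, concurrent updates may be lost
            counter["value"] += 1

def run(num_threads=4, n=10_000):
    counter["value"] = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

print(run())
```

With separate processes, each one would update its own private copy of `counter` unless shared memory were set up explicitly.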
Thread usage in nondistributed systems 6 / 47
Main issue
Should an OS kernel provide threads, or should they be implemented as
user-level packages?
User-space solution
All operations can be completely handled within a single process ⇒
implementations can be extremely efficient.
All services provided by the kernel are done on behalf of the process in
which a thread resides ⇒ if the kernel decides to block a thread, the
entire process will be blocked.
Threads are used when there are lots of external events: threads block on
a per-event basis ⇒ if the kernel can’t distinguish threads, how can it
support signaling events to them?
Thread implementation 8 / 47
Kernel solution
The whole idea is to have the kernel contain the implementation of a thread
package. This means that all operations return as system calls:
Operations that block a thread are no longer a problem: the kernel
schedules another available thread within the same process.
Handling external events is simple: the kernel (which catches all events)
schedules the thread associated with the event.
The problem is (or used to be) the loss of efficiency due to the fact that
each thread operation requires a trap to the kernel.
Conclusion – but
One can try to mix user-level and kernel-level threads into a single concept;
however, the performance gain has not turned out to outweigh the increased
complexity.
Lightweight processes
Basic idea
Introduce a two-level threading approach: lightweight processes that can
execute user-level threads.
[Figure: threads and thread state in user space; lightweight processes in kernel space]
Principle operation
A user-level thread does a system call ⇒ the LWP that is executing that
thread blocks. The thread remains bound to the LWP.
The kernel can schedule another LWP having a runnable thread bound to
it. Note: this thread can switch to any other runnable thread currently in
user space.
A thread calls a blocking user-level operation ⇒ do a context switch to a
runnable thread (then bound to the same LWP).
When there are no threads to schedule, an LWP may remain idle, and
may even be removed (destroyed) by the kernel.
Note
This concept has been virtually abandoned: in practice, systems use either
user-level or kernel-level threads.
Processes: Threads Threads in distributed systems
Multithreaded clients 12 / 47
Practical measurements
A typical Web browser has a TLP (thread-level parallelism) value between 1.5
and 2.5 ⇒ threads are primarily used for logically organizing browsers.
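The slide text does not state how TLP is computed. A common definition, assumed here, is TLP = (Σ i·c_i for i = 1..N) / (1 − c_0), where c_i is the fraction of time that exactly i threads execute simultaneously and c_0 is the idle fraction. The sketch below evaluates that formula; the function name `tlp` and the sample fractions are hypothetical.

```python
def tlp(c):
    """Thread-level parallelism under the assumed definition.

    c[i] is the fraction of time exactly i threads execute simultaneously,
    so c[0] is the fraction of time no thread is running.
    """
    if abs(sum(c) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    if c[0] >= 1.0:
        raise ValueError("no time spent executing")
    busy = 1.0 - c[0]
    return sum(i * ci for i, ci in enumerate(c)) / busy

# Hypothetical measurement: idle 20%, one thread 50%, two threads 30% of the time.
print(tlp([0.2, 0.5, 0.3]))
```

A value close to 1 means that, while the program is busy, it mostly runs one thread at a time.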
Improve performance
Starting a thread is cheaper than starting a new process.
Having a single-threaded server prohibits simple scale-up to a
multiprocessor system.
As with clients: hide network latency by reacting to the next request while
the previous one is being replied to.
Better structure
Most servers have high I/O demands. Using simple, well-understood
blocking calls simplifies the overall structure.
Multithreaded programs tend to be smaller and easier to understand due
to simplified flow of control.
Multithreaded servers 14 / 47
[Figure: a request coming in from the network is handled by a worker thread, on top of the operating system]
Overview
Model                      Characteristics
Multithreading             Parallelism, blocking system calls
Single-threaded process    No parallelism, blocking system calls
Finite-state machine       Parallelism, nonblocking system calls
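The multithreading row can be sketched as a pool of worker threads fed from a queue, a common dispatcher/worker arrangement. This is an illustrative sketch, not a definitive server design: `handle`, `worker`, and `serve` are hypothetical names, and the "requests" are plain strings rather than network messages.

```python
import queue
import threading

def handle(request):
    """Hypothetical request handler; a real one would do blocking I/O here."""
    return request.upper()

def worker(requests, replies):
    while True:
        item = requests.get()          # blocking call: only this worker waits
        if item is None:               # sentinel: shut this worker down
            break
        req_id, payload = item
        replies.put((req_id, handle(payload)))

def serve(incoming, num_workers=3):
    """Dispatcher: hand each incoming request to the pool, then collect replies."""
    requests, replies = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(requests, replies))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in enumerate(incoming):   # (request id, payload)
        requests.put(item)
    for _ in workers:
        requests.put(None)
    for w in workers:
        w.join()
    return dict(replies.get() for _ in range(len(incoming)))

print(serve(["get a", "get b"]))
```

Each worker uses simple blocking calls, while the pool as a whole keeps handling other requests.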
Processes: Virtualization Principle of virtualization
Virtualization
Observation
Virtualization is important:
Hardware changes faster than software
Ease of portability and code migration
Isolation of failing or attacked components
[Figure: a program using interface A; the same program using interface A provided by an implementation that mimics A on top of interface B]
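Mimicking interface A on top of interface B can be sketched as a thin adapter layer. All names below (`SystemB`, `InterfaceAOnB`, `print_line`, `write_bytes`) are invented for illustration; real virtualization mimics instruction sets or system-call interfaces rather than Python methods.

```python
class SystemB:
    """Hypothetical lower-level system offering interface B."""
    def __init__(self):
        self.last = b""

    def write_bytes(self, data: bytes) -> int:
        self.last = data
        return len(data)

class InterfaceAOnB:
    """Mimics hypothetical interface A (print_line) using only interface B."""
    def __init__(self, b: SystemB):
        self._b = b

    def print_line(self, text: str) -> int:
        return self._b.write_bytes((text + "\n").encode("utf-8"))

def program(a) -> int:
    """Written against interface A only; unaware of what implements it."""
    return a.print_line("hello")

b = SystemB()
print(program(InterfaceAOnB(b)), b.last)
```

The program itself is unchanged when the underlying system is replaced, which is the portability argument made above.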
Mimicking interfaces
Types of virtualization 17 / 47
Ways of virtualization
[Figure: (a) Process VM, (b) Native VMM, (c) Hosted VMM]
Differences
(a) Separate set of instructions, an interpreter/emulator, running atop an OS.
(b) Low-level instructions, along with a bare-bones minimal operating system.
(c) Low-level instructions, but delegating most work to a full-fledged OS.
Special instructions
Control-sensitive instruction: may affect configuration of a machine (e.g.,
one affecting relocation register or interrupt table).
Behavior-sensitive instruction: effect is partially determined by context
(e.g., POPF sets an interrupt-enabled flag, but only in system mode).
Solutions
Emulate all instructions
Wrap nonprivileged sensitive instructions to divert control to VMM
Paravirtualization: modify guest OS, either by preventing nonprivileged
sensitive instructions, or making them nonsensitive (i.e., changing the
context).
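The second solution, diverting sensitive instructions to the VMM, can be sketched with a toy interpreter. Everything here is hypothetical: the instruction names, the `VMM` class, and the `run_guest` loop stand in for what real systems do with binary rewriting or hardware traps.

```python
SENSITIVE = {"set_interrupt_table", "popf"}   # hypothetical instruction names

class VMM:
    def __init__(self):
        self.virtual_state = {"interrupts_enabled": False, "interrupt_table": None}
        self.trapped = []

    def emulate(self, op, arg):
        """Apply the instruction's effect to the guest's virtual state only."""
        self.trapped.append(op)
        if op == "popf":
            self.virtual_state["interrupts_enabled"] = bool(arg)
        elif op == "set_interrupt_table":
            self.virtual_state["interrupt_table"] = arg

def run_guest(program, vmm):
    """Run harmless instructions directly; divert sensitive ones to the VMM."""
    acc = 0
    for op, arg in program:
        if op in SENSITIVE:
            vmm.emulate(op, arg)   # the wrapper diverts control to the VMM
        elif op == "add":
            acc += arg             # nonsensitive instruction runs as-is
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return acc

vmm = VMM()
print(run_guest([("add", 2), ("popf", 1), ("add", 3)], vmm), vmm.trapped)
```

The guest never touches the real machine configuration; it only changes the state the VMM keeps on its behalf.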
Processes: Virtualization Application of virtual machines to distributed systems
IaaS
Instead of renting out a physical machine, a cloud provider rents out a VM
(or VMM) that may share a physical machine with other customers ⇒ almost
complete isolation between customers (although performance isolation may
not be reached).
Processes: Clients Networked user interfaces
Client-server interaction
Basic organization
[Figure: applications on two application servers, each using Xlib; the user's terminal runs the X kernel on top of device drivers]
Improving X
Practical observations
There is often no clear separation between application logic and
user-interface commands
Applications tend to operate in a tightly synchronous manner with an X
kernel
Alternative approaches
Let applications control the display completely, up to the pixel level (e.g.,
VNC)
Provide only a few high-level display operations (dependent on local video
drivers), allowing more efficient display operations.
Client-side software
Processes: Servers General design issues
Basic model
A process implementing a specific service on behalf of a collection of clients. It
waits for an incoming request from a client and subsequently ensures that the
request is taken care of, after which it waits for the next incoming request.
Concurrent servers
Observation
Concurrent servers are the norm: they can easily handle multiple requests,
notably in the presence of blocking operations (to disks or other servers).
Contacting a server
Out-of-band communication
Issue
Is it possible to interrupt a server once it has accepted (or is in the process of
accepting) a service request?
Interrupting a server 29 / 47
Consequences of a stateless design
Clients and servers are completely independent
State inconsistencies due to client or server crashes are reduced
Possible loss of performance because, e.g., a server cannot anticipate
client behavior (think of prefetching file blocks)
Question
Does connection-oriented communication fit into a stateless design?
Stateless versus stateful servers 30 / 47
Stateful servers
Keeps track of the status of its clients:
Record that a file has been opened, so that prefetching can be done
Knows which data a client has cached, and allows clients to keep local
copies of shared data
Observation
The performance of stateful servers can be extremely high, provided clients
are allowed to keep local copies. As it turns out, reliability is often not a major
problem.
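The difference between the two designs can be sketched with a toy file service. The classes `StatelessServer` and `StatefulServer` and the in-memory `DATA` store are assumptions made for this example. In the stateless variant every request carries all the information needed, while the stateful variant remembers each client's position.

```python
DATA = {"f.txt": b"abcdefghij"}   # hypothetical in-memory file store

class StatelessServer:
    """Every request is self-contained: file name, offset, and length."""
    def read(self, name, offset, length):
        return DATA[name][offset:offset + length]

class StatefulServer:
    """Remembers which files each client has opened and where it is reading."""
    def __init__(self):
        self._open = {}                       # (client, name) -> current offset

    def open(self, client, name):
        self._open[(client, name)] = 0

    def read(self, client, name, length):
        offset = self._open[(client, name)]   # KeyError if the state was lost
        self._open[(client, name)] = offset + length
        return DATA[name][offset:offset + length]

stateless = StatelessServer()
stateful = StatefulServer()
stateful.open("c1", "f.txt")
print(stateless.read("f.txt", 3, 4), stateful.read("c1", "f.txt", 3))
```

If the stateful server crashes and restarts with an empty table, the next `read` fails until the client reopens the file. The stateless server has nothing to recover, but it also knows nothing it could use for prefetching.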
Common organization
[Figure: client requests arrive at a logical switch (possibly multiple); dispatched requests go to the application/compute servers, which are backed by a distributed file/database system]
Crucial element
The first tier is generally responsible for passing requests to an appropriate
server: request dispatching
Local-area clusters 32 / 47
Processes: Servers Server clusters
Request Handling
Observation
Having the first tier handle all communication from/to the cluster may lead to a
bottleneck.
[Figure: the client sends a request to the switch; the request is handed off to a server]
Server clusters
The front end may easily get overloaded: special measures may be needed
Transport-layer switching: Front end simply passes the TCP request to
one of the servers, taking some performance metric into account.
Content-aware distribution: Front end reads the content of the request
and then selects the best server.
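The two front-end strategies differ in what they look at when picking a server. The sketch below is illustrative only: the function names, the connection-count metric, and the prefix routing table are assumptions, not how any particular switch works.

```python
def transport_layer_pick(loads):
    """Pick a server without inspecting the request; here: the least-loaded one.

    `loads` maps server name -> current number of connections (assumed metric).
    """
    return min(loads, key=loads.get)

def content_aware_pick(request_path, routes, default):
    """Inspect the request and choose a server by longest matching path prefix.

    `routes` maps URL prefix -> server name (hypothetical routing table).
    """
    matches = [prefix for prefix in routes if request_path.startswith(prefix)]
    return routes[max(matches, key=len)] if matches else default

print(transport_layer_pick({"s1": 5, "s2": 2, "s3": 7}))
print(content_aware_pick("/img/logo.png", {"/img": "s3", "/": "s1"}, "s1"))
```

Content-aware selection can send related requests to a server that already holds the relevant data, at the price of the front end having to read each request.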
[Figure: client, switch, distributor, dispatcher, and application server. Labeled steps: 1. Pass setup request to a distributor; 2. Dispatcher selects server; 4. Inform switch. The setup request and other messages pass through the switch.]
Client transparency
To keep the client unaware of distribution, let the DNS resolver act on behalf
of the client. The problem is that the resolver may actually be far from the
actual client.
Wide-area clusters 35 / 47
Example: PlanetLab
Essence
Different organizations contribute machines, which they subsequently share for
various experiments.
Problem
We need to ensure that different distributed applications do not get in each
other’s way ⇒ virtualization
[Figure: a PlanetLab node: hardware hosting several Vservers, each running its own processes and having its own /usr, /dev, /home, and /proc]
Vserver
Independent and protected environment with its own libraries, server versions,
and so on. Distributed applications are assigned a collection of vservers
distributed across multiple machines.
Case study: PlanetLab 39 / 47
Processes: Code migration Reasons for migrating code
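Models for code migration
CS: client-server, REV: remote evaluation, CoD: code-on-demand, MA: mobile agents

       Before execution                           After execution
Model  Client                  Server             Client                  Server
CS     -                       code, exec,        -                       code, exec*,
                               resource                                   resource
REV    code −→                 exec, resource     -                       −→ code, exec*,
                                                                          resource
CoD    exec, resource          ←− code            code, exec*, resource   ←−
MA     code, exec, resource    −→ resource        resource                −→ code, exec*,
                                                                          resource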
Object components
Code segment: contains the actual code
Data segment: contains the state
Execution state: contains context of thread executing the object’s code
Weak mobility: Move only code and data segment (and reboot execution)
Relatively simple, especially if code is portable
Distinguish code shipping (push) from code fetching (pull)
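Weak mobility in its push form (code shipping) can be sketched as follows: the sender packages the code segment and data segment, and the receiver starts execution from the beginning. The functions `ship` and `receive_and_start` and the message layout are hypothetical, and a real system would have to sandbox or verify code before running it.

```python
import json

def ship(source, entry, data):
    """Sender: package code segment and data segment (no execution state)."""
    return json.dumps({"code": source, "entry": entry, "data": data})

def receive_and_start(message):
    """Receiver: load the shipped code and start it from the beginning."""
    package = json.loads(message)
    namespace = {}
    exec(package["code"], namespace)   # untrusted code must be sandboxed in practice
    return namespace[package["entry"]](package["data"])

SOURCE = "def total(values):\n    return sum(values)\n"
print(receive_and_start(ship(SOURCE, "total", [1, 2, 3])))
```

Because no execution state travels along, the receiver cannot resume a computation halfway; it can only start the entry point afresh with the shipped data.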
Processes: Code migration Migration in heterogeneous systems
Main problem
The target machine may not be suitable to execute the migrated code
The definition of process/thread/processor context is highly dependent on
local hardware, operating system and runtime system
Problem
A complete migration may actually take tens of seconds. We also need to
realize that during the migration, a service will be completely unavailable for
multiple seconds.
[Figure: response time plotted against time during a migration, with the downtime interval marked]