
Chapter 3: Memory management

Introduction

Memory is an important resource that must be carefully managed. While the average home computer nowadays
has a thousand times as much memory as the IBM 7094, the largest computer in the world in the early 1960s,
programs are getting bigger faster than memories. Ideally, what every programmer would like is an infinitely
large, infinitely fast memory that is also nonvolatile and inexpensive. Unfortunately, technology does not provide
such memories. Consequently, most computers have a memory hierarchy, with a small amount of very fast,
expensive, volatile cache memory, tens of megabytes of medium-speed, medium-price, volatile main memory
(RAM), and tens or hundreds of gigabytes of slow, cheap, nonvolatile disk storage. It is the job of the operating
system to coordinate how these memories are used. The part of the operating system that manages the memory
hierarchy is called the memory manager. Its job is to keep track of which parts of memory are in use and which
parts are not in use, to allocate memory to processes when they need it and deallocate it when they are done, and
to manage swapping between main memory and disk when main memory is too small to hold all the processes.
In this chapter we will investigate a number of different memory management schemes, ranging from very simple
to highly sophisticated.
I- Memory management requirements

Relocation

The programmer does not know where the program will be placed in memory when it is executed. Moreover, while the program is executing, it may be swapped out to disk and returned to main memory at a different location. Memory references in the code must therefore be translated into actual physical memory addresses.

Protection

Processes should not be able to reference memory locations belonging to another process without permission. It is impossible to check addresses at compile time, since the program could be relocated; they must be checked during execution.

Sharing

Several processes can be allowed to access the same portion of memory. If a number of processes are executing the same program, it is better to let them all share one copy of it rather than give each one its own separate copy.

II- No memory abstraction

1- Mono-programming: Apart from the OS, there is only one user process in main memory at a given
time. Mono-programming systems require a simple memory manager: the user types a command, the OS loads
the program into main memory and executes it, then displays a prompt and waits for the next command. Below,
we present 3 simple ways of organizing memory in such a system.

Example:

Fig: MS-DOS memory organization (device drivers in ROM at the top of memory, the user program in the middle, and the operating system in RAM at the bottom).

2- Multiprogramming

a- Motivation

Let p be the fraction of time that a process of a certain type spends waiting for I/O. Let n be the number of
such processes in memory. The probability that all n processes are blocked for I/O simultaneously is p^n. Therefore,
CPU utilization is approximately 1 - p^n.
Ex 1. If p = 20% and n = 5 such processes are in memory, the probability that all 5 processes block
simultaneously is (1/5)*(1/5)*(1/5)*(1/5)*(1/5) = (1/5)^5 = 1/3125, so CPU utilization is
1 - 1/3125 = 3124/3125 ≈ 0.99968. This is very close to 100% CPU utilization.
If only 2 such processes were in memory, (1/5)^2 = 1/25 and 1 - 1/25 = 24/25 = 0.96,
i.e. only 96% CPU utilization.
Ex 2. If p = 75%, how many processes would be needed to achieve 96% CPU utilization?
0.96 = 1 - p^n
p^n = 0.04
n = log(0.04) / log(3/4) ≈ 11.19 ~ 12 processes

Ex 3. If p = 75%, how many processes would be needed to achieve 99% CPU utilization?
0.99 = 1 - p^n
p^n = 0.01
n = log(0.01) / log(3/4) ≈ 16 processes
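These numbers are easy to check mechanically. Below is a minimal C sketch (the function name and constants are chosen here purely for illustration) that evaluates the utilization formula 1 - p^n and recomputes the answers to the three examples:

```c
#include <math.h>
#include <stdio.h>

/* CPU utilization with n processes, each spending a fraction p of its
   time waiting for I/O: utilization = 1 - p^n. */
static double utilization(double p, int n) {
    return 1.0 - pow(p, n);
}

int main(void) {
    printf("p=0.20, n=5:  %.5f\n", utilization(0.20, 5));   /* ~0.99968 */
    printf("p=0.75, n=12: %.5f\n", utilization(0.75, 12));  /* ~0.968   */
    printf("p=0.75, n=16: %.5f\n", utilization(0.75, 16));  /* ~0.990   */

    /* Smallest n giving at least 96% utilization when p = 0.75:
       n = ceil(log(1 - 0.96) / log(0.75)) */
    printf("n for 96%%: %.0f\n", ceil(log(1.0 - 0.96) / log(0.75)));
    return 0;
}
```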

The following figure shows CPU utilization as a function of the number of processes in memory (the degree of multiprogramming).

b- Multiprogramming with Fixed Partitions

There are 2 ways of organizing the input queues, shown in the figure: (a) a separate input queue for each partition; (b) a single input queue for all partitions.

Problem: How should main memory be organized, and how should processes be assigned to partitions: separate queues or a single queue?

III- Memory abstraction: address space

1- Notion of address space

a- Logical vs. Physical Address Space

The concept of a logical address space that is bound to a separate physical address space is central to modern
memory management. Logical addresses are generated by the CPU; they are also referred to as virtual
addresses. Physical addresses are those seen by the memory unit.

b- Base and limit registers

This simple solution uses a particularly simple version of dynamic relocation. What it does is map each process'
address space onto a different part of physical memory in a simple way. The classical solution is to equip CPUs
with two special hardware registers, usually called the base and limit registers. When base and limit registers
are used, programs are loaded into consecutive memory locations wherever there is room and without relocation
during loading.
When a process is run, the base register is loaded with the physical address where its program begins in memory
and the limit register is loaded with the length of the program. Every time a process references memory, either
to fetch an instruction or read or write a data word, the CPU hardware automatically adds the base value to the
address generated by the process before sending the address out on the memory bus. Simultaneously, it checks
if the address offered is equal to or greater than the value in the limit register, in which case a fault is generated
and the access is aborted.
Using base and limit registers is an easy way to give each process its own private address space because every
memory address generated automatically has the base register contents added to it before being sent to memory.
In many implementations, the base and limit registers are protected in such a way that only the operating system
can modify them.
A disadvantage of relocation using base and limit registers is the need to perform an addition and a comparison
on every memory reference. Comparisons can be done fast, but additions are slow due to carry propagation time
unless special addition circuits are used.
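As a concrete illustration, here is a minimal C sketch of the translation just described; the register structure and function names are hypothetical, not taken from any real hardware:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* A minimal sketch of base/limit translation; names are illustrative. */
typedef struct {
    uint32_t base;   /* physical address where the program begins */
    uint32_t limit;  /* length of the program */
} mmu_regs_t;

/* Done by hardware on every memory reference: check against the limit,
   then add the base before the address goes out on the memory bus. */
uint32_t translate(const mmu_regs_t *mmu, uint32_t vaddr) {
    if (vaddr >= mmu->limit) {
        fprintf(stderr, "fault: address 0x%x beyond limit\n", vaddr);
        exit(EXIT_FAILURE);              /* access aborted */
    }
    return mmu->base + vaddr;
}

int main(void) {
    mmu_regs_t mmu = { .base = 0x4000, .limit = 0x1000 };
    printf("0x%x -> 0x%x\n", 0x0200u, translate(&mmu, 0x0200)); /* 0x4200 */
    return 0;
}
```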

2- Swapping

a- Schematic view

Swapping works as follows:

❑ A process can be brought into memory in its entirety, run for a while, and then written back to the backing store (if required).
The backing store is a fast disk, large enough to accommodate copies of all memory images for all
processes, and it must provide direct access to these memory images.
❑ The major part of swap time is transfer time; the total transfer time is proportional to the amount of memory
swapped. This time can be used to run another process.
❑ Swapping creates holes in memory (fragmentation), so memory compaction may be required.
❑ Swapping is not used much anymore, but it is still instructive.

Example:

In this figure, memory allocation changes as processes come into memory and leave it. The shaded regions are
unused memory.
b- Allocating memory for growing segments

Fig: (a) allocating space for a growing data segment; (b) allocating space for a growing stack and data segment.

3- Managing free memory

Free memory can be managed with bitmaps or linked lists.

a- Example

In this figure, (a) shows part of memory with 5 processes and 3 holes (the tick marks show allocation units; the
shaded regions are free), (b) shows the corresponding bitmap, and (c) shows the same information as a linked list.
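To make the bitmap representation concrete, here is a small C sketch; the 4 KB unit size, the 4 MB managed region, and the function names are assumptions made for the example. One bit per allocation unit: 1 = in use, 0 = free.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNIT_SIZE 4096                 /* assumed allocation unit: 4 KB */
#define NUNITS    1024                 /* assumed managed region: 4 MB  */
static uint8_t bitmap[NUNITS / 8];

static bool unit_in_use(int u) { return bitmap[u / 8] & (1u << (u % 8)); }

/* Find a run of n consecutive free units; this linear search is the
   main drawback of the bitmap approach. Returns the first unit or -1. */
int find_free_run(int n) {
    for (int start = 0; start + n <= NUNITS; start++) {
        int len = 0;
        while (len < n && !unit_in_use(start + len))
            len++;
        if (len == n)
            return start;
        start += len;   /* skip past the in-use unit that stopped us */
    }
    return -1;
}
```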

Managing with linked lists (case study): Four neighbor combinations for the terminating process X.

b- Fragmentation

❑ External Fragmentation: total memory space exists to satisfy a request, but it is not contiguous

❑ Internal Fragmentation: allocated memory may be slightly larger than requested memory; this size
difference is memory internal to a partition, but not being used.

c- Algorithms for allocating memory

1. First fit: allocates the first hole found that is large enough - fast (as little searching as possible).
2. Next fit: almost the same as first fit, except that it keeps track of where it last allocated space and starts
searching from there instead of from the beginning - slightly better performance.

3. Best fit: searches the entire list looking for the hole that is closest to the size needed by the process - slow -
and it does not improve resource utilization, because it tends to leave many very small (and therefore useless)
holes.

4. Worst fit: the opposite of best fit - chooses the largest available hole, so that the hole broken off is large
enough to be useful (i.e. to hold another process) - in practice it has not been shown to work better than the others.

5. Quick fit: maintains several separate lists of holes for the most commonly requested sizes - very fast lookup.
(A first-fit sketch in code follows this list.)
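As promised above, here is a minimal first-fit sketch in C; the hole structure and list layout are illustrative assumptions, not a real allocator interface. Next fit would differ only in resuming the scan from where the previous search stopped.

```c
#include <stddef.h>
#include <stdlib.h>

/* A hole in the free list, kept sorted by address. */
typedef struct hole {
    size_t start, size;
    struct hole *next;
} hole_t;

static hole_t *free_list;   /* head of the list of holes */

/* Allocate `size` bytes from the first hole big enough; the hole
   shrinks from the front, or disappears on an exact fit.
   Returns the start address, or (size_t)-1 on failure. */
size_t first_fit(size_t size) {
    for (hole_t **pp = &free_list; *pp != NULL; pp = &(*pp)->next) {
        hole_t *h = *pp;
        if (h->size >= size) {
            size_t start = h->start;
            h->start += size;
            h->size  -= size;
            if (h->size == 0) {   /* exact fit: unlink and release node */
                *pp = h->next;
                free(h);
            }
            return start;
        }
    }
    return (size_t)-1;            /* no hole large enough */
}
```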

All the preceding algorithms suffer from external fragmentation: as processes are loaded into and
removed from memory, the free memory is broken into little pieces, so that enough total space may exist to satisfy a
request even though no single contiguous piece does.

Solutions:

• Break memory into fixed-size blocks and allocate in units of block size. Since the allocation will always
be slightly larger than the process needs, some internal fragmentation still results.

• Compaction: move all processes to one end of memory and holes to the other end. Expensive and can
only be done when relocation is done at execution time, not at load time.

Fig 8.11: Comparison of some different ways to compact memory. Four memory maps (addresses 0 to 2100K) are shown: the original allocation of processes P1-P4, and three alternatives that move 600K, 400K, or 200K of data to produce one contiguous hole.

4- Virtual memory

While base and limit registers can be used to create the abstraction of address space, there is another problem that
has to be solved: there is a need to run programs that are too large to fit in memory. The idea behind virtual
memory is that each program has its own address space, which is broken up into chunks called pages. Each page
is a contiguous range of addresses. These pages are mapped onto physical memory, but not all pages have to be
in physical memory to run the program. The physical memory is divided into fixed-size blocks called page
frames. The pages and page frames are generally the same size. In a sense, virtual memory is a generalization of
the base and limit registers idea.

a- Paging

Paging is a memory management scheme used in most virtual memory systems, which permits the physical
address space to be noncontiguous. It is used by most operating systems today in one of its various forms. It is
traditionally handled by hardware, but recent designs implement paging by closely integrating the hardware and
operating system.

➢ Page table

The relation between virtual addresses and physical memory addresses is given by the page table. An example of
such a mapping is given in the following figure:

➢ Memory-Management Unit (MMU)

It is a hardware device that maps virtual addresses to physical addresses (among other things). On computers
with virtual memory, virtual addresses do not go directly to the memory bus; instead they go to the MMU, which
maps them onto physical memory addresses and puts those on the memory bus. The user program deals with logical
addresses; it never sees the real physical addresses. The range marked 0K-4K means that the virtual or physical
addresses in that page are 0 to 4095. The range 4K-8K refers to addresses 4096 to 8191, and so on.
Fig: The position and function of the MMU

➢ Internal Operation of MMU

An incoming virtual address is split into 2 parts: a few high-order bits, on the left, give the page number; the rest of
the address gives the offset (where the address actually lies within the page).

Addressing in a virtual address space of size 2^m with pages of size 2^n uses the high-order m - n bits for the page
number and the n low-order bits for the offset.

Example (with 16-bit addresses and pages of size 4 KB):

16-bit addresses mean the size of the virtual address space is 2^16, and if the page size is 4K (2^12), we have 16 pages
(2^16 / 2^12 = 2^4 = 16); the highest 4 bits of an address give the page number and the lowest 12 bits give the offset.
Virtual address: 8196 (dec) = 2004 (hex) = 0010000000000100 (bin). This address lies on page '0010', i.e. page 2 of
the virtual address space, and has offset '000000000100', i.e. 4: the address is found 4 bytes from the beginning of the
page. The physical address will have the same offset within the frame.
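The page-number/offset split in this example is just shifting and masking, as the following C fragment (with constants chosen to match the example) shows:

```c
#include <stdio.h>

#define PAGE_SHIFT  12                           /* 4 KB = 2^12 pages */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)     /* 0x0FFF */

int main(void) {
    unsigned vaddr  = 8196;                      /* 0x2004 */
    unsigned page   = vaddr >> PAGE_SHIFT;       /* high 4 bits -> 2 */
    unsigned offset = vaddr & OFFSET_MASK;       /* low 12 bits -> 4 */
    printf("vaddr %u: page %u, offset %u\n", vaddr, page, offset);
    return 0;
}
```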

Problem: Today's computers allow page tables with 1 million or more pages (e.g. 1M pages for a 32-bit address
space with a 4K page size). Even very fast registers cannot handle tables of this size. With 4-byte entries, each
process may need 4 megabytes of physical memory just for its page table! The page table must also be fast, since
every instruction needs it. It is therefore necessary to look for mechanisms to speed up paging. (Note on the MMU
figure: the outgoing physical address 24580 fits in 2^15, i.e. 15 bits, and 15 - 12 = 3 bits give the frame number.)

Questions:

- Describe how virtual address → physical address translation can be sped up
- Describe how very large address spaces can be managed

Solution for both:

The solution that has been devised for speeding up translation is to equip computers with a small hardware device that
maps virtual addresses to physical addresses without going through the page table. The device, called a TLB (Translation
Lookaside Buffer) or sometimes an associative memory, holds a small number of recently used page table entries. Very
large address spaces are handled by structuring the page table itself, for example as a multilevel or an inverted page table.
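A TLB lookup can be sketched in C as follows; the entry format, the 16-entry size, and the sequential search are simplifying assumptions for illustration (a real TLB compares all entries in parallel in hardware):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16
#define PAGE_SHIFT  12                     /* 4 KB pages */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

typedef struct {
    uint32_t vpn;     /* virtual page number */
    uint32_t frame;   /* physical page frame */
    bool     valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills *paddr; on a miss the MMU must
   walk the page table and then install the translation in the TLB. */
bool tlb_lookup(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {   /* hit */
            *paddr = (tlb[i].frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
            return true;
        }
    }
    return false;                                  /* miss */
}
```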

b- Page replacement algorithms

When a page fault occurs, the operating system must choose a page to remove from memory to make room for
the page that has to be brought in. A modified page must first be written back to disk; an unmodified page can simply
be overwritten. It is better not to choose a frequently used page, because it will probably need to be brought back in soon.

➢ Optimal page replacement algorithm

On the second run of a program, if the operating system kept track of all page references, the Optimal Page
Replacement Algorithm could be used: replace the page that will not be used for the longest amount of time. This
method is impossible on the first run and not used in practice. It is used in theory to evaluate other algorithms.

➢ Not recently used algorithm (NRU)

It is a practical algorithm that makes use of the 'Referenced' (R) and 'Modified' (M) bits. These bits are updated on every
memory reference and must be set by the hardware. On every clock interrupt the operating system can clear the R
bit. This distinguishes the pages that have been referenced recently from those that have not been
referenced during the current clock interval. The combinations (R M) define four classes:

Class 0 (0 0): not referenced, not modified

Class 1 (0 1): not referenced, modified

Class 2 (1 0): referenced, not modified

Class 3 (1 1): referenced, modified

NRU randomly chooses a page from the lowest-numbered nonempty class to remove.
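A minimal C sketch of NRU victim selection, assuming a hypothetical page_t structure holding the R and M bits, might look like this:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct { bool referenced, modified; } page_t;

/* Evict a random page from the lowest-numbered nonempty class,
   where class = 2*R + M. Returns a page index, or -1 if none. */
int nru_choose_victim(const page_t *pages, int npages) {
    for (int class = 0; class < 4; class++) {
        int candidates[npages];          /* indices of pages in class */
        int n = 0;
        for (int i = 0; i < npages; i++)
            if (2 * pages[i].referenced + pages[i].modified == class)
                candidates[n++] = i;
        if (n > 0)
            return candidates[rand() % n];  /* random page of the class */
    }
    return -1;
}
```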

➢ First In, First Out

The First In First Out algorithm maintains a linked list of all pages in the order in which they came into memory. When a new
page must be brought in, the page that has been in memory the longest (the one at the head of the list) is replaced.
The drawback is that even though a page has been in memory a long time, it may still be needed frequently.

➢ Second chance algorithm


This is a modification of FIFO. The Referenced bit of the page that has been in memory longest is checked before
that page is automatically replaced. If the R bit is set to 1, the page must have been referenced recently; it is
placed at the rear of the list and its R bit is reset to zero. A variation of this algorithm, the 'clock' algorithm,
keeps the pages in a circular list with a pointer (hand) to the oldest page. This saves the time the second chance
algorithm spends moving pages around the list.
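The clock variant is straightforward to sketch in C; the frame structure and the hand variable below are illustrative assumptions:

```c
#include <stdbool.h>

typedef struct { bool referenced; /* ... other page state ... */ } frame_t;

/* Advance the hand around the circular frame array: pages with R == 1
   get a second chance (R is cleared); the first page with R == 0 is
   the victim. Always terminates, since cleared pages stay cleared. */
int clock_choose_victim(frame_t *frames, int nframes, int *hand) {
    for (;;) {
        frame_t *f = &frames[*hand];
        if (!f->referenced) {            /* victim found */
            int victim = *hand;
            *hand = (*hand + 1) % nframes;
            return victim;
        }
        f->referenced = false;           /* give it a second chance */
        *hand = (*hand + 1) % nframes;
    }
}
```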

Example:

1- (Second chance): the pages are kept in FIFO order; the figure shows the page list if a fault occurs at time 20 and page A has its R bit set (the numbers above the pages are their loading times).

2- (Clock): the same pages kept in a circular list, with the hand pointing to the oldest page.

➢ Least Recently Used

The Least Recently Used algorithm (LRU) keeps track of every memory reference made to each page, by some sort of
counter or table. It assumes that pages used recently will be used again soon, and chooses a page that has been unused
for a long time to be replaced. This requires a great deal of overhead: either keep a linked list of pages, most recently
used at the front and least recently used at the rear, and update this list on every memory reference(!), or keep a counter
in each page table entry, choose the page with the lowest counter value, and periodically zero the counters. It also
requires special hardware, and it is not used in practice. It is approximated by similar algorithms:

Not Frequently Used (NFU) keeps a counter for each page; at each clock interrupt, if the R bit for a page is 1, its
counter is incremented. The page with the smallest counter is chosen for replacement. What is the problem with
this?
A page with a high counter may have been referenced heavily in one phase of the process but may no longer be in use.
That page will be kept, while another page with a lower counter that is still being used is replaced.

Aging is a modification of NFU that simulates LRU very well.

The counters are shifted right 1 bit before the R bit is added in, and the R bit is added to the leftmost rather than
the rightmost bit. When a page fault occurs, the page with the lowest counter is still the page chosen to be removed.
A page that has not been referenced for, say, four clock ticks will have four leading zeros in its counter, and thus a
lower value than the counter of a page that has not been referenced for three clock ticks. In other words, a page that
has not been referenced for a while accumulates leading zeros, making its counter value smaller than that of a
recently referenced page, so it is the one chosen for removal.
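A sketch of aging in C, assuming 8-bit counters and a hypothetical page_t structure: aging_tick runs at every clock interrupt, aging_choose_victim at every page fault.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t counter; bool referenced; } page_t;

/* At each clock interrupt: shift the counter right one bit and add the
   R bit at the leftmost position, then clear R. */
void aging_tick(page_t *pages, int npages) {
    for (int i = 0; i < npages; i++) {
        pages[i].counter >>= 1;                 /* shift right */
        if (pages[i].referenced)
            pages[i].counter |= 0x80;           /* R bit into the MSB */
        pages[i].referenced = false;            /* clear R */
    }
}

/* On a page fault, evict the page with the lowest counter. */
int aging_choose_victim(const page_t *pages, int npages) {
    int victim = 0;
    for (int i = 1; i < npages; i++)
        if (pages[i].counter < pages[victim].counter)
            victim = i;
    return victim;
}
```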

Example:

Fig: the aging algorithm simulating LRU; the counters of six pages are shown over five clock ticks, (a)-(e).

➢ Working set model

Demand paging: when a process is started, none of its pages are brought into memory. As soon as the
CPU tries to fetch the first instruction, a page fault occurs, and page faults continue until sufficient pages have been
brought into memory for the process to run. During any phase of execution a process usually references only a
small fraction of its pages. This property is called locality of reference.

Demand paging should be transparent to the user, but if the user is aware of the principle, system performance
can be improved. The set of pages that a process is currently using is called its working set. If the entire working
set is in memory, there will be no page faults.

If not, each read of a page from disk may take about 10 milliseconds (0.010 of a second). Compare this to the time it
takes to execute an instruction: a few nanoseconds (about 0.000000002 of a second). If a program takes page faults every
few instructions, it is said to be thrashing. Thrashing happens when a process spends more time paging than
executing.

The Working Set algorithm keeps track of a process's working set and makes sure it is in memory before
letting the process run. Since processes are frequently swapped to disk to let other processes have CPU time,
pure demand paging would cause so many page faults that the system would be too slow.
The working set is denoted w(k, t): the set of pages referenced by the k most recent memory references at
any instant in time t. The working set changes over time, but slowly. When a process must be suspended
(due to an I/O wait or a lack of free frames), its w(k, t) can be saved with the process. In this way, when the
process is reloaded, its entire working set is brought back first, avoiding the initial large number of page faults.
This is called prepaging.
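The working-set membership test that working-set algorithms (including WSClock, below) rely on can be sketched as follows; the window TAU and the page structure are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

#define TAU 50   /* assumed working-set window, in ticks of virtual time */

typedef struct {
    uint64_t time_of_last_use;    /* updated when the R bit is seen set */
    bool     referenced;
} page_t;

/* A page is in the working set if it was referenced during the last
   TAU ticks of the process's virtual time. */
bool in_working_set(const page_t *p, uint64_t virtual_time) {
    return p->referenced ||
           (virtual_time - p->time_of_last_use) <= TAU;
}
```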

Example:

Fig: w(k, t), the size of the working set, as a function of k.

➢ WSClock

The operating system keeps track of the working set, and when a page fault occurs, it chooses a page not in the
working set for replacement. This requires a lot of work on the part of the operating system. A variation, called
the WSClock algorithm and similar to the clock algorithm, makes it more efficient.

Example:
Parts (a) and (b) give an example of what happens when R = 1; parts (c) and (d) give an example for R = 0.

Review of Page Replacement Algorithms

c- How is a page fault actually handled?

1. Trap to the operating system (also called page fault interrupt).

2. Save the user registers and process state; i.e. process goes into waiting state.

3. Determine that the interrupt was a page fault.

4. Check that the page reference was legal and, if so, determine the location of the page on the disk.

5. Issue a read from the disk to a free frame and wait in a queue for this device until the read request is serviced.
After the device seek completes, the disk controller begins the transfer of the page to the frame.

6. While waiting, allocate the CPU to some other user.

7. Interrupt from the disk occurs when the I/O is complete. Must determine that the interrupt was from the disk.

8. Correct the page table /other tables to show that the desired page is now in memory.

9. Take process out of waiting queue and put in ready queue to wait for the CPU again.

10. Restore the user registers, process state and new page table, then resume the interrupted instruction.
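The ten steps can be summarized as a schematic handler. Everything below is a stub invented for illustration; it shows only the control flow, not a real kernel interface:

```c
#include <stdio.h>

/* Hypothetical types and helpers, stubbed out for the sketch. */
typedef struct { int id; } process_t;
typedef struct { int frame_no; } frame_t;

static int is_legal(process_t *p, unsigned vaddr) { (void)p; (void)vaddr; return 1; }
static frame_t *get_free_frame(void) { static frame_t f; return &f; } /* may run page replacement */
static void read_page_from_disk(unsigned vaddr, frame_t *f) {
    /* steps 5-7: issue the read, block in the device queue, let the
       scheduler run other processes, wake up on the disk interrupt */
    (void)vaddr; (void)f;
}
static void update_page_table(process_t *p, unsigned vaddr, frame_t *f) {
    (void)p; (void)vaddr; (void)f;   /* step 8 */
}

void handle_page_fault(process_t *proc, unsigned fault_addr) {
    /* steps 1-3: trap taken, user state saved, cause identified */
    if (!is_legal(proc, fault_addr)) {           /* step 4 */
        fprintf(stderr, "illegal reference in process %d\n", proc->id);
        return;
    }
    frame_t *f = get_free_frame();
    read_page_from_disk(fault_addr, f);          /* steps 5-7 */
    update_page_table(proc, fault_addr, f);      /* step 8 */
    /* steps 9-10: move the process to the ready queue; when dispatched,
       its registers are restored and the faulting instruction restarts */
}
```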

d- Segmentation

Where paging uses a single continuous sequence of virtual addresses from 0 to the maximum needed by the
process, segmentation is an alternative scheme that uses multiple separate address spaces for the various segments of a
program. A segment is a logical entity of which the programmer is aware: a procedure, an
array, a stack, and so on. Segmentation allows segments to have different lengths and to change length during execution.
