A Translation Lookaside Buffer
A TLB is a table used in a virtual memory system that lists the physical
page number associated with each virtual page number. A TLB is used
in conjunction with a cache whose tags are based on virtual addresses. The
virtual address is presented simultaneously to the TLB and to the cache, so that
cache access and the virtual-to-physical address translation can proceed in
parallel (the translation is done "on the side"). If the requested address is not
cached, the physical address is used to locate the data in main memory. The
alternative would be to place the translation table between the cache
and main memory, so that it is activated only after a cache miss.
A TLB is thus a CPU cache that the memory management hardware uses to
improve virtual address translation speed. Its key benefit is that it avoids
retranslating recently used addresses.
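As a rough illustration of this behavior, here is a minimal Python sketch that
models the TLB as a small table of recent virtual-to-physical translations; the
names (page_table, tlb, translate) and the tiny capacity are illustrative
assumptions, not part of the original text.

# A minimal sketch of TLB behavior: a small cache of recent
# virtual-page -> physical-frame translations. All values assumed.
page_table = {0: 5, 1: 9, 2: 3, 3: 7}   # full map (normally in main memory)
tlb = {}                                 # small, fast subset of page_table
TLB_CAPACITY = 2

def translate(vpn):
    """Return the physical frame number for a virtual page number."""
    if vpn in tlb:                       # TLB hit: no retranslation needed
        return tlb[vpn]
    frame = page_table[vpn]              # TLB miss: walk the page table
    if len(tlb) >= TLB_CAPACITY:         # evict an arbitrary entry if full
        tlb.pop(next(iter(tlb)))
    tlb[vpn] = frame                     # cache the translation
    return frame

print(translate(1))  # miss: walks the page table, then caches the result
print(translate(1))  # hit: answered from the TLB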
Paging
The basic idea behind paging is quite simple: allocate physical memory to
processes in fixed-size chunks (page frames) and keep track of where the
various pages of the process reside by recording information in a page table.
Every process has its own page table that typically resides in main memory, and
the page table stores the physical location of each virtual page of the process.
The page table has N rows, where N is the number of virtual pages in the
process. If there are pages of the process currently not in main memory, the
page table indicates this by setting a valid bit to 0; if the page is in main
memory, the valid bit is set to 1. Therefore, each entry of the page table has two
fields: a valid bit and a frame number.
Process memory is divided into these fixed-size pages, which can result in
internal fragmentation when the last page is copied into memory: the process
may not actually need the entire page frame, yet no other process may use it.
The unused memory in this last frame is therefore effectively wasted.
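A small worked example may help quantify this waste. The following Python
sketch, with made-up page and process sizes, computes how many bytes of the
last frame go unused:

# Internal fragmentation: the unused tail of a process's last page frame.
page_size = 4096                      # bytes per page frame (assumed)
process_size = 10_300                 # bytes the process actually needs (assumed)

pages_needed = -(-process_size // page_size)      # ceiling division -> 3 pages
wasted = pages_needed * page_size - process_size  # 3*4096 - 10300 = 1988 bytes

print(pages_needed, wasted)   # 3 pages, 1988 bytes wasted in the last frame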
Now that you understand what paging is, we will discuss how it works. When a
process generates a virtual address, the operating system must dynamically
translate this virtual address into the physical address in memory at which the
data actually resides. (For purposes of simplicity, let's assume we have no cache
memory for the moment.) For example, from a program viewpoint, we see the
final byte of a 10-byte program as address 9, assuming 1-byte instructions and
1-byte addresses, and a starting address of 0. However, when actually loaded
into memory, the logical address 9 (perhaps a reference to the label X in an
assembly language program) may actually reside in physical memory location
1239, implying the program was loaded starting at physical address 1230. There
must be an easy way to convert the logical, or virtual, address 9 to the physical
address 1239.
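A sketch of this conversion in Python might look as follows; part (a) mirrors
the example above, while part (b) shows the paged version with an invented page
size and invented frame numbers:

# (a) Simple relocation: the whole program is loaded contiguously.
base = 1230
logical = 9
print(base + logical)                 # 1239

# (b) Paging: the virtual address is split into a page number and an
#     offset, and the page table supplies the frame. All values assumed.
page_size = 4
page_table = {0: 50, 1: 51, 2: 97}    # virtual page -> physical frame

va = 9
page, offset = divmod(va, page_size)          # page 2, offset 1
pa = page_table[page] * page_size + offset    # 97*4 + 1 = 389
print(pa)

With paging the pages need not be contiguous in physical memory, which is why
the result in part (b) differs from the simple relocation case.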
Paging and segmentation both have their advantages; however, a system does
not have to use one or the other. The two approaches can be combined in an
effort to get the best of both worlds. In a combined approach, the virtual address
space is divided into segments of variable length, and the segments are divided
into fixed-size pages. Main memory is divided into frames of the same size.
When segmentation is used with paging, a virtual address has three components:
a segment index SI, a page index PI, and a displacement (offset) D. The
memory map then consists of one or more segment tables and page tables. For
fast address translation, two TLBs can be used as shown in the figure, one for
segment tables and one for page tables. As discussed earlier, the TLBs serve as
fast caches for the memory maps. Every virtual address Av generated by a
program goes through a two-stage translation process. First, the segment index
SI is used to read the current segment table to obtain the base address PB of the
required page table. This base address is combined with the page index PI
(which is just a displacement within the page table) to produce a page-table
entry address, which is then used to access the page table. The result is a real
page address, that is, a page frame number, which is combined with the
displacement part D of Av to give the final (real) address AR.
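A minimal Python sketch of this two-stage lookup, with an assumed page size
and assumed table contents, could look as follows (segment_table, page_table,
and translate are illustrative names):

# A sketch of the two-stage translation described above.
PAGE_SIZE = 256

# Stage 1: the segment table maps segment index SI to a page table PB.
# Here each page table is just a Python list, so the "base" is the list.
page_table_A = [12, 13, 14]        # page index PI -> page frame number
page_table_B = [40, 41]
segment_table = {0: page_table_A, 1: page_table_B}

def translate(SI, PI, D):
    """Translate virtual address (SI, PI, D) into a real address AR."""
    page_table = segment_table[SI]     # stage 1: find the page table
    frame = page_table[PI]             # stage 2: find the page frame
    return frame * PAGE_SIZE + D       # AR = frame base + displacement

print(translate(1, 0, 17))             # 40*256 + 17 = 10257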
Page size
The page size Sp has a big impact on both storage utilization and the effective
memory data-transfer rate. Consider first the influence of Sp on the space-
utilization factor u defined earlier. If Sp is too large, excessive internal
fragmentation results; if it is too small, the page tables become very large and
tend to reduce space utilization. A good value of Sp should achieve a balance
between these two extremes. Let Ss denote the average segment size in words. If
Ss >> Sp, the last page assigned to a segment contains about Sp/2 words on
average, so roughly Sp/2 words are wasted by internal fragmentation. The size
of the page table associated with each segment is approximately Ss/Sp words,
assuming each entry in the table is a word. Hence the memory space overhead
associated with each segment is

S = Sp/2 + Ss/Sp

The optimum page size Sp(OPT) can be defined as the value of Sp that maximizes
u or, equivalently, that minimizes S. Differentiating S with respect to Sp, we
obtain

dS/dSp = 1/2 - Ss/Sp^2

Setting dS/dSp = 0 yields Sp(OPT) = sqrt(2 Ss).
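As a quick numerical check of this result, the following Python sketch evaluates
the overhead S(Sp) for an assumed average segment size and confirms that a
brute-force minimum agrees with sqrt(2 Ss):

# Numerical check of the optimum page size derived above:
# S(Sp) = Sp/2 + Ss/Sp is minimized at Sp = sqrt(2*Ss).
import math

Ss = 8192                                   # assumed average segment size (words)

def overhead(Sp):
    return Sp / 2 + Ss / Sp                 # wasted space + page-table space

best = min(range(1, 4096), key=overhead)    # brute-force search over integer Sp
print(best, math.sqrt(2 * Ss))              # both are 128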
MEMORY ALLOCATION
The various levels of a memory system are divided into sets of contiguous
locations, variously called regions, segments, or pages, which store blocks of
data. Blocks are swapped automatically among the levels in order to minimize
the access time seen by the processor. Swapping generally occurs in response to
processor requests (demand swapping). However, to avoid making a processor
wait while a requested item is being moved to the fastest level of memory M1,
some kind of anticipatory swapping must be implemented, which implies
transferring blocks to M1 in anticipation that they will be required soon. Good
short-range prediction of access-request patterns is possible because of locality
of reference.
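The following Python sketch, with invented block numbering, contrasts the two
ideas: a demand fetch on a miss, plus a simple anticipatory fetch of the next
sequential block:

# A sketch contrasting demand swapping with simple anticipatory swapping
# (prefetching the next sequential block). Block numbering is assumed.
m1 = set()                      # blocks currently in the fast level M1

def access(block, prefetch=True):
    if block not in m1:
        m1.add(block)           # demand swap: fetch on a miss
    if prefetch:
        m1.add(block + 1)       # anticipatory swap: locality of reference
                                # suggests the next block is needed soon

access(7)
print(sorted(m1))               # [7, 8] -- block 8 fetched in anticipation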
REPLACEMENT POLICIES
CACHE MEMORY
Caches are small, fast memories placed between the processor and main memory.
They are intended to provide fast memory retrieval without sacrificing memory
capacity, and they contain copies of certain portions of main memory.
Each memory read or write operation is first checked against the cache; if the
desired location's data is available in the cache, the CPU uses it directly.
Otherwise, a block of words is read from main memory into the cache, and the
word is used by the CPU from the cache. Since the cache has limited space, a
portion of it, called a slot, must be vacated for the incoming block; the
contents of the vacated slot are written back to main memory at the position
they belong to. The reason for bringing a whole block of words into the cache
is, once again, locality of reference: we expect the next few addresses to be
close to this one, so the surrounding block of words is transferred from main
memory to the cache. Thus, for a word that is not in the cache, the access time
is slightly more than the access time of main memory without a cache; but
because of locality of reference, the next few words are likely to be in the
cache, enhancing the overall speed of memory references. For example, if a
memory read cycle takes 100 ns and a cache read cycle takes 20 ns, then four
consecutive references to the same block (the first bringing the main memory
contents into the cache, the next three served from the cache) take

(100 + 20) + 3 × 20 = 120 + 60 = 180 ns

compared with 4 × 100 = 400 ns without a cache.
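The same arithmetic, generalized to n consecutive references, can be expressed
as a small Python function (t_mm and t_cache are the cycle times from the
example above):

# Effective access time for n consecutive references to one block:
# the first reference loads the block from main memory, the rest hit.
t_mm, t_cache = 100, 20          # ns, from the example

def total_time(n_refs):
    return (t_mm + t_cache) + (n_refs - 1) * t_cache

print(total_time(4))             # 180 ns, versus 4*100 = 400 ns uncached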
CACHE ORGANIZATION
The figure below shows the principal components of a cache. Memory words are
stored in a cache data memory and are grouped into small pages called cache
blocks or lines. The contents of the cache's data memory are thus copies of a
set of main-memory blocks. Each cache block is marked with its block address,
referred to as a tag, so the cache knows to what part of the memory space the
block belongs. The collection of tag addresses currently assigned to the cache,
which can be noncontiguous, is stored in a special memory, the cache tag memory
or directory.
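As a sketch, the tag directory can be modeled as a set of resident tags kept
alongside the data memory; the tags and block contents below are assumed for
illustration:

# The cache knows which main-memory blocks it holds by keeping their
# tags in a separate tag memory (directory). All values assumed.
tag_memory = {0b1011110001, 0b0000000001}    # tags of resident blocks
data_memory = {0b1011110001: [0xDE, 0xAD, 0xBE, 0xEF],
               0b0000000001: [1, 2, 3, 4]}   # 4-byte blocks keyed by tag

def is_hit(tag):
    return tag in tag_memory                 # check the directory

print(is_hit(0b1011110001))                  # True: block is cached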
In the look-aside design, the cache and the main memory are directly connected
to the system bus. In this design the CPU initiates a memory access by placing a
(real) address Ai on the memory address bus at the start of a read (load) or
write (store) cycle. The cache M1 immediately compares Ai to the tag addresses
currently residing in its tag memory. If a match is found in M1, that is, a
cache hit occurs, the access is completed by a read or write operation executed
in the cache; main memory M2 is not involved. If no match with Ai is found in
the cache, that is, a cache miss occurs, then the desired access is completed by
a read or write operation directed to M2. In response to a cache miss, a block
(line) Bj that includes the target address Ai is transferred from M2 to M1. This
transfer is fast, taking advantage of the small block size and fast RAM access
methods, which allow the cache block to be filled in a single short burst. The
cache implements some replacement policy, such as LRU, to determine where to
place an incoming block. When necessary, the cache block replaced by Bj in M1
is written back to M2. Note that cache misses, even though they are infrequent,
result in block transfers between M1 and M2 that tie up the system bus, making
it unavailable for other uses such as I/O operations.
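A toy Python model of this miss handling, using an LRU policy and an assumed
two-line capacity, might look like this (fetch_from_m2 stands in for the burst
transfer from main memory):

# A sketch of look-aside miss handling with LRU replacement.
from collections import OrderedDict

CAPACITY = 2
cache = OrderedDict()                 # tag -> block, kept in LRU order

def access_block(tag, fetch_from_m2):
    if tag in cache:                  # hit: completed entirely in M1
        cache.move_to_end(tag)        # mark as most recently used
        return cache[tag]
    block = fetch_from_m2(tag)        # miss: burst transfer Bj from M2
    if len(cache) >= CAPACITY:
        cache.popitem(last=False)     # evict the LRU block (write back if dirty)
    cache[tag] = block
    return block

print(access_block(5, lambda t: f"block-{t}"))   # miss -> fetched from M2
print(access_block(5, lambda t: f"block-{t}"))   # hit  -> served by M1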
Read Policy
The figure shows the relationship between the data stored in the cache M1 and
the data stored in main memory M2. Here a cache block (line) size of 4 bytes is
assumed. Each memory address is 12 bits long, so the 10 high-order bits form the
tag or block address, and the 2 low-order bits define a displacement address
within the block. When a block is assigned to M1's data memory, its tag is also
placed in M1's tag memory. The figure shows the contents of two blocks assigned
to the cache data memory; note the locations of the same blocks in main memory.
To read the shaded word, its address Ai = 101111000110 is sent to M1, which
compares Ai's tag part to its stored tags and finds a match (hit). The stored
tag pinpoints the corresponding block in M1's data memory, and the 2-bit
displacement is used to output the target word to the CPU.
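In Python, the tag/displacement split of this particular address can be checked
directly with shifts and masks:

# Splitting the 12-bit address from the example into tag and displacement.
Ai = 0b101111000110          # the address of the shaded word

tag  = Ai >> 2               # 10 high-order bits: 1011110001
disp = Ai & 0b11             # 2 low-order bits:   10 (word 2 in the block)

print(f"{tag:010b}", disp)   # 1011110001 2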
Write Policy:
A cache write operation employs the same addressing technique. The data in the
cache and in main memory can be written by processors or by input/output
devices. The main problem faced when writing with cache memories is this: in
the case of multiple CPUs with different caches, a word altered in one cache
must automatically invalidate the (now stale) copy of that word in the other
caches.
Write through: Write the data to the cache as well as to main memory. The other
CPU-cache combinations (in a multiprocessor system) have to watch the traffic to
main memory and make suitable amendments to the contents of their caches. The
disadvantage of this technique is that a bottleneck is created due to the large
number of accesses to main memory by the various CPUs.
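A minimal sketch of the write-through rule itself (ignoring the multiprocessor
snooping) is simply a store to both levels:

# Write-through: every store updates both the cache and main memory,
# so main memory is always current, at the cost of extra bus traffic.
cache, main_memory = {}, {}

def write_through(addr, value):
    cache[addr] = value          # update the cache copy
    main_memory[addr] = value    # ...and main memory on every write

write_through(0x1F4, 42)
print(cache[0x1F4], main_memory[0x1F4])   # 42 42 -- both copies agree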
Address mapping defines how units of addressed data are placed in the cache:
the block is the smallest unit of addressed data that can be mapped
independently to an area of the address space, and the mapping determines which
cache slot (line) may hold a given main-memory block.
Direct Mapping
Cache slot number = (Block number of main memory) mod (Total number of slots in cache)
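Applying the formula with an assumed 8-slot cache shows how distinct blocks can
contend for the same slot:

# The direct-mapping formula above, applied to assumed sizes.
NUM_SLOTS = 8                        # total slots (lines) in the cache

def slot_for(block_number):
    return block_number % NUM_SLOTS  # each block has exactly one possible slot

print(slot_for(3), slot_for(11), slot_for(19))   # 3 3 3 -- they collide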
Associative Memories
The logic circuit for a 1-bit associative memory cell appears in Figure below.
The cell comprises a D flip-flop for data storage, a match circuit (the
EXCLUSIVE-NOR gate) for comparing the flip-flop's contents to an external
data bit D, and circuits for reading from and writing into the cell. The results of
a comparison appear on the match output M, where M = 1 denotes a match and
M = 0 denotes no match. The cell is selected or addressed for both read and
write operations by setting the select line S to 1. New data is written into the
cell by setting the write enable line WE to 1, which in turn enables the D flip-
flop's clock input CK. The stored data is read out via the Q line. The mask
control line MK is activated (MK =1) to force the match line M to 0
independently of the data stored in the D flip-flop; MK also disables the input
circuits of the flip-flop by forcing CK to 0. A cell like that of the figure can
be realized with about 10 transistors, far more than the single transistor
required for a dynamic RAM cell. This high hardware cost is the main reason that
large associative memories are rarely used outside caches.
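The cell's behavior can be summarized at the truth-table level with a short
Python sketch; match and write mirror the M output and the gated clock described
above (q is the stored bit, d the external data bit):

# A truth-level sketch of the 1-bit associative cell described above.
def match(q, d, mk):
    """M = 1 on a match; masking (MK = 1) forces M = 0."""
    if mk:
        return 0                  # mask overrides the comparison
    return 1 if q == d else 0     # EXCLUSIVE-NOR of stored and external bit

def write(q, d, s, we, mk):
    """Store d when selected and write-enabled; MK = 1 blocks CK."""
    if s and we and not mk:
        return d
    return q                      # otherwise the stored bit is unchanged

print(match(1, 1, mk=0), match(1, 0, mk=0), match(1, 1, mk=1))   # 1 0 0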
CACHE TYPES
PERFORMANCE
In words: Cache size = number of blocks (lines) per set × number of sets ×
number of bytes per block.
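For example, under assumed parameters for a 2-way set-associative cache:

# The cache-size formula, evaluated for assumed parameters.
blocks_per_set = 2          # associativity (2-way)
num_sets = 128
bytes_per_block = 16

cache_size = blocks_per_set * num_sets * bytes_per_block
print(cache_size)           # 4096 bytes = 4 KB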