EC 5001 - Memory 1
Memory
IESL College of Engineering
Rukshan Siriwardhane
Lesson objectives
Understand the concepts of hierarchical memory organization.
Understand how each level of memory contributes to system performance, and how the performance is measured.
Understand the concepts behind cache memory, virtual memory, memory segmentation, paging and address translation.
Introduction
Types of Memory
Some types of DRAM:
Multibank DRAM (MDRAM)
Fast Page Mode (FPM) DRAM
Extended Data Out (EDO) DRAM
Burst EDO (BEDO) DRAM
Synchronous Dynamic RAM (SDRAM)
Synchronous Link (SL) DRAM
Double Data Rate (DDR) DRAM
Direct Rambus (DR) DRAM
Types of Memory
ROM does not need to be refreshed either. In fact, it needs very little or no charge to retain its contents.
ROM is used to store permanent or semi-permanent data that persists even while the system is turned off.
Programs necessary to boot the computer
Embedded systems: toys, household appliances, automobiles
Five basic types of ROM:
ROM, PROM, EPROM, EEPROM and Flash memory
Cache Memory
The purpose of cache memory is to speed up
accesses by storing recently used data closer to the
CPU, instead of storing it in main memory.
Although cache is much smaller than main memory, its
access time is a fraction of that of main memory.
Unlike main memory, which is accessed by address,
cache is typically accessed by content; hence, it is
often called content addressable memory (CAM).
We'll look at this again when we discuss cache mapping.
Because of this, a single large cache memory may not always be desirable: it takes longer to search.
Cache Memory
The content that is addressed in content addressable
cache memory is a subset of the bits of a main memory
address called a field.
A block is a defined number of bytes/words.
Main memory and cache consist of blocks of the same size.
Many blocks of main memory map to a single block of cache.
Cache Memory
The tag is a special group of bits, derived from the main memory address, that is stored with its corresponding block in cache.
The tag field in a cache block distinguishes one cached memory block from another.
The offset field states where the data resides within the block. This is another portion of the main memory address.
A valid bit indicates whether the cache block is in use.
Cache Memory
Cache mapping schemes
Direct mapping
Fully associative mapping
Set associative mapping
Cache Memory
The simplest cache mapping scheme is
direct mapped cache.
In a direct mapped cache consisting of N
blocks of cache, block X of main memory
maps to cache block Y = X mod N.
Thus, if we have 10 blocks of cache, block 7
of cache may hold blocks 7, 17, 27, 37, . . .
of main memory.
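The modular mapping can be sketched in a few lines of Python (the function name is mine, and the 10-block cache mirrors the example above):

```python
def direct_map(block, n_cache_blocks):
    """Direct mapping: main memory block X maps to cache block X mod N."""
    return block % n_cache_blocks

# With 10 cache blocks, memory blocks 7, 17, 27, 37 all land in cache block 7.
print([direct_map(b, 10) for b in (7, 17, 27, 37)])  # [7, 7, 7, 7]
```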
Cache Memory
EXAMPLE 1
Consider a word-addressable main memory
consisting of four blocks, and a cache with two blocks, where each
block is 4 words.
This means Blocks 0 and 2 of main memory map to Block 0 of cache, and Blocks 1 and 3 of main memory map to Block 1 of cache.
Using the tag, block, and offset fields, we can see how main memory maps to cache as follows.
Cache Memory
EXAMPLE 1:
Consider a word-addressable main memory consisting of
four blocks, and a cache with two blocks, where each
block is 4 words.
First, we need to determine the address format for mapping. Each block is 4 words, so the offset field must contain 2 bits; there are 2 blocks in cache, so the block field must contain 1 bit; this leaves 1 bit for the tag (a main memory address has 4 bits because there are a total of 2⁴ = 16 words).
Cache Memory
EXAMPLE 1 Cont'd
Cache Memory
EXAMPLE 2
Assume a byte-addressable memory consists of 2¹⁴ bytes, cache has 16 blocks, and each block has 8 bytes.
The number of memory blocks is 2¹⁴ / 2³ = 2¹¹ = 2048.
Each main memory address requires 14 bits. Of this 14-bit address field, the rightmost 3 bits form the offset field.
We need 4 bits to select a specific block in cache, so the block field consists of the middle 4 bits.
The remaining 7 bits make up the tag field.
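Example 2's field extraction can be checked with bit operations; a sketch, with the constants mirroring the field widths derived above:

```python
OFFSET_BITS = 3   # 8 bytes per block -> 3-bit offset
BLOCK_BITS  = 4   # 16 cache blocks   -> 4-bit block field
# tag is whatever remains of the 14-bit address: 14 - 4 - 3 = 7 bits

def split_address(addr):
    """Split a 14-bit address into its (tag, block, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block  = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + BLOCK_BITS)
    return tag, block, offset

print(split_address(4660))  # (36, 6, 4)
```

The address 4660 reassembles as 36 × 2⁷ + 6 × 2³ + 4, confirming the split.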
Cache Memory
EXAMPLE 3
Suppose we have a word-addressable system using direct mapping with 16 words of main memory divided into 8 blocks (so that each block has 2 words). Assume that the cache is 4 blocks in size (for a total of 8 words).
Cache Memory
EXAMPLE 4
Suppose we have a byte-addressable system using 15-bit main memory addresses and 64 blocks of cache. If each block contains 8 bytes, we know that the 15-bit main memory address is divided into a 3-bit offset field, a 6-bit block field and a 6-bit tag field. If the CPU generated the main memory address 1028₁₀ = 000010 000000 100₂, it would be found in cache block 0 (tag 2, offset 4).
Cache Memory
In summary, direct mapped cache maps main memory blocks in a modular fashion to cache blocks. The mapping depends on:
The number of bits in the main memory address (how many addresses exist in main memory)
The number of blocks in cache (which determines the size of the block field)
How many addresses (either bytes or words) are in a block (which determines the size of the offset field)
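These three quantities can be turned into a small helper that reproduces the field widths of Examples 1 and 2 (a sketch, not from the slides; the function name is mine):

```python
from math import log2

def direct_mapped_fields(addr_bits, cache_blocks, words_per_block):
    """Return (tag, block, offset) field widths for a direct mapped cache."""
    offset_bits = int(log2(words_per_block))   # addresses within a block
    block_bits  = int(log2(cache_blocks))      # which cache block
    tag_bits    = addr_bits - block_bits - offset_bits  # whatever remains
    return tag_bits, block_bits, offset_bits

# Example 1: 16-word memory (4-bit addresses), 2 cache blocks, 4 words/block
print(direct_mapped_fields(4, 2, 4))    # (1, 1, 2)
# Example 2: 2^14-byte memory, 16 cache blocks, 8 bytes/block
print(direct_mapped_fields(14, 16, 8))  # (7, 4, 3)
```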
Cache Memory
Suppose instead of placing memory blocks in
specific cache locations based on memory
address, we could allow a block to go anywhere in
cache.
In this way, cache would have to fill up before any
blocks are evicted.
This is how fully associative cache works.
A memory address is partitioned into only two
fields: the tag and the offset.
Cache Memory
Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each block of size 8. The field format of a memory reference is an 11-bit tag and a 3-bit offset.
When the cache is searched, all tags are searched in parallel to retrieve the data quickly; the offset is then used to find the location within the block.
This requires special, costly hardware: the entire cache has to be built from content addressable (associative) memory so it can be searched in parallel.
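A fully associative cache can be modelled as a tag-to-block dictionary, where the dict lookup stands in for the parallel tag search the hardware performs. A minimal sketch, assuming the 14-bit word-addressable memory and 8-word blocks of the example (names are mine):

```python
OFFSET_BITS = 3        # 8 words per block, as in the example
cache = {}             # tag -> block contents; a block may live anywhere

def read(addr, memory):
    """Look up a word in a fully associative cache (tag + offset only)."""
    tag    = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    if tag not in cache:                       # miss: fetch the whole block
        base = tag << OFFSET_BITS
        cache[tag] = memory[base:base + (1 << OFFSET_BITS)]
    return cache[tag][offset]                  # hit: offset selects the word

memory = list(range(2 ** 14))                  # 14-bit word-addressable memory
print(read(100, memory))  # 100
```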
Cache Memory
You will recall that direct mapped cache evicts a block whenever another memory reference maps to that same cache block.
With fully associative cache, we have no such
mapping, thus we must devise an algorithm to
determine which block to evict from the cache.
The block that is evicted is the victim block.
There are a number of ways to pick a victim - we will
discuss them shortly.
The removed block is written back to main memory if it has been modified, or simply overwritten if it has not been changed.
Cache Memory
In summary, the fully associative cache:
Allows a main memory block to be mapped into any block in cache.
To locate a specific byte or word, the computer compares the tag field of the main memory address to all tags stored in cache (in one comparison).
Once the specific cache block is located, the offset field is used to locate the required data within the block.
If the tag of the memory address doesn't match any tag in cache, the memory block with the desired data must be transferred into the cache.
This may require picking a victim block to transfer out of cache.
Cache Memory
Owing to its speed and complexity, associative cache is very expensive.
Although direct mapping is inexpensive, it is very restrictive.
E.g., a program that repeatedly alternates between two memory blocks that map to the same cache block (block 0, block 4, block 0, block 4, and so on) causes an eviction on every access.
Fully associative cache requires a large tag to be stored.
Set associative cache combines the ideas of direct mapped cache and fully associative cache.
Cache Memory
An N-way set associative cache mapping is like
direct mapped cache where a memory reference
maps to a particular location in cache.
Unlike direct mapped cache, a memory reference
maps to a set of several cache blocks, similar to the
way in which fully associative cache works.
Instead of mapping anywhere in the entire cache, a memory reference can map only to a subset of the cache slots.
Cache Memory
The number of cache blocks per set in set associative cache varies according to overall system design.
For example, a 2-way set associative cache can be conceptualized as shown in the schematic below.
Each set contains two different memory blocks.
Cache Memory
In set associative cache mapping, a memory
reference is divided into three fields: tag, set,
and offset.
As with direct-mapped cache, the offset field
chooses the byte/word within the cache block,
and the tag field uniquely identifies the memory
address.
The set field determines the set to which the
memory block maps.
Cache Memory
EXAMPLE 5
Suppose we are using 2-way set associative mapping with a word-addressable main memory of 2¹⁴ words and a cache with 16 blocks, where each block contains 8 words.
If the cache has a total of 16 blocks, and each set has 2 blocks, then there are 8 sets in cache.
Thus, the set field is 3 bits, the offset field is 3 bits, and the tag field is 8 bits.
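Example 5's tag/set/offset split can be verified with a short sketch (field widths as derived above; the function name is mine):

```python
SET_BITS, OFFSET_BITS = 3, 3   # 8 sets, 8 words per block (Example 5)

def sa_fields(addr):
    """Split a 14-bit address into (tag, set, offset) for the 2-way cache."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    s      = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + SET_BITS)
    return tag, s, offset

print(sa_fields(1000))  # (15, 5, 0): address 1000 maps to set 5
```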
Cache Memory
The set field of a main memory address specifies a
unique set in cache for the given memory block
All blocks in that cache set are then searched for a
matching tag.
An associative search must be performed, but the
search is restricted to the specified set instead of the
entire cache.
This significantly reduces the cost of specialized
hardware
E.g. in a 2-way set associative cache, the
hardware searches only 2 blocks in parallel.
Cache Memory
EXAMPLE 6
Suppose a byte-addressable memory contains 1MB and cache consists of 32 blocks, where each block contains 16 bytes. Using direct mapping, fully associative mapping and 4-way set associative mapping, determine where the main memory address 326A0₁₆ maps to in cache by specifying either the cache block or the cache set.
Cache Memory
If we represent our main memory address in binary, 326A0₁₆ = 0011 0010 0110 1010 0000₂. With direct mapping this splits into an 11-bit tag (00110010011), a 5-bit block field (01010 = block 10) and a 4-bit offset (0000); with fully associative mapping, into a 16-bit tag (0011001001101010) and the same 4-bit offset.
Cache Memory
If we are using the 4-way set associative mapping, the memory address format is a 13-bit tag, a 3-bit set field and a 4-bit offset.
The set field has 3 bits because there are only 8 sets in the cache (where each set holds 4 blocks). If we divide our main memory address into these fields, we get a tag of 0011001001101, a set field of 010 (set 2) and an offset of 0000.
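These results can be double-checked with simple arithmetic (a sketch; 0x326A0 is the address from Example 6):

```python
addr  = 0x326A0        # 20-bit address into a 1MB memory
block = addr // 16     # 16-byte blocks -> main memory block number

print(block % 32)      # direct mapped, 32 cache blocks -> cache block 10
print(block % 8)       # 4-way set associative, 8 sets  -> set 2
# Fully associative: the block may go anywhere; only its tag is compared.
```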
Cache Memory
With fully associative and set associative cache, a
replacement policy is invoked when it becomes
necessary to evict a block from cache.
An optimal replacement policy would be able to look into the future to see which blocks won't be needed for the longest period of time.
Although it is impossible to implement an optimal replacement algorithm, it is instructive to use it as a benchmark for assessing the efficiency of any other scheme we come up with.
Cache Memory
The replacement policy that we choose depends upon the locality that we are trying to optimize; usually, we are interested in temporal locality.
A Least Recently Used (LRU) algorithm keeps track of the last time that a block was accessed and evicts the block that has been unused for the longest period of time.
The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.
Cache Memory
First-in, first-out (FIFO) is a popular cache
replacement policy.
In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used.
A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block.
Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
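LRU and FIFO can be contrasted with a small simulation (a sketch; function and variable names are mine). Note how LRU keeps the frequently used block 1 resident, while FIFO evicts it simply because it arrived first:

```python
from collections import OrderedDict, deque

def simulate(refs, capacity, policy):
    """Count cache hits for LRU or FIFO over a sequence of block numbers."""
    if policy == "LRU":
        cache, hits = OrderedDict(), 0
        for b in refs:
            if b in cache:
                hits += 1
                cache.move_to_end(b)            # mark as most recently used
            else:
                if len(cache) == capacity:
                    cache.popitem(last=False)   # evict least recently used
                cache[b] = True
        return hits
    resident, order, hits = set(), deque(), 0   # FIFO
    for b in refs:
        if b in resident:
            hits += 1                           # hits do not reorder anything
        else:
            if len(resident) == capacity:
                resident.remove(order.popleft())  # evict oldest resident
            resident.add(b)
            order.append(b)
    return hits

refs = [1, 2, 3, 1, 4, 1, 2]
print(simulate(refs, 3, "LRU"), simulate(refs, 3, "FIFO"))  # 2 1
```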
Cache Memory
The performance of hierarchical memory is measured
by its effective access time (EAT).
EAT is a weighted average that takes into account the
hit ratio and relative access times of successive levels
of memory.
The EAT for a two-level memory is given by:
EAT = H × AccessC + (1 − H) × AccessMM
where H is the cache hit rate, and AccessC and AccessMM are the access times for cache and main memory, respectively.
Cache Memory
For example, consider a system with a main
memory access time of 200ns supported by a cache
having a 10ns access time and a hit rate of 99%.
Suppose access to cache and main memory occurs
concurrently.
The EAT is:
EAT = 0.99 × 10ns + 0.01 × 200ns = 9.9ns + 2ns = 11.9ns
with the accesses overlapping.
Cache Memory
For example, consider a system with a main memory
access time of 200ns supported by a cache having a
10ns access time and a hit rate of 99%.
If the accesses do not overlap, the EAT is:
EAT = 0.99 × 10ns + 0.01 × (10ns + 200ns) = 9.9ns + 2.1ns = 12ns
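Both cases follow from the same weighted average; only the miss penalty changes. A small sketch (parameter names are mine):

```python
def eat(hit_rate, t_cache, t_mem, overlapped=True):
    """Effective access time for a two-level memory, in ns.

    Overlapped: a miss costs only the memory time (cache and memory are
    accessed concurrently). Sequential: a miss pays the cache probe first.
    """
    miss_penalty = t_mem if overlapped else t_cache + t_mem
    return hit_rate * t_cache + (1 - hit_rate) * miss_penalty

print(round(eat(0.99, 10, 200), 1))                   # 11.9 (overlapped)
print(round(eat(0.99, 10, 200, overlapped=False), 1)) # 12.0 (sequential)
```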
Cache Memory
Caching depends upon programs exhibiting good
locality.
Some object-oriented programs have poor locality
owing to their complex, dynamic structures.
Arrays stored in column-major rather than row-major
order can be problematic for certain cache
organizations.
Looping over arrays that do not fit in cache
With poor locality, caching can actually cause performance degradation rather than performance improvement.
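The array-traversal point can be illustrated by comparing stride-1 and stride-N access over a row-major array (a Python sketch; in CPython interpreter overhead masks much of the effect, but in a compiled language the column-order loop runs markedly slower on large arrays):

```python
N = 256
a = list(range(N * N))          # 2-D array stored row-major in a flat list

def sum_row_major():
    s = 0
    for i in range(N):
        for j in range(N):
            s += a[i * N + j]   # stride-1: consecutive addresses, good locality
    return s

def sum_col_major():
    s = 0
    for j in range(N):
        for i in range(N):
            s += a[i * N + j]   # stride-N: jumps to a new block almost every access
    return s
```

Both functions compute the same sum; only the order in which memory is touched differs.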
Cache Memory
Cache replacement policies must take into account
dirty blocks, those blocks that have been updated
while they were in the cache.
Dirty blocks must be written back to memory. A write
policy determines how this will be done.
There are two types of write policies, write through
and write back (copy back).
Write through updates cache and main memory
simultaneously on every write.
Cache Memory
Write back (also called copy back) updates memory
only when the block is selected for replacement.
The disadvantage of write through is that memory must
be updated with each cache write, which slows down
the access time on updates. This slowdown is usually
negligible, because the majority of accesses tend to be
reads, not writes.
The advantage of write back is that memory traffic is
minimized, but its disadvantage is that memory does
not always agree with the value in cache, causing
problems in systems with many concurrent users.
Cache Memory
The cache we have been discussing is called a
unified or integrated cache where both instructions
and data are cached.
Many modern systems employ separate caches for
data and instructions.
This is called a Harvard cache.
The separation of data from instructions provides
better locality, at the cost of greater complexity.
Simply making the cache larger provides about the
same performance improvement without the
complexity.
Cache Memory
Cache performance can also be improved by
adding a small associative cache to hold blocks
that have been evicted recently.
This is called a victim cache.
A trace cache is a variant of an instruction cache
that holds decoded instructions for program
branches, giving the illusion that noncontiguous
instructions are really contiguous.
Cache Memory
Most of today's small systems employ multilevel cache hierarchies.
The levels of cache form their own small memory hierarchy.
Level 1 cache (8KB to 64KB) is situated on the processor itself.
Access time is typically about 4ns.
Level 2 cache (64KB to 2MB) may be on the
motherboard, or on an expansion card.
Access time is usually around 15 - 20ns.
Cache Memory
In systems that employ three levels of cache, the Level 2 cache is placed on the same die as the CPU (reducing access time to about 10ns).
Accordingly, the Level 3 cache (2MB to 256MB)
refers to cache that is situated between the processor
and main memory.
Once the number of cache levels is determined, the
next thing to consider is whether data (or instructions)
can exist in more than one cache level.
Cache Memory
If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache.
Strictly inclusive caches guarantee that all data in a
smaller cache also exists at the next lower level.
Exclusive caches permit only one copy of the data.
Non-blocking cache (Intel L2 cache) can process up
to 4 requests concurrently
The tradeoffs in choosing one over the other involve
weighing the variables of access time, memory size,
and circuit complexity.