Presentation 3333
10/4/2018
Improving Cache Miss Latency: Reducing DRAM Latency
[Figure: one-word-wide memory organization vs. a wider memory (with multiplexor) vs. interleaved memory banks; note that DRAM access time >> bus transfer time.]
Memory Access Time
Example
• Assume it takes 1 cycle to send the address, 15 cycles for each DRAM access, and 1 cycle to send a word of data.
• With a cache block of 4 words and one-word-wide DRAM (fig. 7.13a), miss penalty = 1 + 4×15 + 4×1 = 65 cycles.
• With main memory and a bus 2 words wide (fig. 7.13b), miss penalty = 1 + 2×15 + 2×1 = 33 cycles. For 4-word-wide memory, the miss penalty is 1 + 1×15 + 1×1 = 17 cycles. This organization is expensive due to the wide bus and control circuits.
• With interleaved memory of 4 banks and the original one-word bus (fig. 7.13c), miss penalty = 1 + 1×15 + 4×1 = 20 cycles. The memory controller must supply consecutive addresses to different memory banks. Interleaving is universally adopted in high-performance computers.
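The miss-penalty arithmetic above can be sketched as a short calculation; the function and parameter names here are illustrative, not from the text:

```python
# Miss penalty for fetching a 4-word block under the example's assumptions:
# 1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per word
# transferred on the bus.

ADDR, ACCESS, XFER = 1, 15, 1
BLOCK_WORDS = 4

def miss_penalty(width_words, banks=1):
    """Cycles to fetch a 4-word cache block.

    width_words: memory/bus width in words (words moved per access/transfer)
    banks: number of interleaved banks (bank accesses overlap in time)
    """
    accesses = BLOCK_WORDS // width_words   # sequential DRAM accesses
    transfers = BLOCK_WORDS // width_words  # bus transfers
    if banks > 1:                           # interleaved: accesses overlap,
        accesses = 1                        # but each word still crosses the bus
        transfers = BLOCK_WORDS
    return ADDR + accesses * ACCESS + transfers * XFER

print(miss_penalty(1))           # one-word-wide DRAM  -> 65
print(miss_penalty(2))           # 2-word-wide memory  -> 33
print(miss_penalty(4))           # 4-word-wide memory  -> 17
print(miss_penalty(1, banks=4))  # 4 interleaved banks -> 20
```

The interleaved case keeps the cheap one-word bus but hides three of the four DRAM access times, which is why it is the preferred design.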
Virtual Memory
• Idea 1: Many Programs sharing DRAM Memory so that context switches can
occur
• Idea 2: Allow program to be written without memory constraints – program
can exceed the size of the main memory
• Idea 3: Relocation: Parts of the program can be placed at different locations in
the memory instead of a big chunk.
• Virtual Memory:
(1) DRAM Memory holds many programs running at same time (processes)
(2) use DRAM Memory as a kind of “cache” for disk
Virtual Memory has its own terminology
• Each process has its own private “virtual address space” (e.g., 2^32 bytes); the CPU actually generates “virtual addresses”.
• Each computer has a “physical address space” (e.g., 128 MB of DRAM); also called “real memory”.
• Address translation: mapping virtual addresses to physical addresses.
• Allows multiple programs to use (different chunks of physical) memory at the same time.
• Also allows some chunks of virtual memory to be kept on disk rather than in main memory (to exploit the memory hierarchy).
Mapping Virtual Memory to Physical Memory
• Divide memory into equal-sized “chunks” (say, 4 KB each).
• Any chunk of virtual memory can be assigned to any chunk of physical memory (a “page”).
[Figure: a single process’s virtual address space (code, static, heap, stack) mapped onto 64 MB of physical memory.]
Handling Page Faults
[Figure: virtual-to-physical address translation with a 1 KB page size — bits 9:0 form the page offset and pass through unchanged, while the upper bits (29:10) are translated to produce the physical address.]
Address Translation
[Figure: the Page Table Register points to the page table, which is located in physical memory. The virtual page number indexes into the page table; each entry holds a Valid bit, Access Rights, and a Physical Page Number (P.P.N.). The P.P.N. combined with the page offset gives the physical address (PA); invalid entries refer to pages currently on disk.]
• Access Rights: None, Read Only, Read/Write, Executable.
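The translation walk described above can be sketched in a few lines, assuming the 1 KB page size from the earlier slide (10-bit offset) and a flat page table; the table contents and names here are illustrative:

```python
# Minimal sketch of page-table address translation, assuming 1 KB pages
# (10-bit page offset) and a flat page table.

PAGE_BITS = 10
PAGE_SIZE = 1 << PAGE_BITS  # 1 KB

# Hypothetical page table: VPN -> (valid bit, access rights, physical page number)
page_table = {
    0x0: (True, "R/W", 0x005),
    0x1: (True, "RO",  0x0A2),
    0x2: (False, None, None),   # not in memory: referencing it is a page fault
}

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS            # virtual page number indexes the table
    offset = vaddr & (PAGE_SIZE - 1)    # offset passes through unchanged
    valid, rights, ppn = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault: OS must bring the page in from disk")
    return (ppn << PAGE_BITS) | offset  # physical address = PPN . offset

print(hex(translate(0x4A4)))  # VPN 1, offset 0xA4 -> 0x288a4
```

Note that only the page number is translated; the offset bits are copied straight into the physical address, which is why pages must be aligned to their size.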
Translation Lookaside Buffer (TLB)
• The TLB is just a cache of the page-table mappings.
• Dirty bit: since write back is used, we need to know whether or not to write the page to disk when it is replaced.
• Ref bit: used to approximate LRU on replacement.
[Figure: a 64-entry, fully associative TLB (20-bit tags) in front of a 16K-entry, direct-mapped cache (32-bit data); a cache hit delivers the data.]
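The TLB's role as a small fully associative cache of translations can be sketched as follows; this is a toy model (dict-based TLB, FIFO eviction standing in for the Ref-bit LRU approximation), not the hardware design:

```python
# Sketch of a TLB in front of the page table, assuming 1 KB pages.
# Structures and names are illustrative.

PAGE_BITS = 10

page_table = {vpn: vpn + 0x100 for vpn in range(64)}  # toy VPN -> PPN mapping
tlb = {}            # fully associative: VPN -> PPN
TLB_ENTRIES = 64

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                      # TLB hit: no page-table access needed
        ppn = tlb[vpn]
    else:                               # TLB miss: walk the page table
        ppn = page_table[vpn]
        if len(tlb) >= TLB_ENTRIES:     # evict an entry (FIFO here; real TLBs
            tlb.pop(next(iter(tlb)))    # approximate LRU using the Ref bit)
        tlb[vpn] = ppn
    return (ppn << PAGE_BITS) | offset

a = translate(0xC07)   # first access: TLB miss, fills the TLB
b = translate(0xC07)   # second access: TLB hit, same translation
print(hex(a), a == b)  # 0x40c07 True
```

Because the TLB holds only translations (not data), a hit still requires a cache access; the win is skipping the page-table lookup in memory.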
Real Stuff: Pentium Pro Memory Hierarchy
• Address size: 32 bits (VA, PA)
• VM page size: 4 KB or 4 MB
• TLB organization: separate i- and d-TLBs (i-TLB: 32 entries, d-TLB: 64 entries); 4-way set associative; LRU approximated; hardware handles misses
• L1 cache: 8 KB, separate i- and d-caches; 4-way set associative; LRU approximated; 32-byte blocks; write back
• L2 cache: 256 or 512 KB