The memory hierarchy was developed to give the illusion of large, fast, and cheap memory by using smaller, faster cache memory between the CPU and main memory (DRAM). Cache memory provides faster access times than main memory. Cache performance is measured by hit ratio, with higher hit ratios indicating better cache performance. Cache misses occur when requested data is not found in the cache.
By Suparna Dutta
Memory Hierarchy

Memory Hierarchy Properties
• In computer system design, the memory hierarchy is an enhancement that organizes memory so as to minimize access time. The memory hierarchy was developed based on a program behavior known as locality of reference.

Cache Performance

• A cache miss occurs when the cache does not hold the requested data; a cache hit occurs when the cache successfully finds the requested data, satisfying the access.
• The performance of cache memory is frequently measured in terms of a quantity called the hit ratio.
  Hit ratio (H) = hits / (hits + misses) = number of hits / total accesses
  Miss ratio = misses / (hits + misses) = number of misses / total accesses = 1 - H

Types of Misses

Compulsory — The very first access to a block cannot be in the cache, so the block must be brought into the cache. Compulsory misses are those that would occur even with an infinitely sized cache.
Capacity — If the cache cannot contain all the blocks needed during execution of a program, capacity misses (in addition to compulsory misses) will occur because blocks are discarded and later retrieved.
Conflict — If the block placement strategy is not fully associative, conflict misses (in addition to compulsory and capacity misses) will occur because a block may be discarded and later retrieved when multiple blocks map to its set and accesses to the different blocks are intermingled.
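The hit- and miss-ratio formulas above can be sketched in a few lines of Python; the counts used below (950 hits, 50 misses) are made-up illustration values:

```python
def hit_ratio(hits, misses):
    """Hit ratio H = hits / (hits + misses)."""
    return hits / (hits + misses)

def miss_ratio(hits, misses):
    """Miss ratio = misses / (hits + misses) = 1 - H."""
    return 1 - hit_ratio(hits, misses)

# Illustrative counts: 950 hits and 50 misses out of 1000 accesses.
print(hit_ratio(950, 50))             # 0.95
print(round(miss_ratio(950, 50), 2))  # 0.05
```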
Cache Effective Access Time
• EAT = h × (cache access time) + (1 - h) × (memory access time), where h = hit ratio.

For a multi-level cache:
  Tavg = H1 × C1 + (1 - H1) × (H2 × C2 + (1 - H2) × M)
where
  H1 is the hit rate of the L1 cache,
  H2 is the hit rate of the L2 cache,
  C1 is the time to access information in the L1 cache,
  C2 is the miss penalty to transfer information from the L2 cache to the L1 cache,
  M is the miss penalty to transfer information from main memory to the L2 cache.

Cache Mapping

Direct Mapping
• The simplest technique, known as direct mapping, maps each block of main memory to only one possible cache line; each memory block is assigned to a specific line in the cache. If that line is already occupied by a memory block when a new block needs to be loaded, the old block is discarded.

Fully Associative Mapping
• A block of main memory can map to any line of the cache that is freely available at that moment. This makes fully associative mapping more flexible than direct mapping.
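The single-level EAT and the two-level Tavg formulas translate directly into Python. The timings below (1 ns L1, 10 ns L2, 100 ns main memory) are hypothetical values chosen only to illustrate the calculation:

```python
def eat(h, cache_time, memory_time):
    """EAT = h * (cache access time) + (1 - h) * (memory access time)."""
    return h * cache_time + (1 - h) * memory_time

def tavg_two_level(h1, c1, h2, c2, m):
    """Tavg = H1*C1 + (1 - H1) * (H2*C2 + (1 - H2)*M) for a two-level cache."""
    return h1 * c1 + (1 - h1) * (h2 * c2 + (1 - h2) * m)

# Illustrative timings (ns): 90% L1 hit rate at 1 ns, 80% L2 hit rate
# at 10 ns, 100 ns penalty to reach main memory.
print(tavg_two_level(0.9, 1, 0.8, 10, 100))  # about 3.7 ns
```

Note how strongly the L1 hit rate dominates: even though main memory is 100x slower than L1, a 90% hit rate keeps the average access under 4 ns.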
K-Way Set Associative Mapping
• In k-way set associative mapping, cache lines are grouped into sets, where each set contains k lines.
• A particular block of main memory can map to only one particular set of the cache.
• However, within that set, the memory block can map to any cache line that is freely available, as in fully associative mapping.

Numerical Example

A computer has 8 GB of memory with a 64-bit word size. Each block of memory stores 16 words. The computer has a direct-mapped cache of 128 blocks and uses word-level addressing. What is the address format? If we change the cache to a 4-way set associative cache, what is the new address format?

Answer: With 8 GB and a 64-bit (8-byte) word size, there are 8 GB / (8 bytes/word) = 1 GW of memory. This requires 30 bits for an address. Of the 30 bits, we need 4 bits for the word within the block and 7 bits for the block number, leaving 30 - (7 + 4) = 19 bits for the tag. So the address format is 19 - 7 - 4.

If we instead have a 4-way set associative cache, each set has 4 lines, so there are 128 / 4 = 32 sets. We then need only 5 bits for the set number, leaving 30 - (5 + 4) = 21 bits for the tag. So the address format is 21 - 5 - 4.

Techniques to Reduce Misses

1. Larger block size to reduce miss rate — The simplest way to reduce the miss rate is to take advantage of spatial locality and increase the block size. Larger blocks reduce compulsory misses, but they also increase the miss penalty. Because larger blocks lower the number of tags, they can slightly reduce static power. Larger block sizes can also increase capacity or conflict misses, especially in smaller caches. Choosing the right block size is a complex trade-off that depends on the size of the cache and the miss penalty.
2. Bigger caches to reduce miss rate — The obvious way to reduce capacity misses is to increase cache capacity. Drawbacks include a potentially longer hit time for the larger cache memory and higher cost and power; larger caches increase both static and dynamic power.
3. Higher associativity to reduce miss rate — Increasing associativity reduces conflict misses. Greater associativity can come at the cost of increased hit time, and associativity also increases power consumption.
4. Multilevel caches to reduce miss penalty — A difficult decision is whether to make the cache hit time fast, to keep pace with the high clock rate of processors, or to make the cache large, to reduce the gap between processor accesses and main memory accesses. Adding another level of cache between the original cache and memory simplifies the decision: the first-level cache can be small enough to match a fast clock cycle time, while the second-level (or third-level) cache can be large enough to capture many accesses that would otherwise go to main memory.
5. Giving priority to read misses over writes to reduce miss penalty — A write buffer is a good place to implement this optimization. Write buffers create hazards because they may hold the updated value of a location needed on a read miss, that is, a read-after-write hazard through memory. One solution is to check the contents of the write buffer on a read miss; if there are no conflicts, and the memory system is available, sending the read before the writes reduces the miss penalty. Most processors give reads priority over writes. This choice has little effect on power consumption.
6. Avoiding address translation during indexing of the cache to reduce hit time — Caches must cope with the translation of a virtual address from the processor to a physical address to access memory.
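The address-format arithmetic in the numerical example above can be checked with a small Python sketch (the function names are illustrative, not from any library):

```python
import math

def direct_mapped_format(address_bits, words_per_block, cache_blocks):
    """Return (tag, block, word) field widths for a direct-mapped cache."""
    word_bits = int(math.log2(words_per_block))
    block_bits = int(math.log2(cache_blocks))
    return (address_bits - block_bits - word_bits, block_bits, word_bits)

def set_associative_format(address_bits, words_per_block, cache_blocks, ways):
    """Return (tag, set, word) field widths for a k-way set associative cache."""
    word_bits = int(math.log2(words_per_block))
    set_bits = int(math.log2(cache_blocks // ways))  # sets = blocks / ways
    return (address_bits - set_bits - word_bits, set_bits, word_bits)

# 8 GB / 8 bytes per word = 2**30 words -> 30 address bits;
# 16 words per block, 128 cache blocks.
print(direct_mapped_format(30, 16, 128))       # (19, 7, 4)
print(set_associative_format(30, 16, 128, 4))  # (21, 5, 4)
```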
Virtual Memory
• Demand paging is a popular method of virtual memory management. In demand paging, the pages of a process that are least used are stored in secondary memory.
• A page is copied to main memory when it is demanded, that is, when a page fault occurs. Various page replacement algorithms are used to determine which pages will be replaced; we will discuss each of them later in detail.
• In this scheme, whenever some pages need to be loaded into main memory for execution and there is not enough memory available for them, then instead of preventing the pages from entering main memory, the OS searches for the areas of RAM that have been least used recently, or that are not referenced, and copies them into secondary memory to make space for the new pages in main memory.
• Since this entire procedure happens automatically, it makes the computer seem as though it has unlimited RAM.

Advantages of Virtual Memory
- The degree of multiprogramming is increased.
- Users can run large applications with less physical RAM.
- There is no need to buy more RAM modules.

Disadvantages of Virtual Memory
- The system becomes slower, since swapping takes time.
- It takes more time to switch between applications.
- The user has less hard disk space available.

Memory Replacement Policies

First In First Out (FIFO): The page to be replaced is the "oldest" page in memory, the one that was loaded before all the others.
Least Recently Used (LRU): The page to be replaced is the one that has not been referenced since all the others were referenced.
Last In First Out (LIFO): The page to be replaced is the one most recently loaded into memory.
Least Frequently Used (LFU): The page to be replaced is the one used least often of the pages currently in memory.
Optimal (OPT or MIN): The page to be replaced is the one that will not be used for the longest period of time. This algorithm requires future knowledge of the reference string, which is not usually available; thus, this policy is used for comparison studies.
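As a minimal sketch of one of the policies above, the following simulates LRU replacement and counts page faults; the reference string is an arbitrary example chosen for illustration:

```python
def lru_faults(reference_string, num_frames):
    """Count page faults under LRU page replacement."""
    frames = []                    # least recently used page at index 0
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.remove(page)    # refresh: page becomes most recent
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)      # evict the least recently used page
        frames.append(page)        # page is now the most recently used
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 3))  # 10
print(lru_faults(refs, 4))  # 8
```

Unlike FIFO, LRU never suffers from Belady's anomaly: adding frames can never increase its fault count.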
Example on FIFO
Belady's anomaly shows that it is possible to have more page faults after increasing the number of page frames when using the First In First Out (FIFO) page replacement algorithm. For example, for the reference string 3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4 with 3 frames, we get 9 total page faults, but if we increase the number of frames to 4, we get 10 page faults.

LRU Example
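The FIFO example exhibiting Belady's anomaly can be reproduced with a short simulation:

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO page replacement."""
    frames = deque()               # oldest page sits at the left end
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()   # evict the oldest page
            frames.append(page)
    return faults

refs = [3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4]
print(fifo_faults(refs, 3))  # 9 page faults with 3 frames
print(fifo_faults(refs, 4))  # 10 page faults with 4 frames (Belady's anomaly)
```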