CN114600091A - Reuse distance based cache management - Google Patents
Reuse distance based cache management
- Publication number
- CN114600091A (application CN202080071961.5A)
- Authority
- CN
- China
- Prior art keywords
- cache
- line
- cache line
- priority level
- replacement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F12/0855—Overlapped cache accessing, e.g. pipeline
- G06F12/0871—Allocation or management of cache space
- G06F12/0891—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
- G06F12/0897—Caches characterised by their organisation or structure with two or more cache hierarchy levels
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
- G06F12/128—Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
- G06F2212/502—Control mechanisms for virtual memory, cache or TLB using adaptive policy
- G06F2212/6024—History based prefetching
Abstract
A cache [110] of a processor [102] includes a cache controller [112] that implements a cache management policy for the insertion and replacement of cache lines of the cache. The cache management policy assigns a replacement priority level to each cache line in at least a subset of the cache lines in a region of the cache based on a comparison between the number of accesses to the cache set [116] containing the way that stores the cache line since the cache line was last accessed, and a reuse distance determined for the region of the cache, where the reuse distance represents the average number of accesses to a cache set between accesses to any given cache line of a given cache set of the region.
Description
Background
Processing systems use caches to temporarily buffer data from memory or mass storage for fast access. Since caches have limited storage capacity, cache management policies are typically employed to guide the selection of cache lines for replacement when the corresponding region of the cache is full. However, some conventional cache management policies, such as those based on Least Recently Used (LRU) or re-reference interval prediction (RRIP) principles, are less efficient at handling irregular accesses to cache lines, or require relatively complex circuit implementations, which can limit their applicability.
Drawings
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
Fig. 1 is a block diagram of a processing system having a cache implementing a reuse distance based cache management policy, according to some embodiments.
Fig. 2 is a block diagram illustrating a reuse distance calculation component of the cache of fig. 1, according to some embodiments.
Fig. 3 is a flow diagram illustrating a method for calculating a current reuse distance for a corresponding region of the cache of fig. 1 and 2, according to some embodiments.
Fig. 4 is a block diagram illustrating an alternative implementation of a reuse distance calculation component, according to some embodiments.
Fig. 5 is a flow diagram illustrating a method of maintaining a line access counter for a corresponding cache line, according to some embodiments.
Fig. 6 is a flow diagram illustrating a method for assigning replacement priority levels to cache lines based on a current reuse distance and a corresponding line access count value, according to some embodiments.
Fig. 7 is a flow diagram illustrating a method for selecting a cache line for replacement using a reuse distance based replacement priority, according to some embodiments.
Detailed Description
Figs. 1-7 illustrate systems and techniques for implementing cache management policies for caches of a processing system based on a "reuse distance" predicted for corresponding regions of the cache. As described herein, the reuse distance of a corresponding region of a cache is a representation of the historical average number of accesses to a given cache set of the cache region between accesses to a given cache line of that cache set. This reuse distance is used as a predictor, or other indicator, of how many accesses are likely to occur to a given cache set before a given cache line in that set is accessed again, and this information, along with information about the recent access history of the cache line, is used to assign a replacement priority level to the cache line. The replacement priorities assigned in this manner are then used, according to the cache management policy, to select a cache line for replacement when the corresponding cache set is fully occupied. This reuse distance based approach can accurately and efficiently prioritize for replacement cache lines that are accessed in round-robin or streaming patterns, while remaining relatively simple to implement in hardware.
Fig. 1 illustrates a processing system 100 that employs reuse distance based cache management, according to some embodiments. Processing system 100 includes a processor 102 coupled to a memory subsystem 104, where memory subsystem 104 includes one or more of system memory, scratchpad memory, disk drives, or other mass storage devices. The processor 102 is, for example, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Accelerated Processing Unit (APU), a Digital Signal Processor (DSP), or a combination thereof. The processor 102 includes one or more execution pipelines 106 (e.g., CPU cores) and one or more cache hierarchies including a cache 108. Cache 108 includes a cache array 110 and a cache controller 112. Cache array 110 includes a plurality of entries 114 to store cache lines (i.e., temporarily buffered data blocks) for access by the one or more execution pipelines 106. In at least one embodiment, cache 108 is a set-associative cache such that the cache line entries 114 are arranged in a plurality of cache sets 116, each cache set 116 having a plurality of ways, each way being a cache line entry 114 operable to store a corresponding cache line, and such that any cache line associated with a memory address mapped to a cache set 116 may be stored in any way of that cache set 116. In the illustrated example, the cache 108 implements four ways, way 0 through way 3, although more or fewer ways may be implemented. Each cache set 116 includes additional fields, such as a tag field 118 for each way of the set, where the tag field 118 stores a portion of a memory address, status bits, control bits, and the like associated with the valid cache line (if any) stored in the corresponding way.
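For readers who prefer code to prose, a minimal sketch of the set-indexing scheme described above follows. The line size and set count are illustrative assumptions, not values mandated by the disclosure, and all Python examples in this document are simplified software models of the described hardware, not definitive implementations.

```python
LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 1024   # assumed number of cache sets 116

def set_index_and_tag(address):
    """Map a memory address to the cache set 116 it belongs to and the tag
    stored in the corresponding tag field 118 (sketch of a set-associative
    lookup; the line may reside in any way of the selected set)."""
    block = address // LINE_BYTES
    return block % NUM_SETS, block // NUM_SETS
```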
The cache controller 112 operates to maintain the various fields of the cache array 110 based on the activities of the one or more execution pipelines 106, including receiving and storing blocks of data as cache lines, accessing cache lines for use or modification by the one or more execution pipelines 106, evicting or flushing cache lines to the memory subsystem 104, and so forth. As part of this process, the cache controller 112 implements a cache management policy 120 that controls the prioritization of cache lines for replacement or other eviction, and controls the selection of candidate cache lines for replacement or other eviction based on that prioritization. In at least one embodiment, the cache management policy 120 utilizes a "reuse distance" determined for a corresponding region of the cache 108 (which may be a portion (e.g., a quarter) of the cache 108 or the entire cache 108), together with the recent access history of the cache lines in that region, to determine replacement priorities for the cache lines. Although the reuse distance is determined for a portion of the cache in this embodiment, the cache management policy 120 may be applied to the entire cache 108. As described above, at a high level the reuse distance represents the average number of accesses to a cache set 116 in the corresponding cache region between accesses to a particular cache line within that cache set 116. That is, the reuse distance represents a prediction of the average number of cache accesses that are likely to occur to any given cache set of the corresponding region of cache 108 before a given cache line of that cache set is accessed again. To this end, the cache management policy 120 is implemented in three phases, each of which operates independently and concurrently with the others: a reuse distance determination phase 122, a replacement priority assignment phase 124, and a cache line replacement phase 126. The reuse distance determination phase 122 provides the calculation of the current reuse distance for each applicable region of the cache 108 and is described in more detail below with reference to Figs. 2-4. The replacement priority assignment phase 124 provides the assignment of replacement priority levels to cache lines based on the current reuse distance calculated in the most recent compute cycle of phase 122, and is described in more detail below with reference to Figs. 5 and 6. The cache line replacement phase 126 provides the selection of a cache line for replacement based on the replacement priority levels assigned during the most recent prioritization cycle of phase 124, and is described in more detail below with reference to Fig. 7.
In at least one embodiment, cache 108 employs a counter set 128 for calculating the current reuse distance at each iteration of phase 122 and for determining the replacement priorities of cache lines at each cycle of phase 124. The counter set 128 includes a set access counter 130 and a line access counter 132 for each way (i.e., per cache line) of the cache array 110 or, alternatively, for each way of a subset of cache sets designated as representative cache sets of the cache 108 for sampling purposes (e.g., every Xth cache set of the region, X being an integer greater than 1). The set access counter 130 stores a set access count value that represents the number of accesses to the cache set 116 associated with the cache line since the corresponding cache line was inserted or last accessed. The line access counter 132 stores a line access count value that represents the number of times the corresponding cache line has been accessed since being inserted into the cache 108 or since being reset in response to the start of the next compute cycle. In some embodiments, the counter set 128 also includes an Nth access counter 134 that counts the number of accesses to the corresponding cache set 116, resetting after every Nth counted access (and triggering further operations, as described below), where N represents a programmable or otherwise specified integer greater than one (N > 1).
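The following sketch models the counter set 128 in software. The class names and the per-way organization are assumptions of this sketch; actual counter widths are implementation choices not fixed by the description.

```python
from dataclasses import dataclass, field

@dataclass
class WayCounters:
    """Per-way state of counter set 128 for one representative cache set 116."""
    set_access_count: int = 0   # set access counter 130: accesses to the set since
                                # this way's line was inserted or last accessed
    line_access_count: int = 0  # line access counter 132: accesses to this way's
                                # line since insertion (saturating in hardware)

@dataclass
class SetCounters:
    """Counters for one representative cache set 116 (assumed 4 ways)."""
    ways: list = field(default_factory=lambda: [WayCounters() for _ in range(4)])
    nth_access_count: int = 0   # optional Nth access counter 134 (see Fig. 4 variant)
```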
Fig. 2 illustrates an implementation of a reuse distance calculation component 200 implemented by the cache controller 112 with access to the counter set 128, according to some embodiments. The reuse distance calculation component 200 includes a set accounting component 202 for each cache set 116 used in calculating the reuse distance for the corresponding region of the cache 108, which may be every cache set 116 in the cache region or a representative subset of the cache sets 116 of the region. In the depicted example, there are X represented cache sets 116 (X >= 1), with set accounting components 202-0, 202-1, and 202-3 shown for cache set 0, cache set 1, and cache set X-1 of the represented cache sets 116. The reuse distance calculation component 200 also includes an accumulator 204 and an averaging/scaling component 206. In one embodiment, the reuse distance calculation component 200 further includes a hit counter 207 that counts the number of cache hits to the corresponding region of the cache 108 in the current calculation cycle and triggers a reuse distance calculation when the number of cache hits reaches a programmable or otherwise specified value K. To illustrate, the hit counter 207 may be implemented as a countdown counter that is reset to K at the start of each compute cycle, decrements on each cache hit to the representative cache sets, and triggers a reuse distance calculation when it reaches 0.
The configuration of set accounting component 202-3 for set X-1 is illustrated and is representative of the configuration of each set accounting component 202 with respect to its corresponding cache set 116. As shown, set accounting component 202-3 includes a set of comparators 208, one for each way in the corresponding cache set 116, and selection logic 214 (depicted as a multiplexer for ease of illustration). Thus, for the depicted example in which the cache 108 has four-way sets, the set accounting component 202-3 includes four comparators 208. Each comparator 208 includes an input coupled to receive an address value from the tag field of a corresponding way of the set (e.g., one of the tag fields 118-0 through 118-3 for ways 0 through 3, respectively) and an input to receive an address value from a tag field 210 of a cache probe 212 submitted by the execution pipeline 106 to the cache 108. Each comparator 208 also has an output that is asserted when the address value from the tag field 118 of the corresponding way matches the address value of the tag field 210 of the cache probe 212; that is, the comparator 208 associated with the way of the cache set 116 that is the target of the cache probe 212 (i.e., the way providing a cache "hit" for the cache probe 212) has its output asserted, while the outputs of the other comparators 208 of the set accounting component 202-3 remain unasserted. In this manner, the outputs of the comparators 208 identify the way that contains the cache line whose address matches the address represented in the cache probe 212.
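As a software analogue of the comparators 208, the sketch below returns the hit way for a probe tag. It is illustrative only: the hardware performs all tag comparisons in parallel, and the function name is an assumption of this sketch.

```python
def way_hit(way_tags, way_valid, probe_tag):
    """Model of comparators 208: return the index of the way whose tag field 118
    matches the probe's tag field 210 (a cache hit), or None on a miss."""
    for way, (tag, valid) in enumerate(zip(way_tags, way_valid)):
        if valid and tag == probe_tag:
            return way
    return None
```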
As described above, the counter set 128 includes a set access counter 130 for each way of each representative cache set 116 for reuse distance calculations. Thus, for set X-1 associated with the illustrated set accounting component 202-3, the counter set 128 includes four set access counters 130-0, 130-1, 130-2, and 130-3 for ways 0, 1, 2, and 3, respectively. Each of the set access counters 130-0 through 130-3 stores a set access count value that represents the number of accesses to set X-1 since the cache line in the corresponding way was inserted or last accessed, as described in more detail below.
The accumulator 204 operates to accumulate the set access count values received from the various set accounting components 202 and to provide the resulting updated accumulated value to the averaging/scaling component 206. In response to a triggering event (e.g., every Kth cache hit to the representative cache sets of the region), the averaging/scaling component 206 averages the most recently updated accumulated value over the number of accesses to the representative cache sets of the region since the last computation cycle to generate an average set access count value. The average set access count value may be obtained via, for example, a series of shift operations, and in some embodiments the averaging/scaling component 206 scales the resulting average set access count value by a specified scaling factor. The resulting averaged/scaled set access count value is then used as the current reuse distance 216 for the corresponding region of the cache 108.
Fig. 3 illustrates a method 300 depicting in greater detail the reuse distance calculation process employed by the reuse distance calculation component 200 of Fig. 2, according to some embodiments. At block 302, the cache controller 112 monitors the operation of the cache 108 to determine whether a cache line has been inserted into one of the representative cache sets 116. At block 304, the cache controller 112 monitors the operation of the cache 108 to determine whether a cache line in one of the representative cache sets 116 has been accessed. If a cache line has been accessed, then at block 306 the value of the set access counter 130 associated with the way of the cache set 116 storing the accessed cache line is sent to the accumulator 204. When a cache line has been inserted or accessed, at block 308 the cache controller 112 resets the set access counter 130 associated with the way of the cache set 116 that stores the inserted or accessed cache line to a predetermined value (e.g., 0), and at block 310 the cache controller 112 increments the set access counters 130 associated with every other way of that cache set 116. That is, inserting or accessing a cache line in a way of a representative cache set 116 resets the set access counter 130 for that way while incrementing the set access counter 130 of every other way of that cache set 116. Alternatively, the cache controller 112 may increment the set access counters 130 associated with every way of the cache set 116 and then reset the set access counter 130 associated with the way storing the inserted or accessed cache line to the predetermined value. In this manner, the cache controller 112 tracks, for each cache line in the representative cache sets 116, the number of accesses to the cache set since that cache line was inserted or last accessed.
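A compact model of blocks 302-310 follows, operating on a plain list of per-way set access counts. The function name and the convention of returning the updated accumulated value are assumptions of this sketch.

```python
def on_set_access(set_access_counts, touched_way, accumulated, inserted=False):
    """Blocks 302-310: update a representative set's access counters 130 when a
    cache line is inserted into, or accessed in, `touched_way`."""
    if not inserted:
        # Block 306: forward the accessed way's set access count to the accumulator.
        accumulated += set_access_counts[touched_way]
    # Block 310: every other way has now seen one more access to the set.
    for way in range(len(set_access_counts)):
        if way != touched_way:
            set_access_counts[way] += 1
    # Block 308: the inserted/accessed way restarts its count at the predetermined value.
    set_access_counts[touched_way] = 0
    return accumulated
```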
Further, at block 312, the cache controller 112 monitors for cache hits to the ways of the representative cache sets 116 (which may include the accesses referenced at block 304). In at least one embodiment, a cache hit to a way of a representative cache set 116 is signaled via the comparators 208 of the set accounting component 202 associated with that representative cache set 116: when the address represented in a cache probe (e.g., cache probe 212, Fig. 2) matches the address in the tag field 118 of the corresponding way, the output of the corresponding comparator 208 is asserted, signaling that the cache probe has hit that way of the representative cache set 116. In response to such a cache hit, at block 314, the selection logic 214 of the set accounting component 202 associated with the hit cache set 116 outputs the value of the set access counter 130 associated with the hit way to the accumulator 204, whereupon the accumulator 204 adds the input value to the accumulated value for the current compute cycle.
The process of blocks 312 and 314 is repeated for each cache hit to the representative cache sets 116 until a Kth cache hit to the corresponding region of the cache 108 is detected (via, for example, hit counter 207) in the current compute cycle, where K is a programmable or otherwise specified integer value greater than 1 (K > 1). For example, K may be set to 64, so that the set count accumulation process continues until the 64th cache hit to the corresponding region of cache 108 occurs. In response to determining at block 316 that the Kth cache hit to the region of the cache 108 has occurred for the current compute cycle, at block 318 the averaging/scaling component averages the current accumulated value from the accumulator 204 over the K cache hits of the current compute cycle and, in some embodiments, scales the average by multiplying it by a factor. In implementations where K and any scaling factor are powers of 2, the averaging/scaling component 206 may be implemented as right- and left-shift logic. To illustrate, when K is set to 64 (2^6) per the previous example, averaging and scaling of the accumulated value may be performed by right-shifting the accumulated value by 6 bits and then left-shifting by one bit. More generally, when K is equal to 2^M (M being a positive integer), the averaging/scaling component 206 may shift the current updated accumulated value right by M bit positions to obtain the average set access count over K cache accesses to the representative cache sets; similarly, scaling the average by 2 may be achieved by shifting the resulting average left by one bit. The resulting averaged (and scaled) value is then set as the current reuse distance 216 of the corresponding region of the cache 108. The scaling of the average accounts for the variation among the set access count values of the various ways in a set, which could otherwise yield a reuse distance lower than the set access count values of some lines in the set. As explained in more detail below, a given line becomes more likely to be evicted once its set access count value exceeds the reuse distance of the cache, so scaling the average helps prevent certain lines in a set from being inadvertently prioritized for replacement. With the current compute cycle complete, at block 320 the cache controller 112 resets the components used in the completed compute cycle, such as hit counter 207, the set access counters 130, the line access counters 132, and the accumulator 204, and the process returns to block 312 for the next compute cycle.
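The shift-based averaging and scaling of blocks 316-318 can be expressed directly. K = 64 and a scaling factor of 2 follow the example above; the constant names are assumptions of this sketch.

```python
K_LOG2 = 6      # K = 64 = 2^6 cache hits per compute cycle
SCALE_LOG2 = 1  # scale the average by 2, per the example above

def compute_reuse_distance(accumulated):
    """Blocks 316-318: right-shift by M = K_LOG2 to average over K hits,
    then left-shift by one bit to scale the average by 2."""
    return (accumulated >> K_LOG2) << SCALE_LOG2

# e.g., an accumulated value of 6400 over 64 hits averages to 100,
# which scales to a reuse distance of 200:
assert compute_reuse_distance(6400) == 200
```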
Fig. 4 illustrates an alternative implementation of a set accounting component 202 (Fig. 2) implemented by the cache controller 112 for each representative cache set 116, in accordance with at least one embodiment. As with the implementation of Fig. 2, the alternative implementation of the set accounting component 202 of Fig. 4 includes comparators 208 operable to assert their respective outputs in response to the address in the tag field 118 of the corresponding way matching the address in the tag field 210 of the received cache probe 212, and also includes selection logic 414 (depicted as a multiplexer for ease of illustration) that uses the outputs of the comparators 208 as its selection control inputs. However, rather than using counters large enough to account for every access to the corresponding cache set during a compute cycle, the implementation of Fig. 4 increments each set access counter 130 only on every Nth access to the cache set 116, thereby allowing smaller counters to be used for the set access counters 130. To facilitate counting accesses in groups of N, the illustrated implementation of the set accounting component 202 also includes an Nth access counter 404 that is incremented each time the cache set 116 is accessed during the current compute cycle (or, alternatively, is reset to N and decremented on each access). Furthermore, to compensate for the fact that the set access counters 130 of a cache set 116 are incremented only on every Nth access to that cache set, the illustrated implementation employs a shift register 406 and an adder 408 between the output of each set access counter 130 and the corresponding input of the selection logic 414 to reconstruct an approximate set access count from the sampled value. Thus, assuming that N is a power of 2 (2^j), the set access count value output from each set access counter 130 is left-shifted by j bits and added to the current access count represented in the Nth access counter 404, and the resulting value is fed to the corresponding input of the selection logic 414. The selection logic 414 then selects one of the input values for output to the accumulator 204 based on the states of the outputs of the comparators 208, which in turn indicate which way (if any) of the representative cache set 116 is the target of the cache probe 212.
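The reconstruction performed by shift register 406 and adder 408 reduces to one expression, sketched below; the choice N = 8 is an illustrative assumption.

```python
N_LOG2 = 3  # N = 8 = 2^j: set access counters tick once per N set accesses (assumed)

def effective_set_access_count(sampled_count, nth_access_count):
    """Fig. 4 variant: approximate set access count = (counter 130 value << j)
    plus the residual access count held in the Nth access counter 404."""
    return (sampled_count << N_LOG2) + nth_access_count
```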
Figs. 5 and 6 together illustrate the operation of the cache controller 112 in implementing the cache management policy 120 during the replacement priority assignment phase 124 (Fig. 1), according to some embodiments. Fig. 5 illustrates a method 500 of maintaining the line access counters 132 for a representative cache set 116 during a prioritization cycle, according to some embodiments. As described above, each line access counter 132 represents the number of times the cache line stored in the way associated with that line access counter 132 has been accessed since the start of the current prioritization cycle. Thus, as the prioritization cycle begins, at block 502 the cache controller 112 monitors for the insertion of a cache line into a way of a representative cache set 116. In response to such an insertion, at block 504, the cache controller 112 resets the line access counter 132 associated with the target way of the representative cache set 116. Thereafter, at block 506, the cache controller 112 monitors for accesses to the cache line. In response to detecting an access, at block 508, the cache controller 112 increments the line access counter 132 for the accessed cache line. In some embodiments, the line access counters 132 are implemented as one- or two-bit saturating counters to reduce their hardware cost, and thus count at most one access (for a one-bit implementation) or at most three accesses (for a two-bit implementation). In other embodiments, more than two bits are used for each line access counter 132 to facilitate counting a greater number of accesses to any given cache line.
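A saturating line access counter per block 508 might look like the following sketch, with the two-bit width taken from the description above.

```python
LINE_COUNTER_BITS = 2  # one- or two-bit saturating counters per the description

def increment_saturating(count, bits=LINE_COUNTER_BITS):
    """Block 508: increment the line access counter 132, saturating at 2**bits - 1
    (3 for a two-bit counter) instead of wrapping around."""
    return min(count + 1, (1 << bits) - 1)
```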
Turning to Fig. 6, a method 600 for assigning a replacement priority level to a cache line based on the cache line's current line access count in the corresponding line access counter 132 and on the current reuse distance is shown, according to some embodiments. The method 600 is initiated by an event (block 602), such as a determination that a cache line is to be replaced, which serves as a trigger to determine the replacement priority of any newly inserted cache line during the current prioritization cycle and to re-determine the replacement priorities of previously present cache lines. In response to the trigger, the prioritization process begins at block 604 with the selection of a cache line of the region of the cache 108 according to a selection sequence (e.g., direct sequential selection, pseudo-random selection, etc.).
At block 606, the cache controller 112 accesses the set access counter 130 for the way storing the selected cache line and compares the count contained therein to the current reuse distance 216. If the set access count is not greater than the current reuse distance 216, this indicates that the cache line has not yet reached the reuse distance and therefore may be reused in the future. Thus, if the set access counter 130 is not greater than the current reuse distance, at block 608 the cache controller 112 accesses the line access counter 132 associated with the way storing the selected cache line and determines whether the value stored therein is greater than zero (i.e., determines whether the cache line has been reused since insertion). If so, the cache line is likely to be reused again, given the tendency of some data to be repeatedly accessed. Thus, if it is determined at block 608 that the line access counter 132 is greater than zero, then at block 610 a replacement priority level of 3 is assigned to the cache line (in the following, it is assumed that the lower the replacement priority level, the more suitable the corresponding cache line is as a candidate for replacement, and thus the greater the likelihood that it will be selected for replacement). Otherwise, if the line access counter 132 is equal to zero, the cache line has not yet been reused; however, since the reuse distance of the cache line has not yet been reached, there remains some possibility that it will be reused in the future. In this case, at block 612, the cache line is assigned replacement priority level 1 (priority level 1 indicating a greater likelihood of replacement selection than replacement priority level 3).
Returning to block 606, if it is determined that the access count represented by the set access counter 130 is greater than the current reuse distance, the cache line has reached the reuse distance without being reused since then. Thus, at block 614, the cache controller 112 determines whether the cache line has been accessed by checking the line access counter 132 for the way storing the cache line. If the line access counter 132 is greater than zero, the cache line has been reused at least once since it was inserted into the cache 108 and is therefore likely to be reused again; however, since it has already exceeded the reuse distance, its reuse probability is diminished. Thus, if the line access counter 132 is greater than zero, then at block 616 the selected cache line is assigned replacement priority level 2 (replacement priority level 2 indicating a greater likelihood of replacement selection than replacement priority level 3, and a lower likelihood of selection than replacement priority level 1). Otherwise, if the line access counter 132 equals zero, the cache line has not been reused since insertion and has exceeded the reuse distance, and is therefore unlikely to be reused in the future. In this case, at block 618, replacement priority level 0 is assigned to the cache line (replacement priority level 0 representing the maximum likelihood of replacement selection in this example).
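The decision tree of blocks 606-618 condenses to a few comparisons. The sketch below assumes the level numbering used in the description, where lower levels are more likely to be replaced.

```python
def assign_replacement_priority(set_access_count, line_access_count, reuse_distance):
    """Blocks 606-618: map a line's counters to a replacement priority level.
    3: within reuse distance, already reused      -> strongest retention
    1: within reuse distance, not yet reused
    2: past reuse distance, reused at least once
    0: past reuse distance, never reused          -> best eviction candidate"""
    if set_access_count <= reuse_distance:
        return 3 if line_access_count > 0 else 1
    return 2 if line_access_count > 0 else 0
```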
FIG. 7 illustrates a method 700 representative of the operation of the cache controller 112 for the cache line replacement phase 126 (FIG. 1) of the cache management policy 120, according to some embodiments. For the following, recall that, for purposes of illustration, an increase in the value of the replacement priority level corresponds to an increase in the priority of retaining the corresponding cache line, and conversely, a decrease in the value of the replacement priority level corresponds to an increase in the priority or likelihood of evicting the corresponding cache line. At block 702, a load operation or a store operation is performed and results in the generation of a block of data to be stored as a cache line in cache 108. Thus, further at block 702, the cache controller 112 determines the cache set 116 available to store the cache line based on the address associated with the cache line. At block 704, the cache controller 112 determines whether there are available ways in the identified cache set 116 (i.e., whether there are ways that do not currently store a valid cache line). If so, at block 706, the cache controller 112 inserts the cache line into an available way of the cache set 116. If cache set 116 is a representative cache set, then inserting the cache line into cache set 116 triggers certain counting operations, as described above with reference to block 302 of FIG. 3 and block 502 of FIG. 5.
Otherwise, if no way is available in the identified cache set 116, the cache controller 112 determines whether to evict a current cache line of the cache set 116 or to bypass caching of the new cache line, based on the replacement priority levels assigned during the replacement priority assignment phase 124 of the cache management policy 120, as described above. Thus, at block 708, the cache controller 112 determines whether the cache 108 supports cache bypassing (also referred to as "selective caching"). If cache bypass is supported, at block 710 the cache controller 112 determines whether the cache set 116 contains any cache line assigned replacement priority level 0. If not, at block 712 the cache controller 112 may choose to bypass caching of the cache line in the cache 108 (e.g., by preventing any caching of the cache line, or by providing the cache line to a lower-level cache for storage). In another embodiment, if it is determined that the cache line is part of a streaming access pattern (e.g., when the current reuse distance is small, such as 0 or 1), cache bypass may also be selected if there is likewise no cache line with replacement priority level 1.
If cache bypass is not supported, or no cache line has a low enough replacement priority level to justify bypassing, then at block 714 the cache controller 112 selects the cache line in the cache set 116 with the lowest replacement priority level as the replacement candidate. Where two or more cache lines share the same lowest replacement priority level, the cache controller 112 may select one of them pseudo-randomly, according to a specified selection order, or by choosing the way having the largest set access count. At block 716, the cache controller 112 replaces the selected candidate cache line with the new cache line in the corresponding way of the cache set 116 (or otherwise evicts the candidate). This action is a cache line insertion and therefore triggers the counting operations described above with reference to block 302 of Fig. 3 and block 502 of Fig. 5.
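The victim-selection step of blocks 708-716 might be modeled as below. Returning None to signal a bypass is a convention of this sketch, and the streaming-pattern refinement is omitted for brevity; the tie-break shown is the largest-set-access-count option named above.

```python
def select_victim(priorities, set_access_counts, bypass_supported):
    """Blocks 708-714: bypass caching when supported and no level-0 line exists;
    otherwise pick the line with the lowest replacement priority level, breaking
    ties by the largest set access count."""
    if bypass_supported and 0 not in priorities:
        return None  # bypass: do not insert the new line into this set
    lowest = min(priorities)
    ties = [w for w, p in enumerate(priorities) if p == lowest]
    return max(ties, key=lambda w: set_access_counts[w])
```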
In some embodiments, certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software. Software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer-readable storage medium. The software includes instructions and certain data which, when executed by one or more processors, manipulate the one or more processors to perform one or more aspects described above. Non-transitory computer-readable storage media include, for example, magnetic or optical disk storage, solid-state storage such as flash memory, cache, Random Access Memory (RAM), or one or more other non-volatile memory devices, and so forth. Executable instructions stored on a non-transitory computer-readable storage medium may take the form of source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executed by one or more processors.
According to one aspect, a method for managing a cache of a processing system includes determining, by a cache controller of the cache, a reuse distance for a region of the cache, the reuse distance representing an average number of accesses to a cache set between accesses to a given cache line of a given cache set of the cache. The method also includes the cache controller assigning a replacement priority level to each cache line in at least a subset of the cache lines of the region of the cache based on the reuse distance and a count of the number of cache hits for the cache line.
According to another aspect, a method for managing a cache of a processing system includes implementing, by a cache controller of the cache, a cache management policy for insertion and replacement of a cache line of the cache, the cache management policy providing: a replacement priority level is assigned to each cache line in at least a subset of the cache lines in a region of the cache based on a comparison of a number of accesses to a cache set having ways to store the cache line since a last access of the cache line to a reuse distance determined for the region of the cache, the reuse distance representing an average number of accesses to the cache set between accesses to any given cache line of a given cache set of the region.
According to yet another aspect, a processor includes a cache including a plurality of cache sets, each cache set having a plurality of ways configured to store a corresponding cache line. The processor also includes a cache controller configured to implement a cache management policy for insertion and replacement of cache lines of the cache, the cache management policy providing: a replacement priority level is assigned to each cache line in at least a subset of the cache lines in a region of the cache based on a comparison of a number of accesses to a cache set having ways to store the cache line since a last access of the cache line to a reuse distance determined for the region of the cache, the reuse distance representing an average number of accesses to the cache set between accesses to any given cache line of a given cache set of the region.
Computer-readable storage media includes any non-transitory storage media, or combination of non-transitory storage media, that is accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media may include, but are not limited to, optical media (e.g., Compact Discs (CDs), Digital Versatile Discs (DVDs), blu-ray discs), magnetic media (e.g., floppy disks, tape, or magnetic hard disks), volatile memory (e.g., Random Access Memory (RAM) or cache), non-volatile memory (e.g., Read Only Memory (ROM) or flash memory), or micro-electromechanical systems (MEMS) -based storage media. The computer-readable storage medium can be embedded in a computing system (e.g., system RAM or ROM), fixedly attached to a computing system (e.g., a magnetic hard drive), removably attached to a computing system (e.g., an optical disk or Universal Serial Bus (USB) based flash memory), or coupled to a computer system over a wired or wireless network (e.g., a Network Accessible Storage (NAS)).
It should be noted that not all of the activities or elements described above in the general description are required, that a portion of a particular activity or device may not be required, and that one or more other activities may be performed, or that elements other than those described may be included. Further, the order in which activities are listed is not necessarily the order in which the activities are performed. In addition, the corresponding concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.
Claims (21)
1. A method for managing a cache of a processing system, comprising:
determining, by a cache controller of the cache, a reuse distance for a region of the cache, the reuse distance representing an average number of accesses to a given cache set [116] of the cache between accesses to a given cache line of the cache; and
the cache controller assigns a replacement priority level to each cache line in at least a subset of the cache lines of the region of the cache based on the reuse distance and a count of a number of cache hits for the cache line.
2. The method of claim 1, further comprising:
selecting a cache line for replacement in the region of the cache based on the replacement priority levels assigned to the cache lines.
3. The method of claim 1, wherein determining the reuse distance comprises:
in response to each cache hit to a cache set in at least a subset of cache sets of the region of the cache:
adding a set access count value associated with a way of the cache set containing a cache line targeted by the cache hit to an accumulated value to generate an updated accumulated value;
resetting the set access count value associated with the way of the cache set containing the cache line that is the target of the cache hit; and
incrementing a set access count value associated with other ways of the cache set; and
in response to detecting a specified number of cache hits to the at least a subset of the cache set:
averaging the updated accumulated values over the specified number of cache hits to generate an average set access count value; and
determining the reuse distance based on the average set access count value.
4. The method of claim 3, further comprising:
in response to detecting the specified number of cache hits to the at least a subset of the cache set:
scaling the average set access count value by a specified factor to generate a scaled average set access count value; and
wherein determining the reuse distance comprises determining the reuse distance based on the scaled average set access count value.
5. The method of claim 3, wherein assigning a replacement priority level to each cache line in at least a subset of the cache lines of the region of the cache comprises:
assigning a first replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that a number of accesses to the cache line since insertion in the way is greater than zero;
assigning a second replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that the number of accesses to the cache line since insertion in the way is equal to zero; and
wherein the second replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level.
6. The method of claim 5, wherein assigning a replacement priority level to each cache line in at least a subset of the cache lines of the region of the cache further comprises:
assigning a third replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that a number of accesses to the cache line since insertion in the way is greater than zero;
assigning a fourth replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that the number of accesses to the cache line since insertion in the way is equal to zero;
wherein the third replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level; and
wherein the fourth replacement priority level represents a greater likelihood of replacement selection than the second replacement priority level.
7. The method of claim 6, further comprising:
selecting a cache line for replacement in the region of the cache based on the replacement priority levels assigned to the cache lines.
8. The method of claim 1, wherein the region of the cache comprises an entirety of the cache.
9. A method for managing a cache of a processing system, comprising:
implementing, by a cache controller of the cache, a cache management policy for insertion and replacement of cache lines of the cache, the cache management policy providing: assigning a replacement priority level to each cache line in at least a subset of cache lines in a region of the cache based on a comparison of a number of accesses to a cache set having ways to store the cache line since a last access of the cache line to a reuse distance determined for the region of the cache, the reuse distance representing an average number of accesses to the cache set between accesses to any given cache line of a given cache set of the region.
10. The method of claim 9, wherein enforcing the cache management policy comprises:
maintaining a set access count value for each way of each cache set in at least a subset of the cache sets of the region, the set access count value representing a number of accesses to the cache set since a cache line stored in the corresponding way was inserted or last accessed;
accumulating the set access count values for each way from a cache set of the at least a subset of cache sets targeted for cache hits in an accumulated value; and
determining the reuse distance based on averaging the accumulated value over a specified number of cache hits after the specified number of cache hits for the at least a subset of the cache set.
11. The method of claim 10, wherein enforcing the cache management policy further comprises:
maintaining a line access count value for each way of each cache set of the at least a subset of cache sets of the region, the line access count value representing a number of accesses to the cache line stored in the corresponding way.
12. The method of claim 11, wherein enforcing the cache management policy further comprises:
assigning a replacement priority level to each cache line in at least a subset of cache lines of the region of the cache based on a comparison of the set access count value associated with the cache line to the reuse distance and based on a determination of whether the line access count value for the way storing the cache line is greater than or equal to zero.
13. The method of claim 12, wherein assigning a replacement priority level to each cache line comprises:
assigning a first replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that the line access count value associated with the cache line is greater than zero;
assigning a second replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that the line access count associated with the cache line is equal to zero;
assigning a third replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that the line access count value associated with the cache line is greater than zero;
assigning a fourth replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that the line access count is equal to zero;
wherein the second replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level and the third replacement priority level;
wherein the third replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level; and
wherein the fourth replacement priority level represents a greater likelihood of replacement selection than the second replacement priority level.
14. A processor, comprising:
a cache comprising a plurality of cache sets, each cache set having a plurality of ways configured to store a corresponding cache line; and
a cache controller configured to implement a cache management policy for insertion and replacement of cache lines of the cache, the cache management policy providing: assigning a replacement priority level to each cache line in at least a subset of cache lines in a region of the cache based on a comparison of a number of accesses to a cache set having ways to store the cache line since a last access of the cache line to a reuse distance determined for the region of the cache, the reuse distance representing an average number of accesses to the cache set between accesses to any given cache line of a given cache set of the region.
15. The processor of claim 14, wherein the cache comprises:
a plurality of set access counters, each set access counter associated with a corresponding way of a cache set in at least a subset of cache sets of the region of the cache and configured to store a set access count value representing a number of accesses to the cache set since the cache line stored in the corresponding way was inserted or last accessed;
an accumulator configured to accumulate the set access count values for each way from a cache set of the at least a subset of cache sets targeted for cache hits in an accumulated value; and
an averaging/scaling component configured to, in response to detecting a specified number of cache hits on the at least a subset of a cache set, determine the reuse distance based on averaging the accumulated value by the specified number of cache hits.
16. The processor of claim 15, wherein the cache further comprises:
a plurality of row access counters, each row access counter associated with a corresponding way of a cache set of the at least a subset of cache sets and configured to store a row access count value representing a number of accesses to the cache line stored in the corresponding way.
17. The processor of claim 16, wherein the cache controller is configured to implement the cache management policy by:
assigning a replacement priority level to each cache line in at least a subset of cache lines of the region of the cache based on a comparison of the set access count value associated with the cache line to the reuse distance and based on a determination of whether the line access count value for the way storing the cache line is greater than or equal to zero.
18. The processor as in claim 17 wherein the cache controller is configured to assign replacement priority levels to cache lines of a cache set by:
assigning a first replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that the line access count value associated with the cache line is greater than zero;
assigning a second replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is not greater than the reuse distance and in response to determining that the line access count associated with the cache line is equal to zero;
assigning a third replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that the line access count value associated with the cache line is greater than zero;
assigning a fourth replacement priority level to the cache line in response to determining that the set access count value associated with the cache line is greater than the reuse distance and in response to determining that the line access count is equal to zero;
wherein the second replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level and the third replacement priority level;
wherein the third replacement priority level represents a greater likelihood of replacement selection than the first replacement priority level; and
wherein the fourth replacement priority level represents a greater likelihood of replacement selection than the second replacement priority level.
19. The processor of claim 18, wherein the cache controller is configured to implement the cache management policy by:
selecting a cache line of the cache set for replacement based on a comparison of replacement priority levels assigned to each cache line in the cache set.
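Claims 17 through 19 together describe a four-level prioritization and the victim choice it drives. Below is a hedged C++ sketch; the enum, struct, and function names are illustrative assumptions, and only the evictability ordering (fourth > second > third > first) comes from claim 18.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Four replacement priority levels from claim 18, encoded so that a larger
// value means a greater likelihood of being selected for replacement.
// Claim 18's stated ordering: fourth > second > third > first.
enum class ReplacementPriority : uint8_t {
    First  = 0,  // within reuse distance, reused at least once: retain longest
    Third  = 1,  // past reuse distance, but reused at least once
    Second = 2,  // within reuse distance, never reused since insertion
    Fourth = 3,  // past reuse distance and never reused: evict first
};

struct WayState {
    uint32_t setAccessCount;   // set accesses since this line's insertion/last use
    uint32_t lineAccessCount;  // hits to this line while resident
};

// Claims 17/18: classify one line against the region's reuse distance.
ReplacementPriority assignPriority(const WayState& w, uint32_t reuseDistance) {
    const bool withinReuse = w.setAccessCount <= reuseDistance;
    const bool reused      = w.lineAccessCount > 0;
    if (withinReuse) {
        return reused ? ReplacementPriority::First : ReplacementPriority::Second;
    }
    return reused ? ReplacementPriority::Third : ReplacementPriority::Fourth;
}

// Claim 19: pick the victim way by comparing the assigned priority levels.
// Assumes a non-empty set; ties fall to the lowest-indexed way.
std::size_t selectVictim(const std::vector<WayState>& set, uint32_t reuseDistance) {
    std::size_t victim = 0;
    ReplacementPriority best = assignPriority(set[0], reuseDistance);
    for (std::size_t i = 1; i < set.size(); ++i) {
        ReplacementPriority p = assignPriority(set[i], reuseDistance);
        if (p > best) { best = p; victim = i; }
    }
    return victim;
}
```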
20. The processor of claim 15, wherein:
the specified number of cache hits equals 2^M, M being an integer greater than 1; and
the averaging/scaling component is configured to determine the reuse distance by right-shifting the accumulated value by M bit positions.
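As a worked example with illustrative numbers: for M = 5 the window is 2^5 = 32 cache hits, and an accumulated value of 1920 yields a reuse distance of 1920 >> 5 = 60 set accesses, so the average costs only a shifter rather than a divider circuit.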
21. The processor of claim 14, wherein the region of the cache comprises an entirety of the cache.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/600,897 | 2019-10-14 | ||
US16/600,897 US11797455B2 (en) | 2019-10-14 | 2019-10-14 | Cache management based on reuse distance |
PCT/US2020/046390 WO2021076211A1 (en) | 2019-10-14 | 2020-08-14 | Cache management based on reuse distance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114600091A (en) | 2022-06-07 |
Family
ID=75383724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080071961.5A Pending CN114600091A (en) | 2019-10-14 | 2020-08-14 | Reuse distance based cache management |
Country Status (6)
Country | Link |
---|---|
US (1) | US11797455B2 (en) |
EP (1) | EP4046026A4 (en) |
JP (1) | JP2022552124A (en) |
KR (1) | KR20220075432A (en) |
CN (1) | CN114600091A (en) |
WO (1) | WO2021076211A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12056058B2 (en) * | 2022-06-27 | 2024-08-06 | Arm Limited | Cache replacement control |
US20240211407A1 (en) * | 2022-12-27 | 2024-06-27 | Advanced Micro Devices, Inc. | Managing a Cache Using Per Memory Region Reuse Distance Estimation |
US20240264950A1 (en) * | 2023-02-02 | 2024-08-08 | Qualcomm Incorporated | Providing content-aware cache replacement and insertion policies in processor-based devices |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9160650B2 (en) | 2013-06-17 | 2015-10-13 | Futurewei Technologies, Inc. | Enhanced flow entry table cache replacement in a software-defined networking switch |
US11138121B2 (en) | 2017-11-20 | 2021-10-05 | Samsung Electronics Co., Ltd. | Systems and methods for efficient cacheline handling based on predictions |
2019
- 2019-10-14 US US16/600,897 patent/US11797455B2/en active Active
2020
- 2020-08-14 EP EP20875974.6A patent/EP4046026A4/en active Pending
- 2020-08-14 KR KR1020227016073A patent/KR20220075432A/en unknown
- 2020-08-14 CN CN202080071961.5A patent/CN114600091A/en active Pending
- 2020-08-14 JP JP2022519980A patent/JP2022552124A/en active Pending
- 2020-08-14 WO PCT/US2020/046390 patent/WO2021076211A1/en unknown
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055594A1 (en) * | 2006-06-05 | 2009-02-26 | Erik Berg | System for and method of capturing application characteristics data from a computer system and modeling target system |
US20080244533A1 (en) * | 2007-03-26 | 2008-10-02 | Acumem Ab | System for and Method of Capturing Performance Characteristics Data From A Computer System and Modeling Target System Performance |
US20080320228A1 (en) * | 2007-06-25 | 2008-12-25 | International Business Machines Corporation | Method and apparatus for efficient replacement algorithm for pre-fetcher oriented data cache |
US20100107142A1 (en) * | 2008-10-24 | 2010-04-29 | Microsoft Corporation | Scalability analysis for server systems |
CN103136113A (en) * | 2011-11-25 | 2013-06-05 | 中国科学院沈阳计算技术研究所有限公司 | Sharing cache conflict prediction method facing multi-core processor |
US20140129779A1 (en) * | 2012-11-06 | 2014-05-08 | Facebook, Inc. | Cache replacement policy for data with strong temporal locality |
CN103092774A (en) * | 2013-01-04 | 2013-05-08 | 北京北大众志微系统科技有限责任公司 | Management system and method of processor last level high-speed buffer |
US20140281740A1 (en) * | 2013-03-13 | 2014-09-18 | Intel Corporation | Vulnerability estimation for cache memory |
US20160062916A1 (en) * | 2014-08-27 | 2016-03-03 | The Board Trustees Of The Leland Stanford Junior University | Circuit-based apparatuses and methods with probabilistic cache eviction or replacement |
US20160232093A1 (en) * | 2015-02-11 | 2016-08-11 | Samsung Electronics Co., Ltd. | Computing apparatus and method for cache management |
WO2018211749A1 (en) * | 2017-05-16 | 2018-11-22 | ソニーセミコンダクタソリューションズ株式会社 | Storage controller, storage system, method for controlling storage controller, and program |
US20190079877A1 (en) * | 2017-09-12 | 2019-03-14 | Intel Corporation | System, Apparatus And Method For Prefetch-Aware Replacement In A Cache Memory Hierarchy Of A Processor |
US20190155750A1 (en) * | 2017-11-21 | 2019-05-23 | Arm Limited | Multi-tier cache placement mechanism |
US20190188154A1 (en) * | 2017-12-15 | 2019-06-20 | Intel Corporation | Translation pinning in translation lookaside buffers |
Non-Patent Citations (4)
Title |
---|
CHENG, YUXIA et al.: "Efficient cache resource aggregation using adaptive multi-level exclusive caching policies", Future Generation Computer Systems - The International Journal of eScience, 19 July 2018 (2018-07-19), pages 964-974 * |
LIN, JUNMIN et al.: "A Cache Replacement Algorithm Based on Reuse Distance Prediction and Stream Detection", Journal of Computer Research and Development, 31 May 2012 (2012-05-31), pages 1049-1060 * |
SUI, XIUFENG; WU, JUNMIN; CHEN, GUOLIANG; TANG, YIXUAN: "ELF: A Shared Cache Management Policy Based on Dead-Block Elimination and Low-Reuse Block Filtering", Chinese Journal of Computers, no. 01, 15 January 2011 (2011-01-15) * |
HUANG, TAO; WANG, JING; GUAN, XUETAO; ZHONG, QI; WANG, KEYI: "A Software-Controlled Insertion Policy for Reducing Last-Level Cache Pollution", Acta Electronica Sinica, no. 12, 15 December 2012 (2012-12-15) * |
Also Published As
Publication number | Publication date |
---|---|
KR20220075432A (en) | 2022-06-08 |
EP4046026A4 (en) | 2023-12-06 |
JP2022552124A (en) | 2022-12-15 |
US20210109861A1 (en) | 2021-04-15 |
EP4046026A1 (en) | 2022-08-24 |
US11797455B2 (en) | 2023-10-24 |
WO2021076211A1 (en) | 2021-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI564719B (en) | A processor with multiple data prefetchers, a method of the processor operating and a computer program product from the processor operating | |
Pugsley et al. | Sandbox prefetching: Safe run-time evaluation of aggressive prefetchers | |
Xie et al. | PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches | |
US8347039B2 (en) | Programmable stream prefetch with resource optimization | |
US7441087B2 (en) | System, apparatus and method for issuing predictions from an inventory to access a memory | |
KR101361945B1 (en) | Mapping of computer threads onto heterogeneous resources | |
TWI596479B (en) | Processor with data prefetcher and method thereof | |
US9189422B2 (en) | Method to throttle rate of data caching for improved I/O performance | |
US9021206B2 (en) | Use of cache statistics to ration cache hierarchy access | |
CN106372007B (en) | Cache utilization estimation | |
CN114600091A (en) | Reuse distance based cache management | |
JP6166616B2 (en) | Information processing method, information processing apparatus, and program | |
US20110252215A1 (en) | Computer memory with dynamic cell density | |
WO2017069907A1 (en) | System and method for a shared cache with adaptive partitioning | |
JP2004038345A (en) | Prefetch control device, information processor, and prefetch control process | |
US20210182214A1 (en) | Prefetch level demotion | |
US20070204267A1 (en) | Throttling prefetching in a processor | |
US20100077154A1 (en) | Method and system for optimizing processor performance by regulating issue of pre-fetches to hot cache sets | |
JP2019521408A (en) | Up / down prefetcher | |
US8274521B2 (en) | System available cache color map | |
US11487671B2 (en) | GPU cache management based on locality type detection | |
US8484423B2 (en) | Method and apparatus for controlling cache using transaction flags | |
CN118550853B (en) | Cache replacement method and device, electronic equipment and readable storage medium | |
US20240211407A1 (en) | Managing a Cache Using Per Memory Region Reuse Distance Estimation | |
KR101697515B1 (en) | Cache replacement algorithm for last-level caches by exploiting tag-distance correlation of cache lines and embedded systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||