US20080010413A1 - Method and apparatus for application-specific dynamic cache placement - Google Patents
Method and apparatus for application-specific dynamic cache placement
- Publication number
- US20080010413A1 (application Ser. No. 11/482,923)
- Authority
- US
- United States
- Prior art keywords
- load
- store instruction
- store
- virtual partitions
- data item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
Abstract
One embodiment of the present method and apparatus for application-specific dynamic cache placement includes grouping sets of data in a cache memory system into two or more virtual partitions and processing a load/store instruction in accordance with the virtual partitions, where the load/store instruction specifies at least one of the virtual partitions to which the load/store instruction is assigned.
Description
- The present invention relates generally to data processing systems and relates more particularly to the management of data stored in cache memory systems.
- The performance gap between the processor and the memory system in computing systems has been steadily increasing. With every generation, processors are being clocked at higher rates, while memory systems have been unable to catch up to this exponential growth. The resultant performance gap has caused a major bottleneck for single-thread performance in modern day processors, as the memory system is generally unable to supply data to the processor core at the rate of execution of the processor.
- Traditional means of reducing this performance gap include the use of cache memory of varying sizes and levels, which provides temporary storage for and quick access to frequently used data. Cache memory is conventionally managed by hardware, which attempts to cache the most frequently accessed data while purging older, unused data and fetching data from nearby memory locations, thus retaining the working set of the program in cache.
- Although there are many advantages to using hardware-managed cache systems, such as transparency and flexibility, the layer of abstraction afforded by these systems comes at a price. For one, because hardware-managed cache is designed to support a wide spectrum of workloads, the design tends to be very generic in terms of the algorithms used for cache placement (i.e., determining where in the cache array to keep a data item fetched from a specific memory address). For example, set placement decisions are made by using selected bits of the physical address to index into a given set of a set-associative cache. Although such greedy/random replacement decisions provide reasonably high hit-rates at low cost, they have proven to be ineffective for highly memory-bound workloads. Thus, hardware-managed cache is generally less capable of exploiting application-specific behaviors and replacement policies than other memory schemes. Other software-based approaches that rely on the programmer or compiler for data movement also tend to behave too conservatively to close the processor/memory system gap effectively.
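- To make the conventional placement rule concrete, the following is a minimal sketch of physical-address set indexing, assuming 64-byte lines and 128 sets (illustrative parameters only, not values taken from the patent):

```python
# Minimal sketch of conventional set-associative indexing: selected
# physical-address bits choose the set. Parameters are assumptions.
OFFSET_BITS = 6   # assumed 64-byte cache lines
SET_BITS = 7      # assumed 128 sets

def set_index(phys_addr: int) -> int:
    """Select the set by extracting SET_BITS bits above the line offset."""
    return (phys_addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)

assert set_index(0x12345678) == (0x12345678 >> 6) & 0x7F
```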
- Thus, there is a need in the art for a method and apparatus for application-specific dynamic cache placement.
- One embodiment of the present method and apparatus for application-specific dynamic cache placement includes grouping sets of data in a cache memory system into two or more virtual partitions and processing a load/store instruction in accordance with the virtual partitions, where the load/store instruction specifies at least one of the virtual partitions to which the load/store instruction is assigned.
- So that the manner in which the above recited embodiments of the invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be obtained by reference to the embodiments thereof which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
-
FIG. 1 is a schematic diagram illustrating an exemplary cache, partitioned according to one embodiment of the present invention; -
FIG. 2 is a flow diagram illustrating one embodiment of a method for loading and storing data, according to the present invention; -
FIG. 3 is a schematic diagram illustrating an exemplary cache, partitioned according to another embodiment of the present invention; -
FIG. 4 is a flow diagram illustrating another embodiment of a method for loading and storing data, according to the present invention; -
FIG. 5 is a flow diagram illustrating one embodiment of a method for partitioning load/store instructions, according to the present invention; and -
FIG. 6 is a high level block diagram of the data caching method that is implemented using a general purpose computing device. - To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
- In one embodiment, the present invention is a method and apparatus for application-specific dynamic cache placement. In one embodiment, the present invention maintains multiple smaller caches, as opposed to the traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory. Moreover, data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.
-
FIG. 1 is a schematic diagram illustrating an exemplary cache 100, partitioned according to one embodiment of the present invention. A tag directory/array 110 of the cache 100 comprises a plurality of sets 102 1-102 n (hereinafter collectively referred to as “sets 102”). Each set 102 comprises a plurality of ways or lines to which data items may be loaded. The sets 102 are grouped into two or more partitions 104 1-104 n (hereinafter collectively referred to as “partitions 104”). The partitions 104 are virtual groupings or “supersets” comprising multiple regular sets 102. In the example of FIG. 1, the tag directory 110 comprises four sets 102 (each comprising 4 lines) that are grouped into two partitions 104 of two sets 102 each.
- An exemplary load/store instruction 112 for accessing the cache 100 generates an n-bit address 106, where the fields 108 1-108 3 (hereinafter collectively referred to as “fields 108”) of the address are named using the convention address[n:m], where n is the least significant bit and m is the most significant bit. The width, w, of a field 108 is therefore m−n+1. For a unified cache, s bits of the n-bit address are used to select among 2^s sets in the cache; for a partitioned cache comprising 2^k partitions, only s−k bits are used to select sets within a given partition 104.
- The address 106 specified by the load/store instruction 112 comprises a “tag” field 108 1 for specifying the tag address of the referenced data item stored in the tag directory 110, a “set-index” field 108 2 whose width determines the maximum number of sets 102 per partition 104 in the cache 100, and an “offset” field 108 3 for specifying the size of the data item. As will be described in further detail below, the load/store instruction 112 also specifies one or more identifiers of target partitions 104 in the cache 100. The target partitions 104 comprise portions of the cache 100 to which the load/store instruction 112 may load a data item in the event that the data item is not already contained within the cache 100. In general, if there are 2^k partitions 104 in the cache 100, the partition 104 to which the load/store instruction 112 is assigned is represented using a 2^k-bit vector, where each bit in the 2^k-bit vector corresponds to a given partition 104. Thus, for example, if there are four partitions in the cache, and a load/store instruction is assigned to partitions two and three, the corresponding bit-vector value would be [0110], where 0 indicates an unassigned partition and 1 indicates an assigned partition.
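- As an illustration of the addressing scheme and the partition bit-vector described above, consider the following sketch; the field widths are assumed values for the example, not parameters taken from the patent:

```python
# Hedged sketch of the address split and the 2^k-bit partition vector.
# Widths below are illustrative assumptions.
OFFSET_BITS = 6          # f: 64-byte data items
SET_BITS = 7             # s: 2^7 = 128 sets in the unified view
PART_BITS = 2            # k: 2^2 = 4 virtual partitions

def split_address(addr: int):
    """Return (tag, set_index, offset); only s-k bits index within a partition."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_in_part = (addr >> OFFSET_BITS) & ((1 << (SET_BITS - PART_BITS)) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS - PART_BITS)
    return tag, set_in_part, offset

def partition_vector(assigned: set[int], k: int = PART_BITS) -> list[int]:
    """Encode assigned partitions as a 2^k-bit vector (zero-indexed ids)."""
    return [1 if p in assigned else 0 for p in range(2 ** k)]

# The worked example from the text: four partitions, instruction assigned to
# partitions two and three (zero-indexed ids 1 and 2) -> [0110].
assert partition_vector({1, 2}) == [0, 1, 1, 0]
```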
- FIG. 2 is a flow diagram illustrating one embodiment of a method 200 for loading and storing data, according to the present invention. The method 200 may be implemented, for example, at a cache controller.
- The method 200 is initialized at step 202 and proceeds to step 204, where the method 200 receives a load/store instruction. The method 200 then proceeds to step 206 and checks each partition in the cache for the data item referenced in the load/store instruction. Referring back to FIG. 1, in step 206, the method 200 looks for the “tag” field 108 1 of the address (i.e., the bits [f+s−k:n] of the address) in the tag directory 110 in each partition 104. In another embodiment, the method 200 only examines the specific partitions of the cache that are specified using the bit-vector in the load/store instruction generated by the compiler.
- In step 208, the method 200 determines whether the data item is present in the cache, based on the result of the tag comparison done in step 206. If the method 200 concludes in step 208 that the data item is present in a given partition in the cache (i.e., a cache “hit”), the method 200 proceeds to step 218 and either forwards the data item to the processor (in the case of a load operation) or stores the data item in the cache (in the case of a store operation).
- Alternatively, if the method 200 concludes in step 208 that the data item is not present in the cache (i.e., a cache “miss”), the method 200 proceeds to step 212 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
- If the method 200 concludes in step 212 that the load/store instruction is assigned to a partition, the method 200 proceeds to step 214 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory). In this case, a partition identifier that specifies a partition of the cache to which the load/store instruction has been assigned is included in the load/store instruction. In one embodiment, where the partition identifier specifies more than one partition, any replacement policy (e.g., least recently used or the like) may be used to select from among the specified partitions.
- In one embodiment, the partition identifier is implemented as part of a special load/store instruction that includes a number of extra bits for identifying the partition. In another embodiment, the partition identifier is implemented as a separate prefix instruction that specifies the partition associated with the next sequential load/store instruction. In yet another embodiment, one or more special registers can be implemented as additional operands of load/store instructions to specify the partition information. The register-based solution allows a load or store to be assigned to multiple partitions, as each register comprises a bit-vector that specifies the assigned partitions. In one embodiment, register move instructions, inserted prior to the load/store instructions that use them, initialize each register with a literal that corresponds to the partition bit-vector. Finally, in an embodiment suited to a small number of partitions, such a bit-vector is specified as an immediate operand of the load/store instruction.
- Alternatively, if the method 200 concludes in step 212 that the load/store instruction is not assigned to a partition, the method 200 proceeds to step 216 and loads the data item in accordance with the address of the data item (i.e., in accordance with the treatment of a traditional unified cache).
- Once the data item has been fetched from the cache (i.e., in accordance with step 210) or loaded to the appropriate partition (i.e., in accordance with step 214 or 216), the method 200 proceeds to step 218 and either forwards the data item to the processor (load) or stores the data item in the cache (store) before terminating in step 220. For a load instruction, the data item is read from the cache location (p, set-index, tag), identified through the tag directory, at offset, where p is the partition identifier. Similarly, for a store instruction, the data item is written at offset in tag directory location (p, set-index, tag).
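- Summarizing the flow above, a minimal behavioral model of the method 200 follows. It is an illustrative sketch only: the class, names, and dictionary-based sets are assumptions rather than the patent's implementation, and way-level replacement within a set is elided.

```python
# Illustrative model of method 200. The cache is a list of partitions; each
# partition is a list of sets; each set maps tag -> line. The set-to-partition
# mapping is fixed in the controller, as the text describes.
import random

class PartitionedCache:
    def __init__(self, num_partitions: int, sets_per_partition: int):
        self.parts = [[dict() for _ in range(sets_per_partition)]
                      for _ in range(num_partitions)]

    def lookup(self, tag: int, set_index: int):
        """Steps 206-208: probe every partition for the tag (limits duplication)."""
        for p, part in enumerate(self.parts):
            if tag in part[set_index]:
                return p, part[set_index][tag]
        return None

    def access(self, tag, set_index, assigned, fetch):
        hit = self.lookup(tag, set_index)
        if hit is not None:                       # step 208: cache hit
            return hit[1]                         # step 218: forward to processor
        line = fetch()                            # miss: fetch from memory
        if assigned:                              # step 212: instruction assigned?
            p = random.choice(sorted(assigned))   # step 214: any replacement policy
        else:
            p = tag % len(self.parts)             # step 216: address-based (unified)
        self.parts[p][set_index][tag] = line
        return line
```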
- Thus, according to the method 200, the knowledge of which sets in the cache comprise which partitions is encoded in the cache controller. This knowledge is substantially transparent to an application using the cache, however. By searching for the referenced data item in all cache partitions to determine a hit or a miss, occurrences of data duplication and associated coherency issues are substantially reduced. By annotating load/store instructions with the partition identifiers that are to be searched during this stage, performance may be improved even further. If a compiler is unable to prove that two loads assigned to two different partitions do not conflict, the compiler may allow both loads to probe each other's partitions, thereby reducing the occurrence of data duplication.
- Moreover, by allowing a load/store instruction to load data only to its assigned partitions of the cache, the likelihood of the load/store instruction replacing data items required by other load/store instructions is substantially reduced. “Unified” and “partitioned” views of the cache can be retained simultaneously, without having to re-organize the tag directory of the cache.
- FIG. 3 is a schematic diagram illustrating an exemplary cache 300, partitioned according to another embodiment of the present invention. The cache 300 is substantially similar to the cache 100 illustrated in FIG. 1. A tag directory/array 310 of the cache 300 comprises a plurality of sets 302 1-302 n (hereinafter collectively referred to as “sets 302”). Each set 302 comprises a plurality of ways or lines to which data items may be loaded. The sets 302 are grouped into two or more partitions 304 1-304 n (hereinafter collectively referred to as “partitions 304”). In the example of FIG. 3, the tag directory 310 comprises two sets 302 (each comprising 4 lines) that are grouped into a single partition 304. Unlike the cache 100, the cache directory 310 includes an additional partition identifier 312 for each set 302 in the cache 300. Thus, a partition 304 comprises all sets 302 that have the same partition identifier 312.
- The address 306 specified by the exemplary load/store instruction 314 is substantially similar to that for a unified cache and comprises a “tag” field 308 1 for specifying a referenced data item, a “set-index” field 308 2 for selecting a set 302 in the cache 300, and an “offset” field 308 3 for specifying a block size of the data item. The load/store instruction 314 also specifies one or more identifiers of target partitions 304 in the cache 300.
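- The directory organization of FIG. 3 can be sketched as follows; the structures and names are illustrative assumptions rather than the patent's implementation, the point being that partition membership lives in the directory itself and can be reassigned at run time:

```python
# Illustrative sketch: per-set partition identifiers stored in the tag directory.
from collections import defaultdict

class TaggedDirectory:
    def __init__(self, num_sets: int):
        # Each set carries a reassignable partition id and its tag->line map.
        self.sets = [{"pid": 0, "lines": {}} for _ in range(num_sets)]

    def sets_of(self, pid: int) -> list[int]:
        """Scan the whole directory to find the sets comprising partition pid."""
        return [i for i, s in enumerate(self.sets) if s["pid"] == pid]

    def partitions(self) -> dict[int, list[int]]:
        """Group all sets by their partition identifier (used by method 400)."""
        groups = defaultdict(list)
        for i, s in enumerate(self.sets):
            groups[s["pid"]].append(i)
        return dict(groups)
```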
- FIG. 4 is a flow diagram illustrating another embodiment of a method 400 for loading and storing data, according to the present invention. The method 400, like the method 200, may be implemented at a cache controller. Unlike the method 200, however, the method 400 implements partition identifiers as part of the cache tag directory structure, as described with respect to FIG. 3.
- The method 400 is initialized at step 402 and proceeds to step 404, where the method 400 receives a load/store instruction. The method 400 then proceeds to step 406 and identifies which sets within the cache comprise a partition. In particular, since the partition identifiers for each set are stored in the tag directory structure, the method 400 first probes the tag directory to identify which sets constitute a partition. This must be done for all sets, so as to identify every partition.
- Once the method 400 has identified the sets within each partition, the method 400 proceeds to step 408 and checks each identified partition in the cache for the data item referenced in the load/store instruction. The sets in a partition are probed using the regular “tag” and “set-index” fields from the load/store instruction.
- In step 410, the method 400 determines whether the data item is present in a given partition in the cache. If the method 400 concludes in step 410 that the data item is present in the cache (i.e., a cache “hit”), the method 400 proceeds to step 420 and either forwards the data item to the processor (load) or stores the data item in the cache (store).
- Alternatively, if the method 400 concludes in step 410 that the data item is not present in the cache (i.e., a cache “miss”), the method 400 proceeds to step 414 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
- If the method 400 concludes in step 414 that the load/store instruction is assigned to a partition, the method 400 proceeds to step 416 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory). In this case, a partition identifier that specifies a partition of the cache to which the load/store instruction has been assigned is included in the load/store instruction. In one embodiment, where the partition identifier specifies more than one partition, any replacement policy (e.g., least recently used or the like) may be used to select from among the specified partitions.
- Alternatively, if the method 400 concludes in step 414 that the load/store instruction is not assigned to a partition, the method 400 proceeds to step 418 and loads the data item in accordance with the address of the block/data item (i.e., in accordance with the treatment of a traditional unified cache).
- Once the data item has been fetched from the cache or loaded to the appropriate partition (i.e., in accordance with step 416 or 418), the method 400 proceeds to step 420 and forwards the data item to the processor or stores the data item in the cache before terminating in step 422. For a load instruction, the data item is read from the cache location (p, set-index, tag) at offset, where p is the partition identifier. For a store instruction, the data item is written at offset in tag directory location (p, set-index, tag).
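- Continuing the illustrative TaggedDirectory sketch above, the probe phase of the method 400 might look like the following; how the set-index selects a set within each discovered partition is an assumption of this sketch:

```python
# Sketch of method 400's probe phase over the TaggedDirectory model above.
def probe(directory, tag: int, set_index: int):
    """Steps 406-410: group sets by partition id, then probe one set per group.
    The mapping from set-index to a set within each partition is an assumption
    of this sketch (here: the set-index-th set of the partition, wrapped)."""
    for pid, set_ids in directory.partitions().items():    # step 406
        chosen = set_ids[set_index % len(set_ids)]          # step 408
        line = directory.sets[chosen]["lines"].get(tag)
        if line is not None:                                # step 410: cache hit
            return pid, line
    return None                                             # cache miss
```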
- The method 400, though similar to the method 200, does not fix the number of partitions (or the sizes of the partitions) in the cache. Thus, the method 400 can accommodate cache accesses by different applications having different needs regarding partition configuration. Also, even within an application, different phases of the application having different partition requirements can be easily accommodated. The method 400 therefore affords more flexibility than the method 200, but at the cost of slightly greater overhead (required to maintain the partition identifiers within the tag directory of the cache).
- FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for partitioning load/store instructions, according to the present invention. The method 500 may be implemented, for example, at a compiler, in order to facilitate the processing of load/store instructions as described above.
- The method 500 is initialized at step 502 and proceeds to step 504, where the method 500 identifies the most frequent load/store instructions, as well as the accessed data items corresponding to these load/store instructions.
- In step 506, the method 500 calculates affinity and conflict metrics for each pair of load/store instructions. The affinity between two memory reference instructions is defined to be proportional to the cardinality of the common set of data items that are accessed by the two instructions within a given time interval. If two instructions access the same data set, assigning the two instructions to the same partition is useful, as each instruction can aid the other by pre-fetching data sets. Conflict information is used to identify which pairs of load/store instructions potentially displace each other, and are thus better assigned to different partitions.
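- By way of illustration, these metrics might be computed from a profiled access trace as sketched below; the trace format and the heuristic used for the conflict metric are assumptions, since the text does not fix concrete formulas:

```python
# Illustrative affinity/conflict computation over a profiled trace.
# trace: list of (instruction_id, data_item, set_index) tuples; the trace
# format and the conflict heuristic are assumptions of this sketch.
from collections import defaultdict
from itertools import combinations

def affinity_and_conflict(trace):
    items = defaultdict(set)   # instruction -> data items it touched
    sets_ = defaultdict(set)   # instruction -> set-indices it touched
    for insn, item, set_index in trace:
        items[insn].add(item)
        sets_[insn].add(set_index)
    affinity, conflict = {}, {}
    for a, b in combinations(sorted(items), 2):
        # Affinity: proportional to the number of data items both access.
        affinity[(a, b)] = len(items[a] & items[b])
        # Conflict (assumed heuristic): disjoint data competing for shared sets.
        shared_sets = sets_[a] & sets_[b]
        disjoint = not (items[a] & items[b])
        conflict[(a, b)] = len(shared_sets) if disjoint else 0
    return affinity, conflict
```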
- In step 508, the method 500 records the footprint of each load/store instruction. The footprint of a load/store instruction is the number of unique reused data items that are accessed by the load/store instruction. Load/store instructions with large footprints may need to be assigned to multiple partitions.
- In step 510, each load/store instruction is broken into one or more sub-instructions, where each sub-instruction is either a regular load/store instruction or a sub-part of a single load/store instruction that performs a subset of the data accesses that would have been performed by the original load/store instruction. The number of sub-instructions corresponding to each load/store instruction is proportional to:
- Footprint(instruction) / Size(single partition)
- where Footprint(instruction) is the number of unique reused data items accessed by the corresponding load/store instruction. Under this rule, for example, an instruction whose footprint spans roughly three partitions' worth of data is split into about three sub-instructions.
- In step 512, the method 500 formulates the partitioning problem as a graph, where nodes of the graph correspond to sub-instructions, and edges of the graph measure the desire of the sub-instructions to be assigned to the same or different partitions. The edge weights of the graph are computed from the affinity and conflict metrics calculated earlier: edges are weighted such that load/store instructions exhibiting a high degree of conflict are assigned to different partitions, while load/store instructions exhibiting a high degree of affinity are assigned to the same partition. The edge weights between sub-instructions of the same memory reference instruction are set such that the sub-instructions are always assigned to different partitions.
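- A compact sketch of this graph formulation is given below, using signed edge weights (positive for affinity, negative for conflict) and a greedy placement pass; the weighting scheme and the heuristic are illustrative assumptions, as no particular graph algorithm is prescribed:

```python
# Illustrative graph formulation for steps 512-514 (assumed weighting/heuristic).
def build_edges(affinity, conflict, siblings):
    """Positive weight pulls a pair together; negative pushes it apart.
    Sub-instructions of the same instruction (siblings) repel infinitely."""
    edges = {}
    for pair in set(affinity) | set(conflict):
        edges[pair] = affinity.get(pair, 0) - conflict.get(pair, 0)
    for pair in siblings:
        edges[pair] = float("-inf")   # always separate sibling sub-instructions
    return edges

def assign_partitions(nodes, edges, num_partitions):
    """Greedy: place each node in the partition maximizing summed edge weight."""
    assignment = {}
    for node in nodes:
        def score(p):
            return sum(w for (a, b), w in edges.items()
                       if node in (a, b)
                       and assignment.get(b if a == node else a) == p)
        assignment[node] = max(range(num_partitions), key=score)
    return assignment
```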
- In step 514, the method 500 assigns each load/store instruction to one or more cache partitions, in accordance with the graph. In one embodiment, a load/store instruction is assigned to multiple cache partitions if the load/store instruction comprises multiple sub-instructions assigned to different partitions.
- In step 516, the method 500 annotates the load/store instructions with their respective partition identifiers, as determined in step 514. The method 500 then terminates in step 518.
- FIG. 6 is a high-level block diagram of the data caching method as implemented using a general purpose computing device 600. In one embodiment, a general purpose computing device 600 comprises a processor 602, a memory 604, a data caching module 605 and various input/output (I/O) devices 606 such as a display, a keyboard, a mouse, a modem, and the like. In one embodiment, at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive, a floppy disk drive). It should be understood that the data caching module 605 can be implemented as a physical device or subsystem that is coupled to a processor through a communication channel.
- Alternatively, the data caching module 605 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASICs)), where the software is loaded from a storage medium (e.g., I/O devices 606) and operated by the processor 602 in the memory 604 of the general purpose computing device 600. Thus, in one embodiment, the data caching module 605 for loading and storing data described herein with reference to the preceding Figures can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).
- Thus, the present invention represents a significant advancement in the field of data processing systems. Embodiments of the present invention maintain multiple smaller caches, as opposed to a traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory. Moreover, data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.
- While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (20)
1. A method for accessing data in a cache memory system, the method comprising:
grouping sets of data in the cache memory system into two or more virtual partitions; and
processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
2. The method of claim 1, wherein the processing comprises:
checking at least a subset of the two or more virtual partitions for a data item referenced by the load/store instruction;
fetching the data item referenced by the load/store instruction from one of the two or more virtual partitions, if the data item is located therein and the load/store instruction specifies a load operation;
storing the data item referenced by the load/store instruction in one of the two or more virtual partitions, if the load/store instruction specifies a store operation; and
loading the data item to the at least one of the two or more virtual partitions to which the load/store instruction is assigned, if the data item is not located in the subset of the two or more virtual partitions.
3. The method of claim 2, wherein the checking comprises:
checking each of the two or more virtual partitions for the data item.
4. The method of claim 2, wherein the checking comprises:
identifying one or more sets in the cache memory system that comprise the at least one of the two or more virtual partitions to which the load/store instruction is assigned; and
checking the one or more sets for the data item.
5. The method of claim 2, wherein the loading and storing comprises:
selecting a virtual partition from among two or more virtual partitions to which the load/store instruction is assigned, in accordance with a cache replacement policy.
6. The method of claim 1, wherein the at least one of the two or more virtual partitions to which the load/store instruction is assigned is specified by an identifier in the load/store instruction.
7. The method of claim 6, wherein the load/store instruction comprises a number of extra bits for specifying the identifier.
8. The method of claim 6, wherein the identifier is implemented as a prefix instruction to the load/store instruction, the prefix instruction specifying one of the two or more virtual partitions that is associated with a next sequential load/store instruction.
9. The method of claim 6, wherein the identifier is specified in accordance with one or more registers implemented as operands of the load/store instruction.
10. The method of claim 1, wherein the grouping comprises:
identifying one or more load/store instructions that are encountered most frequently from among a plurality of load/store instructions;
identifying one or more data items accessed by the one or more load/store instructions;
modeling the one or more load/store instructions as a graph; and
assigning each of the one or more load/store instructions to at least one virtual partition in accordance with the graph.
11. The method of claim 10, wherein the modeling comprises:
dividing each of the one or more load/store instructions into one or more sub-instructions, each of the one or more sub-instructions comprising a node of the graph;
calculating an affinity metric for each pair of load/store instructions within the one or more load/store instructions;
calculating a conflict metric for each pair of load/store instructions; and
assigning weights to edges of the graph in accordance with the affinity metric and the conflict metric, the weights indicating whether each pair of load/store instructions should be assigned to a common virtual partition or to different virtual partitions.
12. A computer readable medium containing an executable program for accessing data in a cache memory system, where the program performs the steps of:
grouping sets of data in the cache memory system into two or more virtual partitions; and
processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
13. The computer readable medium of claim 12, wherein the processing comprises:
checking at least a subset of the two or more virtual partitions for a data item referenced by the load/store instruction;
fetching the data item referenced by the load/store instruction from one of the two or more virtual partitions, if the data item is located therein and the load/store instruction specifies a load operation;
storing the data item referenced by the load/store instruction in one of the two or more virtual partitions, if the load/store instruction specifies a store operation; and
loading the data item to the at least one of the two or more virtual partitions to which the load/store instruction is assigned, if the data item is not located in the subset of the two or more virtual partitions.
14. The computer readable medium of claim 13, wherein the checking comprises:
checking each of the two or more virtual partitions for the data item.
15. The computer readable medium of claim 13, wherein the checking comprises:
identifying one or more sets in the cache memory system that comprise the at least one of the two or more virtual partitions to which the load/store instruction is assigned; and
checking the one or more sets for the data item.
16. The computer readable medium of claim 12, wherein the at least one of the two or more virtual partitions to which the load/store instruction is assigned is specified by an identifier in the load/store instruction.
17. The computer readable medium of claim 16, wherein the load/store instruction comprises a number of extra bits for specifying the identifier.
18. The computer readable medium of claim 12, wherein the grouping comprises:
identifying one or more load/store instructions that are encountered most frequently from among a plurality of load/store instructions;
identifying one or more data items accessed by the one or more load/store instructions;
modeling the one or more load/store instructions as a graph; and
assigning each of the one or more load/store instructions to at least one virtual partition in accordance with the graph.
19. The computer readable medium of claim 18, wherein the modeling comprises:
dividing each of the one or more load/store instructions into one or more sub-instructions, each of the one or more sub-instructions comprising a node of the graph;
calculating an affinity metric for each pair of load/store instructions within the one or more load/store instructions;
calculating a conflict metric for each pair of load/store instructions; and
assigning weights to edges of the graph in accordance with the affinity metric and the conflict metric, the weights indicating whether each pair of load/store instructions should be assigned to a common virtual partition or to different virtual partitions.
20. A cache controller for controlling access to a cache memory, the cache controller comprising:
means for grouping sets of data in the cache memory system into two or more virtual partitions; and
means for processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/482,923 US20080010413A1 (en) | 2006-07-07 | 2006-07-07 | Method and apparatus for application-specific dynamic cache placement |
US12/165,178 US7836256B2 (en) | 2006-07-07 | 2008-06-30 | Method and apparatus for application-specific dynamic cache placement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/482,923 US20080010413A1 (en) | 2006-07-07 | 2006-07-07 | Method and apparatus for application-specific dynamic cache placement |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/165,178 Continuation US7836256B2 (en) | 2006-07-07 | 2008-06-30 | Method and apparatus for application-specific dynamic cache placement |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080010413A1 (en) | 2008-01-10 |
Family
ID=38920322
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/482,923 Abandoned US20080010413A1 (en) | 2006-07-07 | 2006-07-07 | Method and apparatus for application-specific dynamic cache placement |
US12/165,178 Expired - Fee Related US7836256B2 (en) | 2006-07-07 | 2008-06-30 | Method and apparatus for application-specific dynamic cache placement |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/165,178 Expired - Fee Related US7836256B2 (en) | 2006-07-07 | 2008-06-30 | Method and apparatus for application-specific dynamic cache placement |
Country Status (1)
Country | Link |
---|---|
US (2) | US20080010413A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ITPD20050146A1 | 2005-05-20 | 2006-11-21 | Fidia Farmaceutici | RE-ABSORBABLE FILLERS CONSISTING OF LIPOSOMES AND HYALURONIC ACID AND/OR ITS DERIVATIVES
US20080145415A1 (en) * | 2006-04-21 | 2008-06-19 | Lanfranco Callegaro | Bioresorbable Fillers Constituted by Phospholipid Liposomes and Hyaluronic Acid and/or the Derivatives Thereof |
US20180357544A1 (en) * | 2017-06-08 | 2018-12-13 | International Business Machines Corporation | Optimizing tree-based convolutional neural networks |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5634110A (en) * | 1995-05-05 | 1997-05-27 | Silicon Graphics, Inc. | Cache coherency using flexible directory bit vectors |
US6295580B1 (en) * | 1997-01-30 | 2001-09-25 | Sgs-Thomson Microelectronics Limited | Cache system for concurrent processes |
US6453385B1 (en) * | 1997-01-30 | 2002-09-17 | Sgs-Thomson Microelectronics Limited | Cache system |
US6662272B2 (en) * | 2001-09-29 | 2003-12-09 | Hewlett-Packard Development Company, L.P. | Dynamic cache partitioning |
US6865647B2 (en) * | 2001-09-29 | 2005-03-08 | Hewlett-Packard Development Company, L.P. | Dynamic cache partitioning |
US20050055510A1 (en) * | 2002-10-08 | 2005-03-10 | Hass David T. | Advanced processor translation lookaside buffer management in a multithreaded system |
US20050240733A1 (en) * | 2004-04-22 | 2005-10-27 | International Business Machines Corporation | Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache |
US7451272B2 (en) * | 2004-10-19 | 2008-11-11 | Platform Solutions Incorporated | Queue or stack based cache entry reclaim method |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8543769B2 (en) | 2009-07-27 | 2013-09-24 | International Business Machines Corporation | Fine grained cache allocation |
US20110022773A1 (en) * | 2009-07-27 | 2011-01-27 | International Business Machines Corporation | Fine Grained Cache Allocation |
US20110055827A1 (en) * | 2009-08-25 | 2011-03-03 | International Business Machines Corporation | Cache Partitioning in Virtualized Environments |
US8739159B2 (en) | 2009-08-25 | 2014-05-27 | International Business Machines Corporation | Cache partitioning with a partition table to effect allocation of shared cache to virtual machines in virtualized environments |
US8745618B2 (en) * | 2009-08-25 | 2014-06-03 | International Business Machines Corporation | Cache partitioning with a partition table to effect allocation of ways and rows of the cache to virtual machine in virtualized environments |
US9639482B2 (en) | 2011-09-13 | 2017-05-02 | Facebook, Inc. | Software cryptoprocessor |
US9632980B2 (en) | 2011-12-23 | 2017-04-25 | Intel Corporation | Apparatus and method of mask permute instructions |
US11354124B2 (en) | 2011-12-23 | 2022-06-07 | Intel Corporation | Apparatus and method of improved insert instructions |
US9588764B2 (en) | 2011-12-23 | 2017-03-07 | Intel Corporation | Apparatus and method of improved extract instructions |
US9619236B2 (en) | 2011-12-23 | 2017-04-11 | Intel Corporation | Apparatus and method of improved insert instructions |
US10467185B2 (en) | 2011-12-23 | 2019-11-05 | Intel Corporation | Apparatus and method of mask permute instructions |
US10459728B2 (en) | 2011-12-23 | 2019-10-29 | Intel Corporation | Apparatus and method of improved insert instructions |
US9658850B2 (en) * | 2011-12-23 | 2017-05-23 | Intel Corporation | Apparatus and method of improved permute instructions |
US10474459B2 (en) | 2011-12-23 | 2019-11-12 | Intel Corporation | Apparatus and method of improved permute instructions |
US11347502B2 (en) | 2011-12-23 | 2022-05-31 | Intel Corporation | Apparatus and method of improved insert instructions |
US9946540B2 (en) * | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities |
US20130290687A1 (en) * | 2011-12-23 | 2013-10-31 | Elmoustapha Ould-Ahmed-Vall | Apparatus and method of improved permute instructions |
US10719316B2 (en) | 2011-12-23 | 2020-07-21 | Intel Corporation | Apparatus and method of improved packed integer permute instruction |
US11275583B2 (en) | 2011-12-23 | 2022-03-15 | Intel Corporation | Apparatus and method of improved insert instructions |
US10037282B2 (en) | 2013-09-05 | 2018-07-31 | Facebook, Inc. | System and method for partitioning of memory units into non-conflicting sets |
US9477603B2 (en) * | 2013-09-05 | 2016-10-25 | Facebook, Inc. | System and method for partitioning of memory units into non-conflicting sets |
US20150067265A1 (en) * | 2013-09-05 | 2015-03-05 | Privatecore, Inc. | System and Method for Partitioning of Memory Units into Non-Conflicting Sets |
US9983894B2 (en) | 2013-09-25 | 2018-05-29 | Facebook, Inc. | Method and system for providing secure system execution on hardware supporting secure application execution |
US10049048B1 (en) | 2013-10-01 | 2018-08-14 | Facebook, Inc. | Method and system for using processor enclaves and cache partitioning to assist a software cryptoprocessor |
US9747450B2 (en) | 2014-02-10 | 2017-08-29 | Facebook, Inc. | Attestation using a combined measurement and its constituent measurements |
US9734092B2 (en) | 2014-03-19 | 2017-08-15 | Facebook, Inc. | Secure support for I/O in software cryptoprocessor |
US20190026088A1 (en) * | 2016-02-23 | 2019-01-24 | Intel Corporation | Optimizing structures to fit into a complete cache line |
US10761819B2 (en) * | 2016-02-23 | 2020-09-01 | Intel Corporation | Optimizing structures to fit into a complete cache line |
US20190102302A1 (en) * | 2017-09-29 | 2019-04-04 | Intel Corporation | Processor, method, and system for cache partitioning and control for accurate performance monitoring and optimization |
US10482017B2 (en) * | 2017-09-29 | 2019-11-19 | Intel Corporation | Processor, method, and system for cache partitioning and control for accurate performance monitoring and optimization |
US10635590B2 (en) * | 2017-09-29 | 2020-04-28 | Intel Corporation | Software-transparent hardware predictor for core-to-core data transfer optimization |
US11513839B2 (en) * | 2018-05-07 | 2022-11-29 | Micron Technology, Inc. | Memory request size management in a multi-threaded, self-scheduling processor |
US11221965B2 (en) * | 2018-07-17 | 2022-01-11 | SK Hynix Inc. | Cache memory, memory system including the same, and eviction method of cache memory |
US20220197993A1 (en) * | 2022-03-11 | 2022-06-23 | Intel Corporation | Compartment isolation for load store forwarding |
US12019733B2 (en) * | 2022-03-11 | 2024-06-25 | Intel Corporation | Compartment isolation for load store forwarding |
Also Published As
Publication number | Publication date |
---|---|
US7836256B2 (en) | 2010-11-16 |
US20080270705A1 (en) | 2008-10-30 |
Similar Documents
Publication | Title
---|---|
US7836256B2 (en) | Method and apparatus for application-specific dynamic cache placement
US7502890B2 (en) | Method and apparatus for dynamic priority-based cache replacement
US6584549B2 (en) | System and method for prefetching data into a cache based on miss distance
US7620781B2 (en) | Efficient Bloom filter
US8370575B2 (en) | Optimized software cache lookup for SIMD architectures
US5761706A (en) | Stream buffers for high-performance computer memory system
US8244988B2 (en) | Predictive ownership control of shared memory computing system data
US7606974B2 (en) | Automatic caching generation in network applications
JP4298800B2 (en) | Prefetch management in cache memory
US20070250667A1 (en) | Pseudo-LRU virtual counter for a locking cache
US7386679B2 (en) | System, method and storage medium for memory management
KR20060049710A (en) | An apparatus and method for partitioning a shared cache of a chip multi-processor
US20110320720A1 (en) | Cache Line Replacement In A Symmetric Multiprocessing Computer
US8019946B2 (en) | Method and system for securing instruction caches using cache line locking
KR20030025297A (en) | Fast and accurate cache way selection
US20030018855A1 (en) | Method and apparatus for caching with variable size locking regions
US20030097538A1 (en) | Automatic program restructuring to reduce average cache miss penalty
Tian et al. | ABNDP: Co-optimizing data access and load balance in near-data processing
US8214601B2 (en) | Purging without write-back of cache lines containing spent data
US7299318B2 (en) | Method for reducing cache conflict misses
US11726918B2 (en) | Dynamically coalescing atomic memory operations for memory-local computing
Ma et al. | Unified-TP: A unified TLB and page table cache structure for efficient address translation
US7099998B1 (en) | Method for reducing an importance level of a cache line
Ni et al. | SIP: Boosting up graph computing by separating the irregular property data
CN117785737A | Last level cache based on linked list structure and supporting dynamic partition granularity access
Legal Events
Code | Title | Description
---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST. Assignors: KAILAS, KRISHNAN KUNJUNNY; RAVINDRAN, RAJIV ALAZHATH; SURA, ZEHRA. Reel/Frame: 018636/0900. Effective date: 20060707
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE