Nothing Special   »   [go: up one dir, main page]

US20080010413A1 - Method and apparatus for application-specific dynamic cache placement - Google Patents

Method and apparatus for application-specific dynamic cache placement Download PDF

Info

Publication number
US20080010413A1
US20080010413A1 US11/482,923 US48292306A US2008010413A1 US 20080010413 A1 US20080010413 A1 US 20080010413A1 US 48292306 A US48292306 A US 48292306A US 2008010413 A1 US2008010413 A1 US 2008010413A1
Authority
US
United States
Prior art keywords
load
store instruction
store
virtual partitions
data item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/482,923
Inventor
Krishnan Kunjunny Kailas
Rajiv Alazhath Ravindran
Zehra Sura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US11/482,923 priority Critical patent/US20080010413A1/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KAILAS, KRISHNAN KUNJUNNY, RAVINDRAN, RAJIV ALAZHATH, SURA, ZEHRA
Publication of US20080010413A1 publication Critical patent/US20080010413A1/en
Priority to US12/165,178 priority patent/US7836256B2/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0842Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing

Definitions

  • the present invention relates generally to data processing systems and relates more particularly to the management of data stored in cache memory systems.
  • Cache memory Traditional means of reducing this performance gap include the use of cache memory of varying sizes and levels, which provides temporary storage for and quick access to frequently used data.
  • Cache memory is conventionally managed by hardware, which attempts to cache the most frequently accessed data while purging older, unused data and fetching data from nearby memory locations, thus retaining the working set of the program in cache.
  • hardware-managed cache is designed to support a wide spectrum of workloads, the design tends to be very generic in terms of the algorithms used for cache placement (i.e., determining where in the cache array to keep a data item fetched from a specific memory address). For example, set placement decisions are made based on selected bits from the physical address to index into a given set of a set-associative cache. Although such greedy/random replacement decisions provide reasonably high hit-rates at low cost, they have proven to be ineffective for high memory bound workloads. Thus hardware-managed cache is generally less capable of exploiting application-specific behaviors and replacement policies than other memory schemes. Other software-based approaches that rely on the programmer or compiler for data movement also tend to behave too conservatively to close the processor/memory system gap effectively.
  • One embodiment of the present method and apparatus for application-specific dynamic cache placement includes grouping sets of data in a cache memory system into two or more virtual partitions and processing a load/store instruction in accordance with the virtual partitions, where the load/store instruction specifies at least one of the virtual partitions to which the load/store instruction is assigned.
  • FIG. 1 is a schematic diagram illustrating an exemplary cache, partitioned according to one embodiment of the present invention
  • FIG. 2 is a flow diagram illustrating one embodiment of a method for loading and storing data, according to the present invention
  • FIG. 3 is a schematic diagram illustrating an exemplary cache, partitioned according to another embodiment of present invention.
  • FIG. 4 is a flow diagram illustrating another embodiment of a method for loading and storing data, according to the present invention.
  • FIG. 5 is a flow diagram illustrating one embodiment of a method for partitioning load/store instructions, according to the present invention.
  • FIG. 6 is a high level block diagram of the data caching method that is implemented using a general purpose computing device.
  • the present invention is a method and apparatus for application-specific dynamic cache placement.
  • the present invention maintains multiple smaller caches, as opposed to the traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory. Moreover, data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.
  • FIG. 1 is a schematic diagram illustrating an exemplary cache 100 , partitioned according to one embodiment of the present invention.
  • a tag directory/array 10 of the cache 100 comprises a plurality of sets 102 1 - 102 n (hereinafter collectively referred to as “sets 102 ”). Each set 102 comprises a plurality of ways or lines to which data items may be loaded.
  • the sets 102 are grouped into two or more partitions 104 1 - 104 n (hereinafter collectively referred to as “partitions 104 ”).
  • the partitions 104 are virtual groupings or “supersets” comprising multiple regular sets 102 .
  • the tag directory 10 comprises four sets 102 (each comprising 4 lines) that are grouped into two partitions 104 of two sets 102 each.
  • An exemplary load/store instruction 112 for accessing the cache 100 generates an n-bit address 106 , where the fields 108 1 - 108 3 (hereinafter collectively referred to as “fields 108 ”) of the address are named using the convention address [n:m], n is the least significant bit and m is the most significant bit. The width, w, of a field 108 is therefore m ⁇ n+1.
  • fields 108 the fields 108 1 - 108 3 (hereinafter collectively referred to as “fields 108 ”) of the address are named using the convention address [n:m], n is the least significant bit and m is the most significant bit. The width, w, of a field 108 is therefore m ⁇ n+1.
  • s bits of the n-bit address are used to select among 2 5 sets in cache; for partitioned cache comprising 2 k partitions, only s-k bits are used to select sets within a given partition 104 .
  • the address 106 specified by the load/store instruction 112 comprises a “tag” field 108 1 for specifying a tag address of referenced data item stored in the tag directory 110 , a “set-index” field 108 2 for specifying a maximum number of sets 102 per partitions 104 in the cache 100 , and an “offset” field 108 3 for specifying a size of the data item.
  • the load/store instruction 112 also specifies one or more identifiers of target partitions 104 in the cache 100 .
  • the target partitions 104 comprise portions of the cache 100 to which the load/store instruction 112 may load a data item in the event that the data item is not already contained within the cache 100 .
  • the partition 104 to which the load/store instruction 112 is assigned is represented using a 2k -bit vector, where each bit in the 2k-bit vector corresponds to a given partition 104 .
  • the corresponding bit-vector value would be [0110], where 0 indicates an unassigned partition and 1 indicates an assigned partition.
  • FIG. 2 is a flow diagram illustrating one embodiment of a method 200 for loading and storing data, according to the present invention.
  • the method 200 may be implemented, for example, at a cache controller.
  • the method 200 is initialized at step 202 and proceeds to step 204 , where the method 200 receives a load/store instruction.
  • the method 200 then proceeds to step 206 and checks each partition in the cache for the data item referenced in the load/store instruction. Referring back to FIG. 1 , in step 206 , the method 200 looks for the “tag” field 108 (i.e., tag field 108 1 ) of the address of (i.e., the bits [f+s ⁇ k:n] of the address) in the tag directory 110 in each partition 104 .
  • the method 200 only examines specific partitions of the cache that are specified using the bit-vector in the load/store instruction generated by the compiler.
  • step 208 the method 200 determines whether the data item is present in the cache based on the result of the tag comparison done in step 206 . If the method 200 concludes in step 208 that the data item is present in a given partition in the cache (i.e., a cache “hit”), the method 200 proceeds to step 218 and either forwards the data item to the processor (in the case of a load operation) or stores the data item in cache (in the case of a store operation).
  • step 208 if the method 200 concludes in step 208 that the data item is not present in the cache (i.e., a cache “miss”), the method 200 proceeds to step 212 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
  • step 212 the method 200 proceeds to step 214 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory).
  • a partition identifier that specifies a partition of cache to which the load/store instruction has been assigned is included in the load/store instruction.
  • any replacement policy e.g., least recently used or the like may be used to select from among the specified partitions.
  • the partition identifier is implemented as part of a special load/store instruction that includes a number of extra bits for identifying the partition.
  • the partition identifier is implemented as a separate prefix instruction that specifies the partition associated with the next sequential load/store instruction.
  • one or more special registers can be implemented as new additional operands of load/store instructions to specify the partition information.
  • the register-based solution allows the load or store to be assigned to multiple partitions, as each register will comprise a bit-vector that specifies the assigned partitions.
  • register move instructions are used to initialize each register with a literal that corresponds to the partition bit-vector. The register move instructions are inserted prior to the load/store instructions that use them.
  • such a bit-vector is specified as an immediate operand of the load/store instruction.
  • step 212 if the method 200 concludes in step 212 that the load/store instruction is not assigned to a partition, the method 200 proceeds to step 216 and loads the data item in accordance with the address of the data item (i.e., in accordance with the treatment of traditional unified cache).
  • step 218 the data item is read from the tag directory of the cache location (p, set-index, tag) at offset, where p is the partition identifier.
  • p the partition identifier
  • the data item is written at offset in tag directory location (p, set-index, tag).
  • the knowledge of which sets in cache comprise which partitions is encoded in the cache controller.
  • This knowledge is substantially transparent to an application using the cache, however.
  • By searching for the referenced data item in all cache partitions to determine a hit or a miss occurrences of data duplication and associated coherency issues are substantially reduced.
  • By annotating load/store instructions with partition identifiers that are to be searched during this stage performance may be even further improved. If a compiler is unable to prove that two loads assigned to two different partitions do not conflict, the compiler may allow both loads to probe each others' partitions, thereby reducing the occurrence of data duplication.
  • FIG. 3 is a schematic diagram illustrating an exemplary cache 300 , partitioned according to another embodiment of present invention.
  • the cache 300 is substantially similar to the cache 100 illustrated in FIG. 1 .
  • a tag directory/array 310 of the cache 300 comprises a plurality of sets 302 1 - 302 n (hereinafter collectively referred to as “sets 302 ”). Each set 302 comprises a plurality of ways or lines to which data items may be loaded.
  • the sets 302 are grouped into two or more partitions 304 n (hereinafter collectively referred to as “partitions 304 ”).
  • the tag directory 310 comprises two sets 302 (each comprising 4 lines) that are grouped into a single partition 304 .
  • the cache directory 310 includes an additional partition identifier 312 for each set 302 in the cache 300 .
  • a partition 304 comprises all sets 302 that have the same partition identifier 312 .
  • the address 306 specified by the exemplary load/store instruction 314 is substantially similar to that for a unified cache and comprises a “tag” field 308 , for specifying a referenced data item, a “set-index” field 3082 for specifying a maximum number of partitions 304 in the cache 300 , and an “offset” field 3083 for specifying a block size of the data item.
  • the load/store instruction 314 also specifies one or more identifiers of target partitions 304 in the cache 300 .
  • FIG. 4 is a flow diagram illustrating another embodiment of a method 400 for loading and storing data, according to the present invention.
  • the method 400 may be implemented at a cache controller. Unlike the method 400 , however, the method 400 implements partition identifiers as part of the cache tag directory structure, as described with respect to FIG. 3 .
  • the method 400 is initialized at step 402 and proceeds to step 404 , where the method 400 receives a load/store instruction.
  • the method 400 then proceeds to step 406 and identifies which sets within the cache comprise a partition.
  • the method 400 first probes the tag directory to identify which sets constitute a partition. This must be done for all sets, so as to identify every partition.
  • step 408 checks each identified partition in the cache for the data item referenced in the load/store instruction.
  • the sets in a partition are probed using regular “tag” and “set-index” fields from the load/store instruction.
  • step 410 the method 400 determines whether the data item is present in a given partition in the cache. If the method 400 concludes in step 410 that the data item is present in the cache (i.e., a cache “hit”), the method 400 proceeds to step 420 and either forwards the data item to the processor (load) or stores the data item in the cache (store).
  • step 410 if the method 400 concludes in step 410 that the data item is not present in the cache (i.e., a cache “miss”), the method 400 proceeds to step 414 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
  • step 414 the method 400 proceeds to step 416 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory).
  • a partition identifier that specifies a partition of cache to which the load/store instruction has been assigned is included in the load/store instruction.
  • any replacement policy e.g., least recently used or the like may be used to selected from among the specified partitions.
  • step 414 if the method 400 concludes in step 414 that the load/store instruction is not assigned to a partition, the method 400 proceeds to step 418 and loads the data item in accordance with the address of the block/data item (i.e., in accordance with the treatment of traditional unified cache).
  • step 420 forwards the data item to the processor or stores the data item in the cache before terminating in step 422 .
  • the data item is read from the tag directory of the cache location (p, set-index, tag) at offset, where p is the partition identifier.
  • the data item is written at offset in tag directory location (p, set-index, tag).
  • the method 400 does not fix the number of partitions (or the sizes of the partitions) in cache.
  • the method 400 can accommodate cache accesses by different applications having different needs regarding partition configuration. Also, even within an application, different phases of the application having different partition requirements can be easily accommodated.
  • the method 400 affords more flexibility than the method 200 , but at the cost of slightly greater overhead (required to maintain partition identifiers within the tag directory of the cache).
  • FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for partitioning load/store instructions, according to the present invention.
  • the method 500 may be implemented, for example, at a compiler, in order to facilitate the processing of load/store instructions as described above.
  • the method 500 is initialized at step 502 and proceeds to step 504 , where the method 500 identifies the most frequent load/store instructions, as well as the accessed data items corresponding to these load/store instructions.
  • the method 500 calculates the affinity and conflict metrics for each load/store instruction pair.
  • the affinity between two memory reference instructions is defined to be proportional to the cardinality of the common set of data items that are accessed by the two instructions within a given time interval. If two instructions access the same data set, assigning the two instructions to the same partition is useful, as the two instructions can aid each other by pre-fetching data sets for the other. Conflict information is used to identify which pairs of load/store instructions potentially displace each other, and are thus better assigned to different partitions.
  • the method 500 records the footprint of each load/store instruction.
  • the footprint of a load/store instruction is the number of unique reused data items that are accessed by the load/store instruction. Load/store instructions with large footprints may need to be assigned to multiple partitions.
  • each load/store instruction is broken into one or more sub-instructions, where each sub-instruction is either a regular load/store instruction or a sub-part of a single load/store instruction that performs a subset of the data accesses that would have been performed by the original load/store instruction.
  • the number of sub-instructions corresponding to each load/store instruction is proportional to:
  • Footprint(instruction) is the number of unique reused data items accessed by the corresponding load/store instruction.
  • the method 500 formulates the partitioning problem as a graph, where nodes of the graph correspond to sub-instructions, and edges of the graph measure the desire of the sub-instructions to be assigned to the same or different partitions.
  • the edge weights of the graph are computed based on the affinity and conflict metrics calculated earlier.
  • the edges of the graph are assigned weights such that load/store instructions exhibiting a high degree of conflict are assigned to different partitions, while load/store instructions exhibiting a high degree of affinity are assigned to the same partition.
  • the edge weights between sub-instructions of the same memory reference instruction are set such that the sub-instructions are always assigned to different partitions.
  • step 514 the method 500 assigns each load/store instruction to one or more cache partitions, in accordance with the graph.
  • a load/store instruction is assigned to multiple cache partitions if the load/store instruction comprises multiple sub-instructions assigned to different partitions.
  • step 516 the method 500 annotates the load/store instructions with their respective partition identifiers, as determined in step 514 .
  • the method 500 then terminates in step 518 .
  • FIG. 6 is a high level block diagram of the data caching method that is implemented using a general purpose computing device 600 .
  • a general purpose computing device 600 comprises a processor 602 , a memory 604 , a data caching module 605 and various input/output (I/O) devices 606 such as a display, a keyboard, a mouse, a modem, and the like.
  • I/O devices 606 such as a display, a keyboard, a mouse, a modem, and the like.
  • at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive, a floppy disk drive).
  • the data caching module 605 can be implemented as a physical device or subsystem that is coupled to a processor through a communication channel.
  • the data caching module 605 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC)), where the software is loaded from a storage medium (e.g., I/O devices 606 ) and operated by the processor 602 in the memory 604 of the general purpose computing device 600 .
  • a storage medium e.g., I/O devices 606
  • the data caching module 605 for loading and storing data described herein with reference to the preceding Figures can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).
  • Embodiments of the present invention represents a significant advancement in the field of data processing systems.
  • Embodiments of the present invention maintain multiple smaller caches, as opposed to a traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory.
  • data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

One embodiment of the present method and apparatus for application-specific dynamic cache placement includes grouping sets of data in a cache memory system into two or more virtual partitions and processing a load/store instruction in accordance with the virtual partitions, where the load/store instruction specifies at least one of the virtual partitions to which the load/store instruction is assigned.

Description

    FIELD OF THE INVENTION
  • The present invention relates generally to data processing systems and relates more particularly to the management of data stored in cache memory systems.
  • BACKGROUND
  • The performance gap between the processor and the memory system in computing systems has been steadily increasing. With every generation, processors are being clocked at higher rates, while memory systems have been unable to catch up to this exponential growth. The resultant performance gap has caused a major bottleneck for single-thread performance in modern day processors, as the memory system is generally unable to supply data to the processor core at the rate of execution of the processor.
  • Traditional means of reducing this performance gap include the use of cache memory of varying sizes and levels, which provides temporary storage for and quick access to frequently used data. Cache memory is conventionally managed by hardware, which attempts to cache the most frequently accessed data while purging older, unused data and fetching data from nearby memory locations, thus retaining the working set of the program in cache.
  • Although there are many advantages to using hardware-managed cache systems, such as transparency and flexibility, the layer of abstraction afforded by these systems comes at a price. For one, because hardware-managed cache is designed to support a wide spectrum of workloads, the design tends to be very generic in terms of the algorithms used for cache placement (i.e., determining where in the cache array to keep a data item fetched from a specific memory address). For example, set placement decisions are made based on selected bits from the physical address to index into a given set of a set-associative cache. Although such greedy/random replacement decisions provide reasonably high hit-rates at low cost, they have proven to be ineffective for high memory bound workloads. Thus hardware-managed cache is generally less capable of exploiting application-specific behaviors and replacement policies than other memory schemes. Other software-based approaches that rely on the programmer or compiler for data movement also tend to behave too conservatively to close the processor/memory system gap effectively.
  • Thus, there is a need in the art for a method and apparatus for application-specific dynamic cache placement.
  • SUMMARY OF THE INVENTION
  • One embodiment of the present method and apparatus for application-specific dynamic cache placement includes grouping sets of data in a cache memory system into two or more virtual partitions and processing a load/store instruction in accordance with the virtual partitions, where the load/store instruction specifies at least one of the virtual partitions to which the load/store instruction is assigned.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • So that the manner in which the above recited embodiments of the invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be obtained by reference to the embodiments thereof which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
  • FIG. 1 is a schematic diagram illustrating an exemplary cache, partitioned according to one embodiment of the present invention;
  • FIG. 2 is a flow diagram illustrating one embodiment of a method for loading and storing data, according to the present invention;
  • FIG. 3 is a schematic diagram illustrating an exemplary cache, partitioned according to another embodiment of present invention;
  • FIG. 4 is a flow diagram illustrating another embodiment of a method for loading and storing data, according to the present invention;
  • FIG. 5 is a flow diagram illustrating one embodiment of a method for partitioning load/store instructions, according to the present invention; and
  • FIG. 6 is a high level block diagram of the data caching method that is implemented using a general purpose computing device.
  • To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
  • DETAILED DESCRIPTION
  • In one embodiment, the present invention is a method and apparatus for application-specific dynamic cache placement. In one embodiment, the present invention maintains multiple smaller caches, as opposed to the traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory. Moreover, data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.
  • FIG. 1 is a schematic diagram illustrating an exemplary cache 100, partitioned according to one embodiment of the present invention. A tag directory/array 10 of the cache 100 comprises a plurality of sets 102 1-102 n (hereinafter collectively referred to as “sets 102”). Each set 102 comprises a plurality of ways or lines to which data items may be loaded. The sets 102 are grouped into two or more partitions 104 1-104 n (hereinafter collectively referred to as “partitions 104”). The partitions 104 are virtual groupings or “supersets” comprising multiple regular sets 102. In the example of FIG. 1, the tag directory 10 comprises four sets 102 (each comprising 4 lines) that are grouped into two partitions 104 of two sets 102 each.
  • An exemplary load/store instruction 112 for accessing the cache 100 generates an n-bit address 106, where the fields 108 1-108 3 (hereinafter collectively referred to as “fields 108”) of the address are named using the convention address [n:m], n is the least significant bit and m is the most significant bit. The width, w, of a field 108 is therefore m−n+1. For unified cache, s bits of the n-bit address are used to select among 25 sets in cache; for partitioned cache comprising 2k partitions, only s-k bits are used to select sets within a given partition 104.
  • The address 106 specified by the load/store instruction 112 comprises a “tag” field 108 1 for specifying a tag address of referenced data item stored in the tag directory 110, a “set-index” field 108 2 for specifying a maximum number of sets 102 per partitions 104 in the cache 100, and an “offset” field 108 3 for specifying a size of the data item. As will be described in further detail below, the load/store instruction 112 also specifies one or more identifiers of target partitions 104 in the cache 100. The target partitions 104 comprise portions of the cache 100 to which the load/store instruction 112 may load a data item in the event that the data item is not already contained within the cache 100. In general, if there are 2k partitions 104 in the cache 100, the partition 104 to which the load/store instruction 112 is assigned is represented using a 2k -bit vector, where each bit in the 2k-bit vector corresponds to a given partition 104. Thus, for example, if there are four partitions in cache, and a load/store instruction is assigned to partitions two and three, the corresponding bit-vector value would be [0110], where 0 indicates an unassigned partition and 1 indicates an assigned partition.
  • FIG. 2 is a flow diagram illustrating one embodiment of a method 200 for loading and storing data, according to the present invention. The method 200 may be implemented, for example, at a cache controller.
  • The method 200 is initialized at step 202 and proceeds to step 204, where the method 200 receives a load/store instruction. The method 200 then proceeds to step 206 and checks each partition in the cache for the data item referenced in the load/store instruction. Referring back to FIG. 1, in step 206, the method 200 looks for the “tag” field 108 (i.e., tag field 108 1) of the address of (i.e., the bits [f+s−k:n] of the address) in the tag directory 110 in each partition 104. In another embodiment, the method 200 only examines specific partitions of the cache that are specified using the bit-vector in the load/store instruction generated by the compiler.
  • In step 208, the method 200 determines whether the data item is present in the cache based on the result of the tag comparison done in step 206. If the method 200 concludes in step 208 that the data item is present in a given partition in the cache (i.e., a cache “hit”), the method 200 proceeds to step 218 and either forwards the data item to the processor (in the case of a load operation) or stores the data item in cache (in the case of a store operation).
  • Alternatively, if the method 200 concludes in step 208 that the data item is not present in the cache (i.e., a cache “miss”), the method 200 proceeds to step 212 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
  • If the method 200 concludes in step 212 that the load/store instruction is assigned to a partition, the method 200 proceeds to step 214 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory). In this case, a partition identifier that specifies a partition of cache to which the load/store instruction has been assigned is included in the load/store instruction. In one embodiment, where the partition identifier specifies more than one partition, any replacement policy (e.g., least recently used or the like) may be used to select from among the specified partitions.
  • In one embodiment, the partition identifier is implemented as part of a special load/store instruction that includes a number of extra bits for identifying the partition. In another embodiment, the partition identifier is implemented as a separate prefix instruction that specifies the partition associated with the next sequential load/store instruction. In another embodiment still, one or more special registers can be implemented as new additional operands of load/store instructions to specify the partition information. The register-based solution allows the load or store to be assigned to multiple partitions, as each register will comprise a bit-vector that specifies the assigned partitions. In one embodiment, register move instructions are used to initialize each register with a literal that corresponds to the partition bit-vector. The register move instructions are inserted prior to the load/store instructions that use them. In yet another embodiment involving a small number of partitions, such a bit-vector is specified as an immediate operand of the load/store instruction.
  • Alternatively, if the method 200 concludes in step 212 that the load/store instruction is not assigned to a partition, the method 200 proceeds to step 216 and loads the data item in accordance with the address of the data item (i.e., in accordance with the treatment of traditional unified cache).
  • Once the data item has been fetched from the cache (i.e., in accordance with step 210) or loaded to the appropriate partition (i.e., in accordance with step 214 or 216), the method 200 proceeds to step 218 and either forwards the data item to the processor (load) or stores the data item in cache (store) before terminating in step 220. For a load instruction, the data item is read from the tag directory of the cache location (p, set-index, tag) at offset, where p is the partition identifier. Similarly, for a store instruction, the data item is written at offset in tag directory location (p, set-index, tag).
  • Thus, according to the method 200, the knowledge of which sets in cache comprise which partitions is encoded in the cache controller. This knowledge is substantially transparent to an application using the cache, however. By searching for the referenced data item in all cache partitions to determine a hit or a miss, occurrences of data duplication and associated coherency issues are substantially reduced. By annotating load/store instructions with partition identifiers that are to be searched during this stage, performance may be even further improved. If a compiler is unable to prove that two loads assigned to two different partitions do not conflict, the compiler may allow both loads to probe each others' partitions, thereby reducing the occurrence of data duplication.
  • Moreover, by allowing a load/store instruction to load data only to assigned partitions of the cache, the likelihood of the load/store instruction replacing data items required by other load/store instructions is substantially reduced. “Unified” and “partitioned” views of the cache can be retained simultaneously, without having to re-organize the tag directory of the cache.
  • FIG. 3 is a schematic diagram illustrating an exemplary cache 300, partitioned according to another embodiment of present invention. The cache 300 is substantially similar to the cache 100 illustrated in FIG. 1. A tag directory/array 310 of the cache 300 comprises a plurality of sets 302 1-302 n (hereinafter collectively referred to as “sets 302”). Each set 302 comprises a plurality of ways or lines to which data items may be loaded. The sets 302 are grouped into two or more partitions 304 n (hereinafter collectively referred to as “partitions 304”). In the example of FIG. 3, the tag directory 310 comprises two sets 302 (each comprising 4 lines) that are grouped into a single partition 304. Unlike the cache 100, the cache directory 310 includes an additional partition identifier 312 for each set 302 in the cache 300. Thus, a partition 304 comprises all sets 302 that have the same partition identifier 312.
  • The address 306 specified by the exemplary load/store instruction 314 is substantially similar to that for a unified cache and comprises a “tag” field 308, for specifying a referenced data item, a “set-index” field 3082 for specifying a maximum number of partitions 304 in the cache 300, and an “offset” field 3083 for specifying a block size of the data item. The load/store instruction 314 also specifies one or more identifiers of target partitions 304 in the cache 300.
  • FIG. 4 is a flow diagram illustrating another embodiment of a method 400 for loading and storing data, according to the present invention. The method 400, like the method 200, may be implemented at a cache controller. Unlike the method 400, however, the method 400 implements partition identifiers as part of the cache tag directory structure, as described with respect to FIG. 3.
  • The method 400 is initialized at step 402 and proceeds to step 404, where the method 400 receives a load/store instruction. The method 400 then proceeds to step 406 and identifies which sets within the cache comprise a partition. In particular, since the partition identifiers for each set are stored in the tag directory structure, the method 400 first probes the tag directory to identify which sets constitute a partition. This must be done for all sets, so as to identify every partition.
  • Once the method 400 has identified the sets within each partition, the method 400 proceeds to step 408 and checks each identified partition in the cache for the data item referenced in the load/store instruction. The sets in a partition are probed using regular “tag” and “set-index” fields from the load/store instruction.
  • In step 410, the method 400 determines whether the data item is present in a given partition in the cache. If the method 400 concludes in step 410 that the data item is present in the cache (i.e., a cache “hit”), the method 400 proceeds to step 420 and either forwards the data item to the processor (load) or stores the data item in the cache (store).
  • Alternatively, if the method 400 concludes in step 410 that the data item is not present in the cache (i.e., a cache “miss”), the method 400 proceeds to step 414 and determines whether the load/store instruction is assigned to one or more partitions of the cache.
  • If the method 400 concludes in step 414 that the load/store instruction is assigned to a partition, the method 400 proceeds to step 416 and loads the data item to one of the partitions assigned to the load/store instruction (e.g., to tag directory location (p, set-index), where p is a selected partition in the tag directory). In this case, a partition identifier that specifies a partition of cache to which the load/store instruction has been assigned is included in the load/store instruction. In one embodiment, where the partition identifier specifies more than one partition, any replacement policy (e.g., least recently used or the like) may be used to selected from among the specified partitions.
  • Alternatively, if the method 400 concludes in step 414 that the load/store instruction is not assigned to a partition, the method 400 proceeds to step 418 and loads the data item in accordance with the address of the block/data item (i.e., in accordance with the treatment of traditional unified cache).
  • Once the data item has been fetched from the cache or loaded to the appropriate partition (i.e., in accordance with step 416 or 418), the method 400 proceeds to step 420 and forwards the data item to the processor or stores the data item in the cache before terminating in step 422. For a load instruction, the data item is read from the tag directory of the cache location (p, set-index, tag) at offset, where p is the partition identifier. For a store instruction, the data item is written at offset in tag directory location (p, set-index, tag).
  • The method 400, though similar to the method 200, does not fix the number of partitions (or the sizes of the partitions) in cache. Thus, the method 400 can accommodate cache accesses by different applications having different needs regarding partition configuration. Also, even within an application, different phases of the application having different partition requirements can be easily accommodated. Thus, the method 400 affords more flexibility than the method 200, but at the cost of slightly greater overhead (required to maintain partition identifiers within the tag directory of the cache).
  • FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for partitioning load/store instructions, according to the present invention. The method 500 may be implemented, for example, at a compiler, in order to facilitate the processing of load/store instructions as described above.
  • The method 500 is initialized at step 502 and proceeds to step 504, where the method 500 identifies the most frequent load/store instructions, as well as the accessed data items corresponding to these load/store instructions.
  • In step 506, the method 500 calculates the affinity and conflict metrics for each load/store instruction pair. The affinity between two memory reference instructions is defined to be proportional to the cardinality of the common set of data items that are accessed by the two instructions within a given time interval. If two instructions access the same data set, assigning the two instructions to the same partition is useful, as the two instructions can aid each other by pre-fetching data sets for the other. Conflict information is used to identify which pairs of load/store instructions potentially displace each other, and are thus better assigned to different partitions.
  • In step 508, the method 500 records the footprint of each load/store instruction. The footprint of a load/store instruction is the number of unique reused data items that are accessed by the load/store instruction. Load/store instructions with large footprints may need to be assigned to multiple partitions.
  • In step 510, each load/store instruction is broken into one or more sub-instructions, where each sub-instruction is either a regular load/store instruction or a sub-part of a single load/store instruction that performs a subset of the data accesses that would have been performed by the original load/store instruction. The number of sub-instructions corresponding to each load/store instruction is proportional to:

  • Footprint(instruction)/size of(single partition)
  • where Footprint(instruction) is the number of unique reused data items accessed by the corresponding load/store instruction.
  • In step 512, the method 500 formulates the partitioning problem as a graph, where nodes of the graph correspond to sub-instructions, and edges of the graph measure the desire of the sub-instructions to be assigned to the same or different partitions. The edge weights of the graph are computed based on the affinity and conflict metrics calculated earlier. The edges of the graph are assigned weights such that load/store instructions exhibiting a high degree of conflict are assigned to different partitions, while load/store instructions exhibiting a high degree of affinity are assigned to the same partition. The edge weights between sub-instructions of the same memory reference instruction are set such that the sub-instructions are always assigned to different partitions.
  • In step 514, the method 500 assigns each load/store instruction to one or more cache partitions, in accordance with the graph. In one embodiment, a load/store instruction is assigned to multiple cache partitions if the load/store instruction comprises multiple sub-instructions assigned to different partitions.
  • In step 516, the method 500 annotates the load/store instructions with their respective partition identifiers, as determined in step 514. The method 500 then terminates in step 518.
  • FIG. 6 is a high level block diagram of the data caching method that is implemented using a general purpose computing device 600. In one embodiment, a general purpose computing device 600 comprises a processor 602, a memory 604, a data caching module 605 and various input/output (I/O) devices 606 such as a display, a keyboard, a mouse, a modem, and the like. In one embodiment, at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive, a floppy disk drive). It should be understood that the data caching module 605 can be implemented as a physical device or subsystem that is coupled to a processor through a communication channel.
  • Alternatively, the data caching module 605 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC)), where the software is loaded from a storage medium (e.g., I/O devices 606) and operated by the processor 602 in the memory 604 of the general purpose computing device 600. Thus, in one embodiment, the data caching module 605 for loading and storing data described herein with reference to the preceding Figures can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).
  • Thus, the present invention represents a significant advancement in the field of data processing systems. Embodiments of the present invention maintain multiple smaller caches, as opposed to a traditional unified cache. That is, the cache is partitioned logically (i.e., without necessarily maintaining multiple distinct caches), such that data items can be assigned separately to each of the partitions, thereby reducing conflicts among data items that may share a set address while residing at different locations within memory. Moreover, data sets with long reuse can be restricted to a certain partition, making it less likely that a data set will replace other, more useful data items assigned to other partitions.
  • While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (20)

1. A method for accessing data in a cache memory system, the method comprising:
grouping sets of data in the cache memory system into two or more virtual partitions; and
processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
2. The method of claim 1, wherein the processing comprises:
checking at least a subset of the two or more virtual partitions for a data item referenced by the load/store instruction;
fetching the data item referenced by the load/store instruction from one of the two or more virtual partitions, if the data item is located therein and the load/store instruction specifies a load operation;
storing the data item referenced by the load/store instruction in one of the two or more virtual partitions, if the load/store instruction specifies a store operation; and
loading the data item to the at least one of the two or more virtual partitions to which the load/store instruction is assigned, if the data item is not located in the subset of the two or more virtual partitions.
3. The method of claim 2, wherein the checking comprises:
checking each of the two or more virtual partitions for the data item.
4. The method of claim 2, wherein the checking comprises:
identifying one or more sets in the cache memory system that comprise the at least one of the two or more virtual partitions to which the load/store instruction is assigned; and
checking the one or more sets for the data item.
5. The method of claim 2, wherein the loading and storing comprises:
selecting a virtual partition from among two or more virtual partitions to which the load/store instruction is assigned, in accordance with a cache replacement policy.
6. The method of claim 1, wherein the at least one of the two or more virtual partitions to which the load/store instruction is assigned is specified by an identifier in the load/store instruction.
7. The method of claim 6, wherein the load/store instruction comprises a number of extra bits for specifying the identifier.
8. The method of claim 6, wherein the identifier is implemented as a prefix instruction to the load/store instruction, the prefix instruction specifying one or the two or more virtual partitions that is associated with a next sequential load/store instruction.
9. The method of claim 6, wherein the identifier is specified in accordance with one or more registers implemented as operands of the load/store instruction.
10. The method of claim 1, wherein the grouping comprises:
identifying one or more load/store instructions that are encountered most frequently from among a plurality of load/store instructions;
identifying one or more data items accessed by the one or more load/store instructions;
modeling the one or more load/store instructions as a graph; and
assigning each of the one or more load/store instructions to at least one virtual partition in accordance with the graph.
11. The method of claim 10, wherein the modeling comprises:
dividing each of the one or more load/store instructions into one or more sub-instructions, each of the one or more sub-instructions comprising a node of the graph;
calculating an affinity metric for each pair of load/store instructions within the one or more load/store instructions;
calculating a conflict metric for each pair of load/store instructions; and
assigning weights to edges of the graph in accordance with the affinity metric and the conflict metric, the weights indicating whether each pair of load/store instructions should be assigned to a common virtual partition or to different virtual partitions.
12. A computer readable medium containing an executable program for accessing data in a cache memory system, where the program performs the steps of:
grouping sets of data in the cache memory system into two or more virtual partitions; and
processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
13. The computer readable medium of claim 12, wherein the processing comprises:
checking at least a subset of the two or more virtual partitions for a data item referenced by the load/store instruction;
fetching the data item referenced by the load/store instruction from one of the two or more virtual partitions, if the data item is located therein and the load/store instruction specifies a load operation;
storing the data item referenced by the load/store instruction in one of the two or more virtual partitions, if the load/store instruction specifies a store operation; and
loading the data item to the at least one of the two or more virtual partitions to which the load/store instruction is assigned, if the data item is not located in the subset of the two or more virtual partitions.
14. The computer readable medium of claim 13, wherein the checking comprises:
checking each of the two or more virtual partitions for the data item.
15. The computer readable medium of claim 13, wherein the checking comprises:
identifying one or more sets in the cache memory system that comprise the at least one of the two or more virtual partitions to which the load/store instruction is assigned; and
checking the one or more sets for the data item.
16. The computer readable medium of claim 12, wherein the at least one of the two or more virtual partitions to which the load/store instruction is assigned is specified by an identifier in the load/store instruction.
17. The computer readable medium of claim 16, wherein the load/store instruction comprises a number of extra bits for specifying the identifier.
18. The computer readable medium of claim 12, wherein the grouping comprises:
identifying one or more load/store instructions that are encountered most frequently from among a plurality of load/store instructions;
identifying one or more data items accessed by the one or more load/store instructions;
modeling the one or more load/store instructions as a graph; and
assigning each of the one or more load/store instructions to at least one virtual partition in accordance with the graph.
19. The computer readable medium of claim 18, wherein the modeling comprises:
dividing each of the one or more load/store instructions into one or more sub-instructions, each of the one or more sub-instructions comprising a node of the graph;
calculating an affinity metric for each pair of load/store instructions within the one or more load/store instructions;
calculating a conflict metric for each pair of load/store instructions; and
assigning weights to edges of the graph in accordance with the affinity metric and the conflict metric, the weights indicating whether each pair of load/store instructions should be assigned to a common virtual partition or to different virtual partitions.
20. A cache controller for controlling access to a cache memory, the cache controller comprising:
means for grouping sets of data in the cache memory system into two or more virtual partitions; and
means for processing a load/store instruction in accordance with the two or more virtual partitions, where the load/store instruction specifies at least one of the two or more virtual partitions to which the load/store instruction is assigned.
US11/482,923 2006-07-07 2006-07-07 Method and apparatus for application-specific dynamic cache placement Abandoned US20080010413A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/482,923 US20080010413A1 (en) 2006-07-07 2006-07-07 Method and apparatus for application-specific dynamic cache placement
US12/165,178 US7836256B2 (en) 2006-07-07 2008-06-30 Method and apparatus for application-specific dynamic cache placement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/482,923 US20080010413A1 (en) 2006-07-07 2006-07-07 Method and apparatus for application-specific dynamic cache placement

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/165,178 Continuation US7836256B2 (en) 2006-07-07 2008-06-30 Method and apparatus for application-specific dynamic cache placement

Publications (1)

Publication Number Publication Date
US20080010413A1 true US20080010413A1 (en) 2008-01-10

Family

ID=38920322

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/482,923 Abandoned US20080010413A1 (en) 2006-07-07 2006-07-07 Method and apparatus for application-specific dynamic cache placement
US12/165,178 Expired - Fee Related US7836256B2 (en) 2006-07-07 2008-06-30 Method and apparatus for application-specific dynamic cache placement

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/165,178 Expired - Fee Related US7836256B2 (en) 2006-07-07 2008-06-30 Method and apparatus for application-specific dynamic cache placement

Country Status (1)

Country Link
US (2) US20080010413A1 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110022773A1 (en) * 2009-07-27 2011-01-27 International Business Machines Corporation Fine Grained Cache Allocation
US20110055827A1 (en) * 2009-08-25 2011-03-03 International Business Machines Corporation Cache Partitioning in Virtualized Environments
US20130290687A1 (en) * 2011-12-23 2013-10-31 Elmoustapha Ould-Ahmed-Vall Apparatus and method of improved permute instructions
US20150067265A1 (en) * 2013-09-05 2015-03-05 Privatecore, Inc. System and Method for Partitioning of Memory Units into Non-Conflicting Sets
US9588764B2 (en) 2011-12-23 2017-03-07 Intel Corporation Apparatus and method of improved extract instructions
US9619236B2 (en) 2011-12-23 2017-04-11 Intel Corporation Apparatus and method of improved insert instructions
US9632980B2 (en) 2011-12-23 2017-04-25 Intel Corporation Apparatus and method of mask permute instructions
US9639482B2 (en) 2011-09-13 2017-05-02 Facebook, Inc. Software cryptoprocessor
US9734092B2 (en) 2014-03-19 2017-08-15 Facebook, Inc. Secure support for I/O in software cryptoprocessor
US9747450B2 (en) 2014-02-10 2017-08-29 Facebook, Inc. Attestation using a combined measurement and its constituent measurements
US9946540B2 (en) * 2011-12-23 2018-04-17 Intel Corporation Apparatus and method of improved permute instructions with multiple granularities
US9983894B2 (en) 2013-09-25 2018-05-29 Facebook, Inc. Method and system for providing secure system execution on hardware supporting secure application execution
US10049048B1 (en) 2013-10-01 2018-08-14 Facebook, Inc. Method and system for using processor enclaves and cache partitioning to assist a software cryptoprocessor
US20190026088A1 (en) * 2016-02-23 2019-01-24 Intel Corporation Optimizing structures to fit into a complete cache line
US20190102302A1 (en) * 2017-09-29 2019-04-04 Intel Corporation Processor, method, and system for cache partitioning and control for accurate performance monitoring and optimization
US10635590B2 (en) * 2017-09-29 2020-04-28 Intel Corporation Software-transparent hardware predictor for core-to-core data transfer optimization
US11221965B2 (en) * 2018-07-17 2022-01-11 SK Hynix Inc. Cache memory, memory system including the same, and eviction method of cache memory
US20220197993A1 (en) * 2022-03-11 2022-06-23 Intel Corporation Compartment isolation for load store forwarding
US11513839B2 (en) * 2018-05-07 2022-11-29 Micron Technology, Inc. Memory request size management in a multi-threaded, self-scheduling processor

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITPD20050146A1 (en) 2005-05-20 2006-11-21 Fidia Farmaceutici RE-ABSORBABLE FILLERS CONSISTING OF LIPOSOMAS AND HYALURONIC ACID AND OR ITS DERIVATIVES
US20080145415A1 (en) * 2006-04-21 2008-06-19 Lanfranco Callegaro Bioresorbable Fillers Constituted by Phospholipid Liposomes and Hyaluronic Acid and/or the Derivatives Thereof
US20180357544A1 (en) * 2017-06-08 2018-12-13 International Business Machines Corporation Optimizing tree-based convolutional neural networks

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5634110A (en) * 1995-05-05 1997-05-27 Silicon Graphics, Inc. Cache coherency using flexible directory bit vectors
US6295580B1 (en) * 1997-01-30 2001-09-25 Sgs-Thomson Microelectronics Limited Cache system for concurrent processes
US6662272B2 (en) * 2001-09-29 2003-12-09 Hewlett-Packard Development Company, L.P. Dynamic cache partitioning
US20050055510A1 (en) * 2002-10-08 2005-03-10 Hass David T. Advanced processor translation lookaside buffer management in a multithreaded system
US20050240733A1 (en) * 2004-04-22 2005-10-27 International Business Machines Corporation Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US7451272B2 (en) * 2004-10-19 2008-11-11 Platform Solutions Incorporated Queue or stack based cache entry reclaim method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5634110A (en) * 1995-05-05 1997-05-27 Silicon Graphics, Inc. Cache coherency using flexible directory bit vectors
US6295580B1 (en) * 1997-01-30 2001-09-25 Sgs-Thomson Microelectronics Limited Cache system for concurrent processes
US6453385B1 (en) * 1997-01-30 2002-09-17 Sgs-Thomson Microelectronics Limited Cache system
US6662272B2 (en) * 2001-09-29 2003-12-09 Hewlett-Packard Development Company, L.P. Dynamic cache partitioning
US6865647B2 (en) * 2001-09-29 2005-03-08 Hewlett-Packard Development Company, L.P. Dynamic cache partitioning
US20050055510A1 (en) * 2002-10-08 2005-03-10 Hass David T. Advanced processor translation lookaside buffer management in a multithreaded system
US20050240733A1 (en) * 2004-04-22 2005-10-27 International Business Machines Corporation Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US7451272B2 (en) * 2004-10-19 2008-11-11 Platform Solutions Incorporated Queue or stack based cache entry reclaim method

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8543769B2 (en) 2009-07-27 2013-09-24 International Business Machines Corporation Fine grained cache allocation
US20110022773A1 (en) * 2009-07-27 2011-01-27 International Business Machines Corporation Fine Grained Cache Allocation
US20110055827A1 (en) * 2009-08-25 2011-03-03 International Business Machines Corporation Cache Partitioning in Virtualized Environments
US8739159B2 (en) 2009-08-25 2014-05-27 International Business Machines Corporation Cache partitioning with a partition table to effect allocation of shared cache to virtual machines in virtualized environments
US8745618B2 (en) * 2009-08-25 2014-06-03 International Business Machines Corporation Cache partitioning with a partition table to effect allocation of ways and rows of the cache to virtual machine in virtualized environments
US9639482B2 (en) 2011-09-13 2017-05-02 Facebook, Inc. Software cryptoprocessor
US9632980B2 (en) 2011-12-23 2017-04-25 Intel Corporation Apparatus and method of mask permute instructions
US11354124B2 (en) 2011-12-23 2022-06-07 Intel Corporation Apparatus and method of improved insert instructions
US9588764B2 (en) 2011-12-23 2017-03-07 Intel Corporation Apparatus and method of improved extract instructions
US9619236B2 (en) 2011-12-23 2017-04-11 Intel Corporation Apparatus and method of improved insert instructions
US10467185B2 (en) 2011-12-23 2019-11-05 Intel Corporation Apparatus and method of mask permute instructions
US10459728B2 (en) 2011-12-23 2019-10-29 Intel Corporation Apparatus and method of improved insert instructions
US9658850B2 (en) * 2011-12-23 2017-05-23 Intel Corporation Apparatus and method of improved permute instructions
US10474459B2 (en) 2011-12-23 2019-11-12 Intel Corporation Apparatus and method of improved permute instructions
US11347502B2 (en) 2011-12-23 2022-05-31 Intel Corporation Apparatus and method of improved insert instructions
US9946540B2 (en) * 2011-12-23 2018-04-17 Intel Corporation Apparatus and method of improved permute instructions with multiple granularities
US20130290687A1 (en) * 2011-12-23 2013-10-31 Elmoustapha Ould-Ahmed-Vall Apparatus and method of improved permute instructions
US10719316B2 (en) 2011-12-23 2020-07-21 Intel Corporation Apparatus and method of improved packed integer permute instruction
US11275583B2 (en) 2011-12-23 2022-03-15 Intel Corporation Apparatus and method of improved insert instructions
US10037282B2 (en) 2013-09-05 2018-07-31 Facebook, Inc. System and method for partitioning of memory units into non-conflicting sets
US9477603B2 (en) * 2013-09-05 2016-10-25 Facebook, Inc. System and method for partitioning of memory units into non-conflicting sets
US20150067265A1 (en) * 2013-09-05 2015-03-05 Privatecore, Inc. System and Method for Partitioning of Memory Units into Non-Conflicting Sets
US9983894B2 (en) 2013-09-25 2018-05-29 Facebook, Inc. Method and system for providing secure system execution on hardware supporting secure application execution
US10049048B1 (en) 2013-10-01 2018-08-14 Facebook, Inc. Method and system for using processor enclaves and cache partitioning to assist a software cryptoprocessor
US9747450B2 (en) 2014-02-10 2017-08-29 Facebook, Inc. Attestation using a combined measurement and its constituent measurements
US9734092B2 (en) 2014-03-19 2017-08-15 Facebook, Inc. Secure support for I/O in software cryptoprocessor
US20190026088A1 (en) * 2016-02-23 2019-01-24 Intel Corporation Optimizing structures to fit into a complete cache line
US10761819B2 (en) * 2016-02-23 2020-09-01 Intel Corporation Optimizing structures to fit into a complete cache line
US20190102302A1 (en) * 2017-09-29 2019-04-04 Intel Corporation Processor, method, and system for cache partitioning and control for accurate performance monitoring and optimization
US10482017B2 (en) * 2017-09-29 2019-11-19 Intel Corporation Processor, method, and system for cache partitioning and control for accurate performance monitoring and optimization
US10635590B2 (en) * 2017-09-29 2020-04-28 Intel Corporation Software-transparent hardware predictor for core-to-core data transfer optimization
US11513839B2 (en) * 2018-05-07 2022-11-29 Micron Technology, Inc. Memory request size management in a multi-threaded, self-scheduling processor
US11221965B2 (en) * 2018-07-17 2022-01-11 SK Hynix Inc. Cache memory, memory system including the same, and eviction method of cache memory
US20220197993A1 (en) * 2022-03-11 2022-06-23 Intel Corporation Compartment isolation for load store forwarding
US12019733B2 (en) * 2022-03-11 2024-06-25 Intel Corporation Compartment isolation for load store forwarding

Also Published As

Publication number Publication date
US7836256B2 (en) 2010-11-16
US20080270705A1 (en) 2008-10-30

Similar Documents

Publication Publication Date Title
US7836256B2 (en) Method and apparatus for application-specific dynamic cache placement
US7502890B2 (en) Method and apparatus for dynamic priority-based cache replacement
US6584549B2 (en) System and method for prefetching data into a cache based on miss distance
US7620781B2 (en) Efficient Bloom filter
US8370575B2 (en) Optimized software cache lookup for SIMD architectures
US5761706A (en) Stream buffers for high-performance computer memory system
US8244988B2 (en) Predictive ownership control of shared memory computing system data
US7606974B2 (en) Automatic caching generation in network applications
JP4298800B2 (en) Prefetch management in cache memory
US20070250667A1 (en) Pseudo-lru virtual counter for a locking cache
US7386679B2 (en) System, method and storage medium for memory management
KR20060049710A (en) An apparatus and method for partitioning a shared cache of a chip multi-processor
US20110320720A1 (en) Cache Line Replacement In A Symmetric Multiprocessing Computer
US8019946B2 (en) Method and system for securing instruction caches using cache line locking
KR20030025297A (en) Fast and accurate cache way selection
US20030018855A1 (en) Method and apparatus for caching with variable size locking regions
US20030097538A1 (en) Automatic program restructuring to reduce average cache miss penalty
Tian et al. Abndp: Co-optimizing data access and load balance in near-data processing
US8214601B2 (en) Purging without write-back of cache lines containing spent data
US7299318B2 (en) Method for reducing cache conflict misses
US11726918B2 (en) Dynamically coalescing atomic memory operations for memory-local computing
Ma et al. Unified-tp: A unified tlb and page table cache structure for efficient address translation
US7099998B1 (en) Method for reducing an importance level of a cache line
Ni et al. SIP: Boosting up graph computing by separating the irregular property data
CN117785737A (en) Last level cache based on linked list structure and supporting dynamic partition granularity access

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KAILAS, KRISHNAN KUNJUNNY;RAVINDRAN, RAJIV ALAZHATH;SURA, ZEHRA;REEL/FRAME:018636/0900

Effective date: 20060707

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE