Nothing Special   »   [go: up one dir, main page]

CN104424132B - High performance instruction cache system and method - Google Patents

High performance instruction cache system and method Download PDF

Info

Publication number
CN104424132B
CN104424132B CN201410430931.5A CN201410430931A CN104424132B CN 104424132 B CN104424132 B CN 104424132B CN 201410430931 A CN201410430931 A CN 201410430931A CN 104424132 B CN104424132 B CN 104424132B
Authority
CN
China
Prior art keywords
instruction
block
branch
address
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410430931.5A
Other languages
Chinese (zh)
Other versions
CN104424132A (en
Inventor
林正浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Original Assignee
Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinhao Bravechips Micro Electronics Co Ltd filed Critical Shanghai Xinhao Bravechips Micro Electronics Co Ltd
Priority to CN201410430931.5A priority Critical patent/CN104424132B/en
Publication of CN104424132A publication Critical patent/CN104424132A/en
Application granted granted Critical
Publication of CN104424132B publication Critical patent/CN104424132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45Caching of specific data in cache memory
    • G06F2212/452Instruction code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A high-performance instruction cache system and a method are applied to the field of processors, before a processor core executes instructions, the instructions are filled into a high-speed memory which can be directly accessed by the processor core, so that the processor core can almost acquire needed instructions in the high-speed memory every time, and the extremely high cache hit rate is achieved.

Description

High performance instruction cache system and method
Technical Field
The present invention relates to the field of computers, communications and integrated circuits.
Background
The function of the cache is to copy part of the contents in the lower-level memory into the cache, so that the contents can be quickly accessed by the higher-level memory or the processor core to ensure the continuous operation of the pipeline.
the addressing of the current cache is based on the following mode that the label in the label memory is read by using the index segment in the address for addressing and matching with the label segment in the address; and jointly addressing and reading the content in the cache by using the index segment in the address and the displacement segment in the block. If the tag read from the tag memory is the same as the tag segment in the address, then the contents read from the cache are valid, referred to as a cache hit. Otherwise, if the tag read from the tag memory is not the same as the tag segment in the address, referred to as a cache miss, the contents read from the cache are invalidated. For the multi-way set associative cache, the above operation is performed on each way set in parallel to detect which way set cache hits. The read contents corresponding to the hit way group are valid contents. If all way sets are missing, all read contents are invalid. After a cache miss, cache control logic fills the cache with content from the lower level storage medium.
In the prior art, the power consumption and speed limitations (for example, because the multi-way set associative cache structure requires that the contents and tags addressed by the same index in all the way sets be read out and compared at the same time) are limited, in order to achieve higher performance, a composition mode that the number of low-level cache way sets is larger than that of high-level cache way sets is generally adopted. Furthermore, cache misses may be classified into three types of conditions: forced misses, conflicting misses, and capacity misses. In the prior art, forced misses are inevitable except for a small fraction of the success of the prefetch.
modern cache systems are typically constructed of multiple levels of caches connected in sets of ways. New cache structures, such as: victim caching, trace caching, and prefetching are all based on and improve upon the basic cache structure described above. However, with the ever-expanding processor/memory speed gap, current architectures, particularly cache misses, have become the most serious bottleneck limiting the performance increase of modern processors.
The present invention is directed to a method and system that substantially obviates one or more of the above-described or other difficulties.
Disclosure of Invention
The invention provides a high-performance instruction caching method which is characterized in that a processor core is connected with a first memory containing executable instructions and a second memory with the speed higher than that of the first memory; the method comprises the following steps: examining the instructions being filled from the second memory to the first memory, thereby extracting instruction information including at least branch information; establishing a plurality of tracks according to the extracted instruction information; filling instructions, which are possibly executed by the processor core, of at least one or more instructions from the first memory to the second memory according to one or more of the plurality of instruction tracks; the method further includes the second memory being formed in a fully associative manner and the first memory being formed in a set associative manner.
Optionally, the tracks are in one-to-one correspondence with the instruction blocks in the second memory.
Optionally, the target address is addressed by a first-level block number, so as to determine whether the target instruction belongs to a certain instruction block of the first memory.
Optionally, the second-level block number is written into the track table by matching, and is changed to the first-level block number when the instruction in the first memory is filled into the second memory.
Optionally, scanning the track, and setting a flag bit of a block number corresponding to the active table once finding the reference to the block number of the active table; and simultaneously resetting the flag bit of each block number in the active table in sequence, so that the block number currently referred by the track is represented by the set flag bit and cannot be replaced out of the active table.
The present invention also provides a high performance instruction cache system, which is characterized in that the system comprises: a processor core to execute instructions; a first memory to store instructions required by the processor core; a second memory to store instructions required by the processor core, the second memory being faster than the first memory; a scanner to review instructions being filled from the second memory to the first memory to extract instruction information including at least branch information; a track table for storing a plurality of tracks created according to the extracted instruction information; the system further comprises: the second memory is formed in a fully-associative mode; and the first memory is constituted by a set associative manner.
Optionally, the tracks in the track table are in one-to-one correspondence with the instruction blocks in the second memory.
Optionally, each instruction block in the first memory corresponds to a primary block number.
Optionally, in the method, the instructions being filled from the first memory to the second memory are examined, so as to extract instruction information at least including branch information; establishing a plurality of tracks according to the extracted instruction information; filling instructions, which are possibly executed by the processor core, of at least one or more instructions from the first memory to the second memory according to one or more of the plurality of instruction tracks; the target address is addressed by a first level block number to determine whether the target instruction belongs to a certain instruction block of the second memory.
Optionally, in the method, when a previous instruction block or a subsequent instruction block of a sequential address corresponding to an instruction block in the first memory is already stored in the first memory, the active table stores storage location information of the previous instruction block or the subsequent instruction block corresponding to the instruction block in the first memory.
Optionally, in the method, when the instruction is located in a previous instruction block or a next instruction block of a current instruction block in the first memory, the instruction may be directly found in the first memory according to the location information of the previous instruction block or the next instruction block stored in the active table.
Optionally, in the method, a boundary determination is performed on a branch target instruction address; and according to the judgment result, giving addresses with different formats to the branch target instructions at different positions.
Optionally, in the method, if the branch target instruction address is located in a previous or next instruction block of the instruction block where the branch instruction is located in the first memory, the secondary block number of the previous or next instruction block of the instruction block where the branch instruction is located is used as the secondary block number of the branch target instruction, and the address offset portion corresponding to the first memory in the branch target instruction address is used as the offset of the branch target instruction.
Optionally, in the method, the active table content corresponding to the instruction being filled from the first memory to the second memory is stored in the mini active table; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction and the primary instruction block is valid in the corresponding primary block number in the micro active table, directly taking the primary block number read from the micro active table as the primary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction, but the primary instruction block is invalid in the corresponding primary block number in the micro active table, directly taking the secondary block number of the branch instruction as the secondary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in the previous or next secondary instruction block of the branch instruction and the corresponding secondary block number of the previous or next secondary instruction block in the micro active table is valid, the secondary block number read out from the micro active table is directly used as the secondary block number of the branch target instruction.
Optionally, in the method, a plurality of secondary block numbers and contents of the block numbers corresponding to the secondary block numbers in the active table are stored in the mini active table; when the examination finds the branch target instruction, firstly matching the address of the branch target instruction in the micro active table, if the matching is successful, directly taking the primary block number or the secondary block number read from the micro active table as the primary block number or the secondary block number of the branch target instruction; if the matching is not successful, the branch target instruction address is sent to the active table for matching.
Optionally, the system includes: a processor core to execute instructions; a first memory to store instructions required by the processor core; a second memory to store instructions required by the processor core, the second memory being faster than the first memory; a scanner to audit instructions being filled from a first memory to a second memory to extract instruction information including at least branch information; a track table for storing a plurality of tracks created according to the extracted instruction information; each instruction block in the second memory corresponds to a primary block number.
Optionally, in the system, entries of the active table correspond to the instruction blocks in the first memory one to one, and each entry stores a block address of a corresponding instruction block in the first memory; and when the previous instruction block or the next instruction block of the sequential address corresponding to one instruction block in the first memory is already stored in the first memory, the active table also stores the storage location information of the previous instruction block or the next instruction block corresponding to the instruction block in the first memory.
Optionally, in the system, the boundary of the branch target instruction address is determined; and according to the judgment result, giving addresses with different formats to the branch target instructions at different positions.
Optionally, the system comprises a single or a plurality of adders; the adder is used for adding the low order bits of the branch instruction in the part outside the offset corresponding to the first memory and the corresponding bits in the branch transfer distance, and judging whether the branch target instruction is positioned in the previous or next instruction block of the instruction block sequence address of the branch instruction in the first memory; when the branch target instruction is located in a previous instruction block or a next instruction block of the current instruction block in the first memory, the instruction can be directly found in the first memory according to the location information of the previous instruction block or the next instruction block stored in the active table.
Optionally, the system further comprises a micro active meter; the micro active table is used for storing active table contents corresponding to the instructions which are being filled from the first memory to the second memory; when the scanner finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction through examination and the corresponding primary block number of the primary instruction block in the micro active table is valid, the primary block number read out from the micro active table is directly used as the primary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction, but the primary instruction block is invalid in the corresponding primary block number in the micro active table, directly taking the secondary block number of the branch instruction as the secondary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in the previous or next secondary instruction block of the branch instruction and the corresponding secondary block number of the previous or next secondary instruction block in the micro active table is valid, the secondary block number read out from the micro active table is directly used as the secondary block number of the branch target instruction.
Optionally, the system further comprises a micro active meter; the micro active table is used for storing a plurality of secondary block numbers and the corresponding contents of the block numbers in the active table; when the scanner examines and finds the branch target instruction, firstly matching the address of the branch target instruction in the micro active table, if the matching is successful, directly taking the primary block number or the secondary block number read from the micro active table as the primary block number or the secondary block number of the branch target instruction; if the matching is not successful, the branch target instruction address is sent to the active table for matching. Optionally, scanning the track table, and setting a flag bit of a block number corresponding to the active table once finding the reference to the block number of the active table; and simultaneously resetting the flag bit of each block number in the active table in sequence, so that the set flag bit represents the block number currently referred by the track table, and the block number cannot be replaced by the active table.
Other aspects encompassed by the present invention will be understood and appreciated by those skilled in the art in light of the present specification, claims and appended drawings.
Advantageous effects
The system and method of the present invention may provide a basic solution for a cache structure used by digital systems. Unlike conventional cache systems that fill only after a cache miss, the system and method of the present invention fills the instruction cache before the processor executes an instruction, which can sufficiently hide the forced miss. In addition, the system and the method of the invention essentially adopt a fully associative structure for the first-level cache and a group associative structure for the second-level cache, thereby achieving the effect similar to the fully associative structure, avoiding capacity loss and simultaneously improving the running speed of the processor. Because the system and the method of the invention need less matching operation times and have lower missing rate, the power consumption is also obviously reduced compared with the traditional cache system. Other advantages and applications of the present invention will be apparent to those skilled in the art.
Drawings
FIG. 1 is a block diagram of an instruction prefetch architecture in the form of a multi-way set for a level two cache according to the present invention.
FIG. 2 is an embodiment of the movement of the tracker read pointer of the present invention.
FIG. 3 is an embodiment of the relationship between the primary instruction block, the secondary instruction block, and the corresponding memory locations of the present invention.
FIG. 4 is a specific embodiment of the present invention in which the second level cache is formed in two-way set.
FIG. 5 is another embodiment of the present invention in which the level two cache is formed in two-way set.
FIG. 6 is another embodiment of the scanner architecture of the present invention.
FIG. 7 is a diagram of the memory and format used in a micro active table organized in a fully associative manner.
FIG. 8 is one embodiment of a fully associative micro active table.
Detailed Description
The high-performance cache system and method according to the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. Advantages and features of the present invention will become apparent from the following description and from the claims. It is to be noted that the drawings are in a very simplified form and are not to precise scale, which is merely for the purpose of facilitating and distinctly claiming the embodiments of the present invention.
For clarity of description, various embodiments are described to further illustrate various implementations of the present invention, wherein the embodiments are illustrative and not exhaustive. In addition, for simplicity of description, the contents mentioned in the previous embodiments are often omitted in the following embodiments, and therefore, the contents not mentioned in the following embodiments may be referred to the previous embodiments accordingly.
While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It is to be understood that the inventor's point of departure is not intended to limit the invention to the particular embodiments illustrated, but, on the contrary, the inventor's point of departure is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims. The same component numbers may be used throughout the drawings to refer to the same or like parts.
In addition, although the cache system including the Processor core is described as an example in this specification, the present invention may be applied to a cache system including any suitable Processor (Processor). For example, the Processor may be a General Processor (CPU), a Microcontroller (MCU), a Digital Signal Processor (DSP), an image Processor (GPU), a System On Chip (SOC), an Application Specific Integrated Circuit (ASIC), or the like.
FIG. 1 is a block diagram 100 illustrating an instruction prefetch architecture in the form of a multi-way set for a level two cache according to the present invention. As shown in fig. 1, the structure diagram 100 includes an Active list (Active list)104, a scanner 108, a track list (Tracktable)110, a Tracker (Tracker)114, a second level instruction Cache (L2 Cache)106, a first level instruction Cache (L1 Cache)112, and a processor Core 116(CPU Core). It should be understood that various components are listed here for convenience of description, other components may be included, and some components may be omitted. The various components herein may be distributed among multiple systems, may be physically present or virtual, and may be implemented in hardware (e.g., integrated circuits), software, or a combination of hardware and software.
The Instruction Address (Instruction Address) refers to a storage Address of an Instruction in the main memory, that is, the Instruction can be found in the main memory according to the Address. For simplicity and clarity, the virtual address is assumed to be equal to the physical address, and the method of the present invention is also applicable to the case where address mapping is required.
In the present invention, a Branch instruction (Branch instruction) or Branch Source (Branch Source) refers to any suitable form of instruction that causes processor core 116 to change an Execution Flow (e.g., execute an instruction out of order). The Branch source Address (Branch Souce Address) may be the instruction Address of the Branch instruction itself; a Branch Target (Branch Target) refers to a Target instruction to which a Branch caused by a Branch instruction is diverted, and a Branch Target Address (Branch Target Address) may refer to an Address into which a Branch is taken when the Branch of the Branch instruction successfully occurs, that is, an instruction Address of the Branch Target instruction; a current instruction may refer to an instruction currently being executed or fetched by a processor core; the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
In this embodiment, the first level instruction cache 112 is formed by fully associative, each memory line of the first level instruction cache 112 is called a first level instruction block, and the first level instruction cache 112 stores at least one first level instruction block of a segment of consecutive instructions including the current instruction. The level one instruction cache 112 contains a plurality of level one instruction blocks, each level one instruction block containing a plurality of instructions, each level one instruction block stored in the level one instruction cache 112 has a level one block number (BNX1), the level one block number BNX1 is the line number of the level one instruction block in the level one instruction cache 112. The second level instruction cache 106 is formed by two identical memories 126 and 128, each of which forms a way set, each way set having the same number of lines, i.e., formed in a two-way set. Each memory line in memories 126 and 128 is referred to as a level two instruction block, and each level two instruction block has a level two block number (BNX2) that is determined by the line number in the level two instruction cache in which the level two instruction block resides and the cache way set in which it resides, i.e., the index (index) bit of the instruction line address plus the cache way set bit indicating the instruction resides. Each secondary instruction block includes a plurality of primary instruction blocks. The second level block number described herein refers to the location of the second level instruction block in the second level instruction cache 106.
Level two instruction cache 106 and level one instruction cache 112 may comprise any suitable storage devices, such as: registers (registers) or register files (register files), static memory (SRAM), dynamic memory (DRAM), flash memory (flash memory), hard disks, Solid State disks (Solid State disks), and any suitable storage device or future new State memory. The second level instruction cache 106 may operate as a cache of the system, or as a first level cache when other caches exist; and may be partitioned into a plurality of Memory segments called Memory blocks for storing data to be accessed by processor core 116, such as instructions in an Instruction Block (Instruction Block).
Active table 104 includes two tag arrays 118 and 120 and two storage arrays 122 and 124 that store first-level block numbers BNX 1. Since the second level instruction cache 106 is formed in two-way set, the active table is also formed in two-way set. One tag array and one memory array in the active table 104 correspond to one way set of the second level instruction cache 106, i.e., tag array 118, memory array 122, and second level cache way set 126, and tag array 120, memory array 124, and second level cache way set 128. The elements that make up the storage arrays 122 and 124 are referred to as entries, each entry storing a primary block number BNX1 and a Valid bit (Valid bit) to preserve the relationship of the primary instruction block in the primary instruction cache and the secondary instruction cache. Since each level two instruction block includes a plurality of level one instruction blocks, each row of the storage arrays 122 and 124 in the active table 104 includes a plurality of entries in which is stored the row number BNX1 in the level one instruction cache 112 at which the level one instruction block in the level two instruction blocks is located.
The scanner 108 examines the first-level instruction block filled from the second-level instruction cache 106 into the first-level instruction cache 112, obtains the instruction type information, and determines whether the instruction is a branch instruction or a non-branch instruction. And if the instruction is judged to be the branch instruction, calculating the target address of the branch instruction. The calculation method comprises the step of adding the current instruction address to the branch transfer distance through an adder to obtain the target address of the branch instruction. The calculated target address of the branch instruction is then sent to the active table 104 for matching.
In the present embodiment, each row of the track table 110 corresponds to each row of the level one instruction cache 112 one by one, and all are pointed to by the same row pointer. Each row of the Track table 110 includes a plurality of Track points (Track points), each Track Point corresponds to one instruction in one row of the primary instruction cache 112, that is, the number of Track points in each row in the Track table is consistent with the number of instructions in each row in the primary instruction cache. A trace point is an entry in the trace table and may contain information of at least one instruction, such as instruction type information, branch target address, and the like. In the invention, the track table address of the track point is related to the instruction address of the instruction represented by the track point (coreespond); the branch instruction track point contains the address of the branch target, and the address is related to the branch target instruction address. A plurality of consecutive trace points corresponding to an instruction block formed by a series of consecutive instructions in the level one instruction cache 112 is referred to as a track. The instruction block and the corresponding track are indicated by the same primary block number (BNX 1). The total number of track points in a track may be equal to the total number of entries in a row in the track table 110. The track table 110 may have other organization forms.
When the processor core 116 fetches an instruction from the first-level instruction cache 112 on demand, assuming that the instruction is not stored in the first-level instruction cache 112 and the second-level instruction cache 106 at that time, the instruction is filled from the lower-level memory into a second-level instruction block pointed to by a second-level block number BNX2 determined by a replacement algorithm (e.g., LRU) in the second-level instruction cache 106, based on the instruction address (PC); the corresponding first-level instruction block in the second-level cache 106 is then filled into the first-level instruction cache 112 to the memory line pointed to by BNX1 as determined by the replacement algorithm (e.g., LRU) based on the needs of the processor core 116. The replacement algorithm may also adopt an existing algorithm such as a first-in-first-out algorithm (FIFO), a least recently used algorithm (LRU), a Random replacement algorithm (Random), and the like. In this process, the scanner 108 examines the instruction types in the primary instruction block, extracts the branch information of the branch instruction therein, and calculates the branch instruction target address. The calculation method comprises the step of adding the current instruction address to the branch transfer distance through an adder to obtain the target address of the branch instruction. Here, the term "Fill (Fill)" means to move instructions from a lower level memory to a higher level memory.
Whether the branch target instruction is already stored in the secondary instruction memory 106 may be determined by matching the branch target instruction address examined, calculated by the scanner 108, with the instruction row address stored in the active table 104. Reading two labels stored in an active table by using index bits of a calculated branch target instruction address, comparing the two labels with label bits of the calculated target branch instruction address, selecting an entry corresponding to the instruction in a road group which is successfully matched by using a block Offset of the calculated branch target address if one of the labels is successfully matched, and writing a first-level block number (BNX1) stored in the active table and a calculated Offset (Offset) of the branch target address into a track table together if the first-level block number (BNX1) stored in the entry is valid and indicates that the target branch instruction is stored in a first-level instruction cache 112, wherein the writing position is a track point of the track table corresponding to the branch source address; if the first level block number (BNX1) stored in the entry is invalid, indicating that the target branch instruction is not stored in the first level instruction cache 112, but is only stored in the second level instruction cache 106, writing the second level block number BNX2 corresponding to the instruction into the track table along with the calculated block offset of the branch target address and the branch target address offset, wherein the writing position is the track point of the track table corresponding to the branch source address; if neither tag match is successful, indicating that the instruction line in which the branch target information is located has not been filled into the second level instruction memory 106, then the instruction is filled from the lower level memory into the second level instruction block pointed to by the second level block number BNX2 determined by the replacement algorithm (e.g., LRU) in the second level instruction cache 106 according to the calculated branch target instruction address, and the second level block number BNX2 is written into the track table together with the calculated block offset of the branch target address and the branch target address offset, where the write location is the track point of the track table corresponding to the branch source address. The matching (Match) described in the present invention refers to comparing two values, and when the two values are the same or equal, the matching is 'successful (Match)', otherwise the matching is 'unsuccessful (Not Match)'.
The position information of track points (instructions) in the track table can be represented by a first address and a second address; the first Address indicates a block number of an instruction corresponding to the track point (pointing to one track in the track table and a corresponding first-level instruction block in the first-level instruction cache), and the second Address indicates a relative position (Offset) of the track point (i.e., the corresponding instruction) in the track (storage block). The first address and the second address correspond to a track point in the track table, and the corresponding track point can be found from the track table according to the first address and the second address. If the type of the track point represents a branch instruction, the track of the branch target may be determined according to a first address contained in the content stored in the entry in the track table, and a specific track point of the target track may be determined according to a second address. Thus, the track table becomes a table representing a branch instruction by the track table entry address corresponding to the branch source address and the table entry content corresponding to the branch target address.
In order to establish a connection with the sequential execution of the next track in a track of the track table 110, an ending track point is set after each track represents the track point of the last instruction, wherein the first address of the sequential execution of the next track (instruction block) is stored. If multiple instruction blocks can be stored in the level one instruction cache 112, the next instruction block for sequential execution is also fetched into the instruction read buffer for the processor core 116 to read for execution when the current instruction block is executed. The instruction address of the next instruction block may be determined by adding the instruction address of the current instruction block to the address length of one instruction block. The address is sent to the active table 104 for matching as previously described and the resulting instruction block is filled into the instruction block of the level one instruction cache 112 as indicated by the replacement algorithm. The instructions newly stored in the next instruction block in the level one instruction cache 112 are also scanned by the scanner 108 and the fetch information fills the track indicated by the level one block number BNX1 as previously described. The replacement algorithm may also adopt an existing algorithm such as a first-in-first-out algorithm (FIFO), a least recently used algorithm (LRU), a Random replacement algorithm (Random), and the like.
Tracker 114 is primarily comprised of selector 130, register 132, and incrementer 134. The read pointer of the tracker 114 points to the first branch instruction track point located after the current instruction in the track table 110 where the current instruction is located; or to an ending trace point of the track in the event that there is no branch trace point on the track following the current instruction. The read pointer of the tracker 114 is composed of a first address pointer and a second address pointer, wherein the value of the first address pointer is a first-level block number of a first-level instruction block where the current instruction is located, i.e. a row pointer; the second address pointer points to the first branch instruction trace point or the end trace point after the current instruction on the track.
when processor core 116 fetches instructions on demand from level one instruction cache 112, level one block number BNX1 is provided by tracker 114 for addressing the level one instruction block, the offset is provided by the processor to fetch the corresponding instruction, and the BRANCH and TAKEN signals are provided to tracker 114. The BRANCH signal indicates whether the instruction is a BRANCH instruction, and the TAKEN signal is used to control the output of the selector. Tracker 114 is configured to indicate the first branch instruction following the current instruction, or to point to the end trace point of the track if there is no branch trace point following the current instruction on the track, and to provide processor core 116 with the first block number BNX1 of the current instruction.
When the track point store pointed to by the read pointer of tracker 114 contains a level one block number BNX1, indicating that the corresponding instruction is already stored in level one instruction cache 112, processor core 116 fetches the instruction directly from level one instruction cache 112 when it is executed. When the contents of the track point storage pointed by the read pointer contain a secondary block number BNX2, the active table is searched by taking the secondary block number BNX2 as the address of the active table. If the first level block number BNX1 stored in the entry corresponding to the second level block number is already valid, indicating that the target address of another branch instruction prior to execution of the instruction is the same as the instruction address corresponding to the second level block number, and that the target instruction has been fetched into first level instruction cache 112, thus writing the first level block number BNX1 to the trace point, and waiting until the instruction is executed, processor core 116 fetches the instruction directly into first level instruction cache 112; if the first level block number BNX1 stored in the entry corresponding to the second level block number is invalid, indicating that the target instruction is not in the first level instruction cache 112, then the first level block number BNX1 is determined, the target instruction line is fetched from the second level instruction cache 106, filled into the first level instruction block corresponding to the first level instruction cache 112, and the first level block number BNX1 is written into the entries corresponding to the memory arrays 122 and 124 of the active table 104, and upon execution of the instruction, the processor core 116 fetches the instruction directly into the first level instruction cache 112.
If the branch instruction pointed to by tracker 114 does not branch, then the read pointer of tracker 114 points to the first branch instruction trace point after the branch instruction, or points to the end trace point of the track if there is no branch instruction trace point in the trace points after the branch instruction. The processor core reads the sequential instructions after the branch instruction for execution.
If the branch instruction pointed to by the tracker 114 successfully branches, the branch target instruction block read from the instruction memory 106 is stored in the instruction block specified by the buffer replacement logic in the instruction read buffer 112, and the new track information generated by the scanner 108 is filled in the corresponding track of the track table 110. At this time, the first address and the second address of the branch target become new tracker address pointers and point to the track points corresponding to the branch target in the track table. The new tracker address pointer also points to the newly filled branch instruction block, making it the new current instruction block. The processor core selects the required instruction from the new current instruction block using the offset bits of the instruction address (PC). The tracker 114 then moves the read pointer to point to the first branch instruction trace point after the branch target instruction in the track corresponding to the new current instruction block, or to point to the end trace point of the track if there is no branch instruction trace point in the trace points after the branch target instruction.
If the tracker 114 points to an end track point in a track, the read pointer of the tracker 114 is updated to the position content value in the end track point, i.e. points to the first track point of the next track, thereby pointing to the new current instruction block. Tracker 114 then moves the read pointer to point to the first branch instruction track point in the track corresponding to the new current instruction block, or to point to the end track point of the track if the track has no branch instruction track points. Repeating the above process in sequence, the instruction may be filled into instruction read buffer 112 before processor core 116 executes the instruction, so that processor core 116 does not need to wait when fetching the instruction, thereby improving processor performance.
FIG. 2 is an embodiment 200 of the movement of the tracker read pointer according to the present invention. In this embodiment, the tracker read pointer moves past a non-branch instruction in the track table to the next branch point in the track table and waits for the processor core 116 to branch. For convenience of explanation, parts of the components that are not relevant to the explanation of the present embodiment are omitted in fig. 2. In the present embodiment, it is assumed that the instruction types stored in the track table 110 and the instruction information stored therein are arranged from left to right according to the instruction addresses from small to large, that is, when the instructions are executed in sequence, the access sequence of each instruction information and the corresponding instruction type is from left to right. Further, it is assumed that the instruction type "0" in the track table 110 represents that the corresponding instruction in the track table 110 is a non-branch instruction, and the instruction type "1" represents that the corresponding instruction is a branch instruction. An entry representing the type of instruction indicated by the second address 216 (offset, BNY) in a track indicated by the first address 214 (level one block number, BNX1) in the track table 110 may be read at any one time. A plurality of entries, or even all entries, representing the type of instruction in a track indicated by the first address 214 in the track table 110 may also be read at any one time.
An ending table entry is added to the right of the table entry of the instruction with the largest instruction address in each row in the track table 110 to store the address of the next instruction to be executed in sequence. The instruction type of the end entry is always set to "1". The first address of the instruction information in the end table entry is the instruction block number of the next instruction and the second address (BNY) is constant at zero and points to the first entry of the instruction track. The end entry is defined to be equivalent to an unconditional branch instruction. When the tracker points to an end table entry, an internal control signal is always generated to cause the selector 208 to select the output 230 of the track table 110; an internal control signal is also generated to cause register 210 to be updated. The internal signal may be triggered by a special bit contained in the end table entry in the track table 110; or may be triggered by the second address 216 pointing to an end entry.
In FIG. 2, the tracker 114 mainly includes a shifter 202, a leading zero counter 204, an adder 206, a selector 208, and a register 210. Wherein the shifter 202 shifts left a plurality of instruction types 218 representing a plurality of instructions read from the track table 110, the number of shift bits being determined by the second address pointer 216 output from the register 210. The leftmost Bit of the shifted instruction type 224 output by the shifter 202 is a STEP Bit. The signal of the step bit and the BRANCH signal from the processor core jointly determine the update of the register 210. The selector 208 is controlled by a control signal, TAKEN, and its output 232 is the Next Address (Next Address), which contains a first Address portion and a second Address portion. When TAKEN is "1" (branch successful), the selector 208 selects the output 230 (containing the first address and the second address of the branch target) of the track table 110 as the output 232. When TAKEN is "0" (branch not successful), selector 208 selects present first address 214 as the first address portion of output 232 and adder output 228 as the second address portion of output 232. Instruction type 224 is fed to leading zero counter 204 to count how many "0" instruction types (representing the corresponding instructions as non-branch instructions) precede the next "1" instruction type (representing the corresponding instructions as branch instructions), where the stride is counted as one bit "0" regardless of whether the stride is "0" or "1". The Number 226 (STEP Number) of leading "0" s obtained is then sent to adder 206 to be added to the second Address 216 output by register 210 to obtain the Next Branch source Address (Next Branch Address) 228. The next branch source address is the second address of the next branch instruction to the current instruction, and the previous non-branch instruction is skipped (Skip) by the tracker 114.
When the second address 216 points to an entry representing an instruction, the shifter controlled by the second address also shifts the plurality of instruction types output by the track table 110 uniformly to the left. The instruction type representing the instruction read out of the track table 110 at this time is shifted to the leftmost step in the instruction types 224. The shift instruction type 224 is fed into a leading zero counter to count the number of instructions preceding the next branch instruction. The output 226 of the leading zero counter 204 is the step size that the tracker should advance. The step size and the second address 216 are added by the adder 206 to obtain the next branch instruction address 228.
When the step bit signal in the shifted instruction type 224 is "0", which indicates that the entry in the track table 110 pointed to by the second address 216 is a non-branch instruction, the step bit signal controls the register 210 to update, and the selector 208 selects the next branch source address 228 as the second address 216 under the control of the TAKEN signal 222 of "0", and the first address 214 remains unchanged. At this time, the new first address points to the next branch instruction in the same track, and the non-branch instructions preceding the branch instruction are all crossed. The new second address controls the shifter 216 to shift the instruction type 218 so that the instruction type bit representing the branch instruction falls on the step bit at 224 for further operation.
When the step bit signal in the shifted instruction type 224 is "1", this indicates that the entry in the track table 110 to which the second address points represents a branch instruction. At this time, the step bit signal does not affect the register 210 update, and the register 210 is updated under the control of the BRANCH signal 234 from the processor core. At this point the adder output 228 is the address of the next branch instruction on the same track as the current branch instruction, while the memory output 230 is the target address of the current branch instruction.
When the BRANCH signal is "1", the output 232 of the selector 208 updates the register 210. If the TAKEN signal 222 from the processor core is "0" at this time, the representative processor core decides to select sequential execution at this branch point, and the selector 208 selects the next branch source address 228 at this time. At this point the first address 214 output by the register 210 is unchanged and the next branch source address 228 becomes the new second address 216. At this time the new first second address points to the next branch instruction in the same track. The new second address controls the shifter 216 to shift the instruction type 218 so that the instruction type bit representing the branch instruction falls on the step bit at 224 for further operation.
If the TAKEN signal 222 from the processor core is "1" at this time, which represents the decision of the processor core to jump to the branch target at this branch point, the selector selects the branch target address 230 read from the track table 110 as the first address 214 and the future second address 226 output by the register 210. At this point, BRANCH signal 234 controls register 210 to latch the first second address into a new first second address. The new first second address points to a branch target address that may not be on the same track. The new second address controls the shifter 216 to shift the instruction type 218 so that the instruction type bit representing the branch instruction falls on the step bit at 224 for further operation.
When the second address points to the track table end entry (next row entry), the internal control signals control the selector 208 to select the output 230 of the track table 110 and update the register 210 as previously described. The new first address 214 is the first address of the next track described in the end table entry of the track table 110, and the second address is zero. At this point the second address control controls shifter 216 to zero-shift instruction type 218 to begin the next operation. In this manner, the tracker 114 cooperates with the track table 110 to skip non-branch instructions in the track table and always point to branch instructions.
FIG. 3 is an embodiment 300 of the primary instruction block, secondary instruction block, and addressing relationships of the present invention. In FIG. 3, it is assumed that instruction address 301 is 40 bits long, i.e., the highest bit is 39 th bit and the lowest bit is 0 th bit, and each instruction address corresponds to a Byte (Byte). Thus, the lowest two bits 302 (i.e., bits 1 and 0) of the Instruction address 301 correspond to 4 bytes in one Instruction Word (Instruction Word). It is assumed in this embodiment that the upper 8 bits of the instruction line 301 are a process identification bit (PID)310 indicating which process is currently executing. Whether the currently executing process is stored in the instruction cache may be determined by the process identification bit 310 and, if not, a prefetch may be performed by the entire line address 301 to avoid a miss of the instruction in the instruction cache. The process identification bit 310 may not be included in the instruction address 301, and the instruction address is 32 bits in length. For illustrative purposes, the lowest two bits 302 and the highest 8 bits 310 of the instruction address are removed, and the remaining 30 bits (i.e., bits 31 to 2) form a new instruction line address 312.
In the present embodiment, it is assumed that a primary instruction block contains 16 instructions, and therefore the offset (offset)303 in the instruction line address 312 has 4 bits, and the offset is used to determine the position of an instruction in the primary instruction block. The offset 303 corresponds to the second address (BNY) described in FIG. 1, and thus can also be used to determine which track point in the track table the instruction corresponds to. Assuming again that the track table has 512 rows, the first level block number BNX1 has 9 bits whose value is determined by the row number in which it is located. Therefore, if it is determined that the branch target instruction of the branch instruction is already stored in the primary instruction cache 112 according to the foregoing method during the process of filling the primary instruction block from the secondary instruction cache 106 to the primary instruction cache 112 according to the demand of the processor 116, the corresponding primary block number stored in the active table 104 is written into the track point of the track table corresponding to the branch source instruction together with the offset 303, and the processor core 116 can directly read the branch instruction from the primary instruction cache 112 when executing the branch instruction.
In this embodiment, the tag bit 311 in the instruction row address 312 is stored in the tag array 118 or 120 in one way set of the active table 104 for comparison with the target instruction address generated by the scanner 108 to obtain the matching information. Assuming in this embodiment that the active table 104 and the secondary instruction cache blocks 126 and 128 each have 1024 rows, the index bits 307 of the corresponding instruction row address 312 have 10 bits (i.e., bits 17 through 8). The index bits 307 are used to retrieve which row of the level two instruction cache the level two instruction block is located in, and may also be used to read out the tags stored in the corresponding tag array 118 and tag array 120 in each way set of the active table 104 and the valid values stored in the corresponding entries in each way set of the active table 104. Also assuming that one secondary instruction Block stored in the secondary instruction cache Block 126 or 128 corresponds to 4 consecutive primary instruction blocks, the Block offset (Block-offset)306 has two bits, i.e., the 6 th and 7 th bits. The block offset 306 is used to select a first level instruction block from the second level instruction blocks stored in the second level cache 106, i.e., to select a valid value in which entry in the active table corresponds. Thus, the way set number of the second level instruction cache 106 in which the second level instruction block resides plus the index bits 307 of the instruction line address 312 form a second level block number BNX 2. If a level one instruction block is in the process of filling from the level two instruction cache 106 into the level one instruction cache 112 according to processor requirements, it is determined according to the foregoing method that the branch target instruction of the branch instruction is not stored in the level one instruction cache 112 but is already stored in the level two instruction cache 106, then the corresponding secondary block number BNX2 is written into the track table at the track point in the track table corresponding to the branch source instruction, together with the block offset 306 and the offset 303 of the branch target address of the branch instruction, until the tracker pointer points to that track point, the corresponding primary instruction block is filled from the secondary instruction cache 106 into the primary cache block pointed to by the primary block number BNX1 determined according to the replacement policy (e.g. LRU) in the primary instruction cache 112, the processor core 116 may read the branch instruction directly from the level one instruction cache 112 when the instruction is executed.
according to the scheme of the invention, the mapping relation of the instructions in the first-level instruction cache and the second-level instruction cache can be established. The location of an instruction in a level one instruction block stored in level one instruction cache 112 may be determined by the level one block number BNX1 plus the offset 303 of the upstream address 312; the block offset 306 in the instruction line address 312 is used to determine the location of the first level instruction block in the second level instruction block stored in the second level instruction cache 106; the index bits 307 in the instruction line address 312 plus the cache way group number in which the second level instruction block resides (i.e., the second level block number BNX2) may determine the location of the second level instruction block in the second level instruction cache 106. It should be noted that although the first-level block number BNX1 and the second-level block number BNX2 have no necessary mapping relationship, the first-level block number BNX1 is determined by a replacement algorithm (e.g., LRU algorithm) when a first-level instruction block is filled into the first-level instruction cache 112 from the second-level instruction cache 106, and the second address (BNY) indicating the location of the instruction in the first-level instruction cache and the second-level instruction cache is the same, i.e., the offset 303 of the instruction line address 312. Thus, through the addressing mode, the mapping relation of the instructions in the first-level instruction cache and the second-level instruction cache can be established.
FIG. 4 is a specific embodiment 400 of the present invention in which the level two cache is formed in two-way set. According to the technical scheme of the invention, the target instruction address calculated by the scanner 108 can be matched with the instruction address stored in the active table, so as to obtain the matching information with the instruction address, and then the secondary block number BNX2 or the primary block number BNX1 is written into the track table, so as to generate a new track.
For illustrative purposes, in the present embodiment, target instruction address 312 is illustrated as a portion of a full instruction address. Target instruction address 312 includes tag bits 311, index bits 307, block offset 306, and offset 303. The tag bit 311 is used for comparing with the tags 302 and 304 in the active table 104 to obtain matching information; index bit 307 is used to retrieve which row in the active table the address corresponds to; the block offset 306 is used to select a corresponding primary instruction block in the secondary instruction block; the offset 303 is used to determine the location of the target instruction in the primary instruction line, i.e., to provide a second address (BNY).
In this embodiment, level two instruction cache 106 is formed from two memory blocks 126 and 128 that contain a consistent number of lines, i.e., are formed in two banks. Correspondingly, the active meter is also formed in a two-way group form. Active table 104 is comprised of a first portion of tag arrays 118 and 120 and a second portion of memory blocks 408 and 410. The first part of tag arrays 118 and 120 is used to match the branch target address computed by the scanner 108 and the second part is used to store the first level block number BNX 1. Since the level two instruction blocks in each way set of the level two cache 106 correspond to 4 level one instruction blocks, a row of each way set in the active table 104 corresponds to 4 entries 408 or 410. The number of rows of the track table is consistent with that of the active table, and the number of rows of the track table is 1024. Each line of the level one instruction cache 112 contains 16 instructions, i.e., the level one instruction block contains 16 instructions, so there are 16 entries per line in the track table 110.
in the present embodiment, it is assumed that the level one instruction block fetched from the level two instruction cache 106, which contains 3 branch instructions located at items 4, 7 and 11 of the level one instruction block, is filled into line 3 of the level one instruction cache 112 according to the LRU replacement policy. In the present embodiment, assume that a value "1654" is stored in the tag of row 14 of way set 0 of active table 104 and a value "2526" is stored in the tag of row 14 of way set 1 of active table 104. Also assume that the valid bit of entry 2 corresponding to row 14 of way set 0 in the active table is "1", the valid bit of entry 3 is "0", and the valid bit of entry 2 corresponding to row 14 of way set 1 is "0".
when the scanner 108 scans the stage one instruction block, it is calculated that the target instruction address of the first branch instruction is "1654 |14|2| 3", i.e. the value of the tag bit 311 corresponding to the target instruction address 312 is "1654", the value of the index bit 307 is "14", the value of the block offset 306 is "2", and the value of the offset 303 is "3". First, according to the prior art, both valid tags in the 14 th row stored in the active table are read out by using the index bits 307, and then the read tags are sent to the comparators 420 and 422, respectively, and compared with the tag bits 311 of the branch target instruction address 312 calculated by the scanner 108, the way group "0" match is successful. The corresponding entry 2 in the active table is then selected using the block offset bit 306 of the branch target address 312, with the valid bit being "1", and the value "5" stored therein is written to the 4 th entry of row 3 in the track table, while the value "3" of the offset (BNY) is written to the 4 th entry of row 3 in the track table, i.e., "5 | 3" is written to the 4 th entry of row 3 in the track table.
When the target address of the second branch instruction calculated by the scanner 108 is "1654 |14|3| 5", the tag bit and the index bit match the values described above, the block offset 306 has a value of "3", the offset 303 has a value of "5", and the entry 3 corresponding to row 14 of way group 0 in the active table is selected by the method described above. At this point, the valid bit of entry 2 is "0", indicating that the branch instruction is not in the level one instruction cache 112, then the way group number in the active table plus the index bit 307 of the target instruction address 312 is written into the track table as a value of the second level block number (BNX2) along with the block offset 306 and offset (BNY)303, i.e., "0 |14|3| 5" is written into the track table at line 3, entry 7. "0" indicates that the instruction corresponds to way group 0 of the active table, "14" indicates that the target instruction corresponds to the 14 th row of the active table, "3" indicates that the instruction corresponds to the 3 rd entry of the active table, and "5" indicates that the instruction corresponds to the 5 th instruction in the primary instruction block.
When the target address of the third branch instruction calculated by the scanner 108 is "3546 |14|2| 8", indicating that the tag bit 311 of the target instruction address 312 has a value of "3546", the index bit 307 has a value of "14", the block offset 306 has a value of "2", and the offset 303 has a value of "8". If the matching with any entry in the active table is not successful by the method described above, indicating that the instruction is not in the secondary cache, then the corresponding instruction block is fetched into the secondary cache 106 according to the target address and into the 14 th row 2 entry in way set 1 of the secondary cache according to the LRU replacement algorithm. Then its way set number in the active table plus the index bit 307 of the target instruction address 312 is written into the track table as a value of a secondary block number (BNX2) along with the block offset 306 and offset (BNY)303, i.e., "1 |14|2| 8" is written into the track table at row 3, item 11. The replacement algorithm may also adopt an existing algorithm such as a first-in-first-out algorithm (FIFO), a least recently used algorithm (LRU), a Random replacement algorithm (Random), and the like.
When the read pointer of tracker 114 points to row 3, entry 4 of the track table, the read value "5 | 3" stored at the track point contains a level one block number indicating that the target instruction of the branch instruction is already stored in row 5 of level one cache 112, and then processor core 116 may read the instruction directly from row 5 of level one instruction cache 112 for use by processor core 116 by waiting until the instruction is executed.
In the present embodiment, it is assumed that the target instruction address of a branch instruction is "1654 |14|3| 5" and that the instruction has been executed, indicating that the instruction has been filled into level one instruction cache 112. Again assuming the target instruction is already stored in line 9 of level one instruction cache 112, then the value "9" is written into line 14, line 3 entry of active way set 0, and the valid bit of this entry is set to "1".
Thus, when the read pointer of the tracker 114 points to row 3, entry 7 of the track table, the read value "0 |14|3| 5" stored at the track point contains a secondary block number BNX2, then the way set 0 of the active table is found from the way set number "0", the row 14, entry 3 of the active table is found from the index number and the block offset, and the primary block number BNX1 stored in the entry is found to be valid at this time. Then the instruction can be read directly from line 9 in the level one cache based on the level one block number BNX1 without having to repeat the read from the level two cache. Meanwhile, the value "9" of the primary block number stored in the entry is written into the 7 th entry of row 3 of the track table, that is, a value "9 | 5" containing the information of the primary block number BNX1 is stored in the 7 th entry of row 3 of the track table 110, so as to complete the update of the track table. By the time this instruction is executed, processor core 116 may read the instruction directly from line 9 of level one instruction cache 112 for use by processor core 116.
When the read pointer of the tracker 114 points to row 3, entry 11 of the track table, the read value "1 |14|2| 8" stored at the track point contains the second level block number BNX2, and the value of the second level block number plus the block offset 306 is used as the active table address according to the method described above to find that the first level block number BNX1 stored in row 14, entry 2 in way set 1 of the active table 104 is invalid, indicating that the corresponding branch target instruction is not in the first level instruction cache 112. Therefore, the corresponding primary instruction block stored in the secondary cache 106 is filled into the primary instruction block with the primary block number BNX1 determined according to the replacement algorithm and pointed to by 38, that is, the corresponding primary instruction block stored in the secondary instruction cache 106 is filled into the 38 th row of the primary instruction cache 112, and at the same time, the value "38" is written into the 14 th row 2 entry of the way group 1 of the active table, and the valid bit of the 14 th row 2 entry of the way group 1 in the active table 104 is set to "1", and at the same time, the value "38 | 8" containing the information of the primary block number BNX1 is written into the 11 rd row entry of the 3 of the track table 110, thereby completing the update of the active table and the track table. The replacement algorithm may also adopt an existing algorithm such as a first-in-first-out algorithm (FIFO), a least recently used algorithm (LRU), a Random replacement algorithm (Random), and the like.
According to the technical scheme of the invention, a storage domain P for storing a way group number in a second-level block number of a second-level instruction block before a sequential address of a second-level instruction block corresponding to the table entry and a storage domain N for storing a way group number in a second-level block number of a second-level instruction block after the sequential address can be added in the table entry of the active table. Thus, when the scanner examines and finds that the branch target instruction of the examined branch instruction is positioned at the second-level instruction block which is one second-level instruction block before or after the sequential address of the second-level instruction block where the branch instruction is positioned, the way group number of the corresponding second-level instruction block before or after can be directly read out from the active table according to the second-level block number corresponding to the examined branch instruction, and the way group number and the result of subtracting one from or adding one from the index bit corresponding to the examined branch instruction are spliced to obtain the second-level block number of the corresponding second-level instruction block before or after, thereby avoiding the operation of sending the branch target instruction address to the active table for matching.
In the invention, when the scanner examines the first-level instruction block (referred to as the current first-level instruction block for short), if the current first-level instruction block is the last one of the second-level instruction blocks (referred to as the current second-level instruction block for short), the ending track point corresponding to the current first-level instruction block is established as before. If the second-level instruction block (the latter second-level instruction block for short) where the next first-level instruction block is located at the sequential address of the current first-level instruction block is stored in the second-level cache, directly filling a second-level block number corresponding to the latter second-level instruction block into the ending track point as track point content; if the latter second-level instruction block is not stored in the second-level cache, the latter second-level instruction block is filled into the position determined by the replacement algorithm in the second-level cache as described above, and the corresponding second-level block number is filled into the ending track point as the track point content. At this time, the second-level block number of the second-level instruction block subsequent to the sequential address of the current second-level instruction block is the second-level block number of the subsequent second-level instruction block, and the way group number in the second-level block number can be used as the storage domain content to be filled into the storage domain N in the active table entry pointed by the second-level block number (referred to as the current second-level block number for short) corresponding to the current second-level instruction block; meanwhile, the second-level block number of the second-level instruction block before the sequential address of the next second-level instruction block is the current second-level block number, and the way group number in the second-level block number can be used as the storage domain content to be filled into the storage domain P in the active table entry pointed by the second-level block number corresponding to the next second-level instruction block.
In addition, the storage domains P and N in the active table entry may also be populated or updated by the following operations. When a new second-level instruction block is filled into the second-level cache, because the current second-level instruction block has the same label as the previous or next second-level instruction block of the sequence address of the current second-level instruction block and the index bit has a difference of '1', the index bits of the second-level instruction block can be respectively reduced by one and increased by one, so that the index bit values of the previous second-level instruction block and the next second-level instruction block of the sequence address of the second-level instruction block are obtained, and the contents stored in all the way groups at the corresponding positions are read out from the active table according to the calculated index bit values. Comparing all the tags in the read content with the tags of the current secondary instruction block, if all the way groups with the index bit minus one of the current secondary instruction block are matched with the tags, the way group number in the matched table entry can be filled into a storage domain P in the active table entry pointed by the current secondary block number as the content of the storage domain, and the way group number in the current secondary block number is filled into a storage domain N in the matched table entry as the content of the storage domain; if there is a match of the labels in all the way groups added by one in the index bit of the current secondary instruction block, the way group number in the matching table entry can be filled into the storage domain N in the active table entry pointed by the current secondary block number as the content of the storage domain, and the way group number in the current secondary block number is filled into the storage domain P in the matching table entry as the content of the storage domain.
FIG. 5 is another embodiment 500 of the present invention in which the level two cache is formed in two-way set. In the present embodiment, target instruction address 312 is illustrated as a portion of a full instruction address. Assuming that a first level instruction block contains 4 instructions, the offset 303 in the instruction line address 312 has 2 bits, and this offset can be used to determine the location of an instruction in the first level instruction block, which is called BN 1Y. Assuming that the track table has 128 rows, the first-level block number BN1X (i.e., the aforementioned BNX1) has 7 bits, and its value is determined by the row number. BN1X concatenates BN1Y, referred to as BN1, so that the location of the instruction in the level one instruction cache 112 may be determined. Since one block of two-level instructions includes 4 blocks of one-level instructions, the block offset 306 is 2 bits. The block offset 306 concatenates the offset 303 referred to as BN 2Y. Assuming again that the active table has 1024 rows, index bit 307 is 10 bits plus the corresponding way set number is referred to as the secondary block number BN2X (consistent with BNX2 described above).
The structure of this embodiment is substantially the same as that in fig. 4, and the only changes are that each row in the active table 104 is added with an entry for storing the previous instruction block address and the next instruction block address of the instruction block represented by this row, and a selector for servicing the newly added entry. Each row (representing a second level cache block) in the left array at 104 is added with an entry 501 storing a second level cache block address in the previous address order and an entry 503 storing a second level cache block address in the next address order, in addition to the original entry 118 storing a tag, 4 entries 408 storing 4 first level cache block addresses in the row. Accordingly, the output of the left array, the output of the table entry 408 is still selected by the original selector 521, and the output of the selector 521 and the outputs of the newly added table entries 501 and 503 are selected by the newly added selector 531. Similarly, the right array is added with an entry 501 storing the previous second-level cache block address, an entry 503 storing the next second-level cache block address, and a selector 532 corresponding to the selector 531.
As in fig. 4, comparator 420 controls a tri-state gate to put the output of selector 531 on bus for storage in track table 110; comparator 422 controls another tri-state gate to place the output of selector 532 on the same bus to be stored in track table 110. The result of the comparisons of the tag 118, tag 120, and the input addresses determines which selector's output (which way's instruction address) is sent to the track table 110 for storage.
Since the buffer is organized in groups in this embodiment. The index address of the previous or next secondary instruction block of the current secondary instruction block may be obtained by subtracting 1 from or adding 1 to the index address of the current secondary instruction block (i.e., 307 in fig. 4), and then, in the newly added entries 501 and 502 storing the previous block address and the entries 503 and 504 storing the next block address, only the way number (way number) of the way group where the previous or next secondary instruction block of the current secondary instruction block is located needs to be stored. For convenience of description, in the following embodiments, unless otherwise specified, the "branch source instruction" is a direct branch instruction.
When a sub-block of the second-level instructions in the second-level cache 106 is filled into the first-level cache 112 according to the LRU replacement policy, the scanner 108 examines the sub-blocks of the second-level instructions from the second-level cache 106 to the first-level cache 112, and calculates a branch target address of the branch source instruction when a certain instruction in the sub-blocks of the second-level instructions is found to be a branch instruction.
In order to reduce power consumption, i.e., reduce the number of accesses to the active table 104, the frequency of accessing the active table 104 is reduced by determining whether the position of the branch target instruction exceeds the boundary of the primary instruction block, the boundary of the current secondary instruction block, and the boundary of the previous or next secondary instruction block of the current secondary instruction block in the scanner 108.
In the present embodiment, the address boundary determination of the branch target address is determined by adding the branch address offset to the lower bits of the base address. As shown in fig. 5, adding branch OFFSET (OFFSET)571 to base address low 581 extracts carry signals (574, 575, and 576) from three edges of the adder, which are treated with priority logic such that a valid 'in-bound' signal representing the largest data block invalidates the in-bound signal representing the smaller data block.
As shown in fig. 5, the lower 581 of the base address is divided into 3 parts, the first part is the offset 303 of the base address 311, the second part is the block offset 306, and the third part 579 is one bit higher than the block offset 306 in the address. Branch offset 571 is divided into two parts, with lower part 573 corresponding to lower part 581 of base address 311, and the remaining upper part 572. Similarly, the generated sum value 582 is also divided into 3 sections by the same boundary as the base address, and carry signals 574, 575, and 576 are generated at each boundary.
Taking the branch offset 571 as a positive number as an example, the method for determining the address boundary judgment condition is as follows:
1. If the upper portion 572 of the branch offset 571 is not all '0', then the branch target address calculated by the adder exceeds the next second level instruction block of the current second level instruction block. This situation is referred to as case 1.
2. If the high portion 572 of branch offset 571 is all '0' and carry signals 574, 575, and 576 are all '0', this indicates that the branch target address is in the first stage instruction block where the branch source instruction is located. This case is referred to as case 2.
3. If the high portion 572 of branch offset 571 is all '0' and carry signal 574 is '1' and carry signals 575 and 576 are '0', it indicates that the branch target address is in the second level instruction block where the branch source instruction is located. This case is referred to as case 3.
4. If the high portion 572 of branch offset 571 is all '0' and carry signal 575 is '1' and carry signal 576 is '0', it indicates that the branch target address is in a second level instruction block subsequent to the second level instruction block in which the branch source instruction is located. This case is referred to as case 4.
5. If the high portion 572 of branch offset 571 is all '0' and carry signal 576 is '1', it indicates that the branch target address is outside of the next two-level instruction block of the two-level instruction block in which the branch source instruction is located. This case is identical to case 1 and is also referred to as case 1.
For the case where branch offset 571 is a negative number, the address boundary determination may also be determined as described above. The difference is that first, it is determined whether the high portion 572 of the branch offset 571 is all '1'. The case 1 is the case when the high-level part 572 of the branch offset 571 is not all '1'; if all of the high parts 572 of branch offsets 571 are '1' and carry signals 574, 575 and 576 are '0', case 2 is described above; if the high portion 572 of branch offset 571 is all '1', and carry signal 574 is '1', and carry signals 575 and 576 are '0', case 3 is described above; if the high portion 572 of branch offset 571 is all '1', and carry signal 575 is '1', and carry signal 576 is '0', case 4 is described above; if the high portion 572 of the branch offset 571 is all '1' and the carry signal 576 is '1', case 1 is described above.
The frequency of access to the active table can be reduced based on the above relationship. When the scanner 108 scans an instruction segment, the branch target instruction address is calculated according to the BN1X base address and the PC address of the instruction segment temporarily stored in the scanner, and the calculated branch target is located in the following situations.
When the scanner 108 checks and obtains the address boundary judgment condition 1, the branch target instruction address calculated by the scanner 108 is sent to the active table 104 through the bus 507, the corresponding row is read according to the index bit therein, the read label is matched with the label of the branch target instruction address calculated by the scanner 108, and if the matching is successful, the subsequent operation is consistent with the above operation. If the matching is unsuccessful, the corresponding instruction block is taken out from the lower-level memory according to the calculated branch target address and filled into the secondary cache block determined by the replacement strategy, and the subsequent operation is consistent with the above operation.
When the scanner 108 examines address boundary determination case 2, the branch target address and the branch source address are in the same primary instruction block, i.e., the branch target instruction and the branch source instruction have the same BN 1X. At this time, the tristate gates (such as the tristate gate 541, etc.) of all way groups are forced to be turned off, and the branch source BN1X stored in the scanner and the calculated offset 303 (i.e. the branch target BN1Y) are merged into the BN1 and written into the entries of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108 via the bus 505, and when the branch source instruction is to be executed, the processor 116 can directly read the instruction from the primary cache 112 for the processor 116 to use.
When the scanner 108 examines address boundary determination case 3, the branch target address and the branch source address are in the same secondary instruction block, i.e., the branch target instruction and the branch source instruction have the same BN 2X. At this time, BN2X (including the way group number and the index bit) of the instruction Block in which the branch source instruction stored in the scanner is located is indexed out of the second storage Block (e.g., second storage Block 408 or 410) in the corresponding entry of the active table 104 via the bus 507, and the content of the corresponding storage field in the second storage Block is selected by using the calculated Block offset 306 (Block-offset). If the BN1X value stored in the storage field is valid, the tri-state gate of the set corresponding to the set number in the branch source BN2X is forced to be turned on, the tri-state gates of the other sets are turned off, the BN1X value is sent to the track table 110 via the bus 508, and at the same time, the branch target BN1Y obtained by removing the block offset 306 from the calculated branch target BN2Y value is sent to the track table 110 via the bus 505, and the two are combined into the branch target BN1 and written into the table entries pointed to by the branch source BN1X and BN1Y stored in the scanner 108 in the track table 110. If the BN1X value stored in the memory field is invalid, the tri-state gates of all the way groups are forced to be turned off, and the branch source BN2X and the calculated branch target BN2Y stored in the scanner 108 are merged into a branch source BN 22 and the branch target BN2Y is written into the table entry pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108 in the track table 110 via the bus 505. The subsequent operations are identical to those described above.
When the scanner 108 checks and obtains the address boundary judgment condition 4, the branch target address is in the previous two-level instruction block or the next two-level instruction block of the branch source address, i.e. the index bit value of the branch target instruction is different from the index bit value of the branch source instruction by '± 1' (i.e. the index bit value of the previous two-level instruction block is different from the index bit value of the branch source instruction by '-1', and the index bit value of the next two-level instruction block is different from the index bit value of the branch source instruction by '+ 1'). At this time, the branch source BN2X (including the way group number and the index bit) stored in the scanner is indexed out of the third storage block (e.g., the third storage block 501, 502 or 502, 504) in the corresponding entry of the active table 104 via the bus 507, and according to the address boundary determination result, the corresponding storage domain P (e.g., the third storage block 501 or 50) is selected when the branch target address is in the previous secondary instruction block of the branch source address, and the corresponding storage domain N (e.g., the third storage block 503 or 504) is selected when the branch target address is in the next secondary instruction block of the branch source address. If the way group number stored in the selected storage domain is valid, the tri-state gate of the way group corresponding to the way group number in the branch source BN2X is forced to be turned on, the tri-state gates of the other way groups are turned off, the way group number is sent to the track table 110 through the bus 508, and simultaneously, the value of the new index bit obtained by performing a subtraction or addition operation on the branch source index bit stored in the scanner 108 and the calculated branch target BN2Y value are sent to the track table 110 through the bus 505, and the two are combined into the branch target BN2 and written into the table entry pointed by the branch source BN1X and BN1Y temporarily stored in the scanner 108 in the track table 110. If the way group number stored in the selected storage area is invalid, the branch target address calculated by the scanner 108 is sent to the active table 104 via the bus 506 for index matching, and the subsequent operation is consistent with the operation of the address boundary determination case 1.
The above embodiment reduces the frequency of reading tags and addresses in the active table 104, but in cases 2 and 3, it is also necessary to directly look up 408, 410 entries in a row of the active table 104 in fig. 5 with the way set number, index address 307 to obtain the first instruction address in the same two-level instruction block, or the previous second address in 501, 502 entries, or the next second address in 503, 504 entries. If the scanner 108 scans the instruction block filled from the lower-level buffer 126 or 128 to the higher-level buffer 112, the entry in the row (the same group number and the same index address as the instruction block) of the active table (104) corresponding to the instruction block is filled into the scanner 108 for temporary storage, so as to further reduce the access frequency of the active table 104. Furthermore, when the register in the scanner 108 has a plurality of independent read ports, the plurality of branch instructions in the instruction segment being scanned can be determined according to the address boundary of the respective branch target instruction at the same time, and the BN1 or BN2 type address of the branch target of the instruction can be obtained by accessing the read port allocated to the instruction to map the address boundary, so as to store the track table 110.
FIG. 6 is another embodiment 600 of the scanner architecture of the present invention. In this embodiment, each instruction block of the high level cache 112 contains 4 instructions, i.e., the offset 303BNY address is two bits; the lower level buffer 126 or 128 contains 4 higher level buffer blocks per buffer block, i.e. the block offset 306 address is also two bits. A row in the track table 104 corresponds to a lower-level cache block, and the row contains 4 entries storing the BN1X address as in 408, and an entry storing the way set number of the previous lower-level address block as in 501, and an entry storing the way set number of the next lower-level address block as in 503. A total of 4 instructions are filled into the high-level cache 112 one high-level cache block at a time from the low-level instruction cache 126 or 128. There is a decode and judge block 601 in the scanner 108, which contains 4 instruction decode and judge blocks, each containing an instruction decoder and an adder such as 607. The scanner 108 also includes a micro-actuator block 660. The entire scanner 608 can be substituted for the scanner 108 of fig. 5, and the other entry level structure is the same as that of fig. 5, where only the track table 110 is shown.
When a higher level instruction block is provided to the scanner 608 for scanning, its corresponding active table row is simultaneously read from the active table 104, along with the way set number of the row, the index value 307 (i.e., the lower level cache block number of the instruction block), and the block offset 306 are provided to the scanner 108 for temporary storage. Where the scanner 108 stores the tag table entry 118 in the active table row, and the memory of 306 is not shown in figure 6. The mini-active block 660 in the scanner 608 contains 620, 621, 622, 623 storage entries to store 4 BN1X addresses in the active table 104, e.g., 408 entries. 660 also contains 3 entries 624, 625 and 626, where 624 stores the way set number of the previous lower cache block as in 501 in 104, 625 stores the way set number and index address of the current lower cache block, and 626 stores the way set number of the next lower cache block as in 503 in 104. Where 625 the entry stores the secondary address way set number and index address 307 for the instruction block being scanned, and is read into the scanner 608 concurrently with the instruction block.
The micro-active block also has 5 selectors 570-574, 4 of which 570-573 are the same structure, and the contents of the selected entries 630-636 are directly or after being operated according to the address boundary judgment condition generated by the corresponding decoding and judgment sub-blocks to generate BN1X or BN2X addresses, and the address offset 303 calculated by an adder such as 607 and the like is used as the branch target address of the scanned instruction to be stored in the track table corresponding to the scanned instruction. The 5 th selector 574 selects the contents of entries 630-636 to fill the end track points in the track, with the selection control being different from that of selectors 570-573.
Each decoding and judging module 601 decodes and judges one of the 4 instructions corresponding to each sub-block, an instruction decoder in each sub-block decodes the instruction responsible for the sub-block, if the instruction type is not a branch instruction, the instruction type generated by decoding the sub-block is stored in an entry corresponding to the instruction in the track table, and the scanner does not calculate a branch address for the instruction. If the instruction type is a branch instruction (hereinafter referred to as a branch source instruction), the sub-block generates an address boundary determination as described above to select a branch target address, which is used to populate the entry in the track table 110 corresponding to the branch source instruction along with the decoded instruction type. The following example describes the case when the instruction decoded by the sub-block is a branch instruction.
The branch offsets are described below as positive numbers for ease of understanding. And so on for branch offsets that are negative. As before, each sub-block first determines whether all '0's are present in the bits 572 of the branch offset 571 for the instruction, and if not all '0's, the address boundary is determined to be case 1. With this determination, the instruction branch offset is added to the base address of the instruction. The base address is a combination of the tag (from 408 entries in the active table 104) temporarily stored in the scanner, the index value (i.e., 307 in the table entry present 625), the block offset 306, and the offset 303 BNY. The first 3 parts of the base addresses of the 4 instructions in the instruction block are the same, and only BNY are different. BNY of the 1 st instruction in instruction order is '0', and BNY of the following 3 instructions is '1', '2', '3' in order. The added sum is the memory address of the branch target, and the index portion 307 in this memory address is used as the address to read out each row in the left and right arrays in the active table 104. BN1X stored in one of the 4 entries 408 in the row is selected by selector 521 controlled by block offset 306 in the memory address, selected by selector 531 (fixing the output of selection 521 in the case of address boundary determination 1) to tri-state gate 541. The tag entry 118 in the row is compared with the tag portion 311 in the memory address of the branch target in the comparator 420, and if the result is the same, the same comparison enables (enable) tri-state gate 541, and the output of the tri-state gate 541 is merged with the 303 offset BNY in the memory address and sent to the corresponding entry in the track table for the scanned instruction to be stored. If the comparison result between the tag table entry 120 of the right array and the tag portion 311 of the memory address of the branch target is the same in the comparator 422, the BN1X in the address finally sent to the track table is from the table entry 410, and the principle is the same, and is not described again. The case where the high bits in the branch offset amount are all '0' will be described below.
Each decode and predicate subblock adds the branch offset 571 of the branch instruction for which it is respectively responsible to the block offset 306, 303, in the instruction base address by an adder, e.g., 607, within the subblock (higher bits in the base address, e.g., index bit 722, tag 721, are discarded). Each sub-block generates an address boundary decision based on the carry signal generated during the addition in the manner described above, and generates a control signal based on the address boundary decision to control the selector to select the appropriate value in the memory table entries 620-626 for filling the track table. Taking the first sequential instruction in a block of instructions being scanned as an example, branch offset 571 in that instruction is added to block offset 306, offset 303 in that instruction (offset 303 being '0' for the first sequential instruction) in adder 607. If the address boundary of the sub-block is determined to be case 1, the calculated memory address of the branch target is sent from the scanner 608 to the active table 104, mapped to the first-level cache address BN1, and stored in the corresponding entry of the branch source instruction in the track table, as described above.
If the address boundary determination is case 2 or case 3, the address boundary determination places the block offset 306 in the sum generated by adder 607 on control line 610 to control selector 670. If block offset 306 is '00', selector 670 selects the content in storage table entry 620, and if the valid bit of the content is 'valid', selector 670 outputs BN1X address in storage table entry 620; when the valid bit in the contents of storage entry 620 is 'invalid', selector 670 outputs the way set number, index bit 307, stored in storage entry 625. The output way set number, index bit 307, is merged with the block offset 306, offset 303(BNY) in the sum generated by adder 607 and sent to the first entry in a track in track table 110. The track is the corresponding track of the level one cache block in the level one cache into which the instruction block being scanned is stored. When the block offset 306 in the sum generated by the adder 607 is '01', '10', and '11', the selector 670 selects the contents of the stored entries 621, 622, and 623 accordingly, and if the contents are invalid, the entry 625 is selected, as described above.
If the address boundary determination is case 4 and the branch target instruction is in the previous second level cache block, then control line 610 controls selector 670 to select store entry 624 to read the way set number therein and select store entry 625 to read the index 307 therein. With the upper block way group number of entry 624, index 307 in entry 625 minus '1', block offset 306 in the sum generated by adder 607, offset 303 are merged into the BN2 address for storage in the first entry in the track. If the address boundary determination is case 4 and the branch target instruction is in the next second level cache block, then control line 610 controls selector 670 to select store entry 626 to read the way set number therein and select store entry 625 to read the index 307 therein. With the next block way group number of entry 626, index 307 plus '1' in entry 625, block offset 306, offset 303 in the sum generated by adder 607 is merged into the BN2 address for storage in the first entry in the track.
The other 3 sub-blocks processing the other 3 instructions also operate independently on their respective instructions in the manner described above, making address boundary decisions for the instructions independently, controlling selectors 671, 672, 673 to select contents of the storage entries 620-626 via control lines 611, 612, 613 respectively, in response to the portions of the sums generated by the adders in the sub-blocks to fill 2, 3, 4 entries in the tracks, respectively.
The last entry in the track, the end track point, is filled in by the output of selector 674. The selector is directly controlled by the block offset 306 in the base address 614 of the instruction segment. When the block offset 306 is '00', the selector 674 selects the storage table entry 621. When the valid bit in storage table entry 621 is 'valid', selector 674 outputs the address of BN1X in storage table entry 631; when the valid bit in the contents of storage entry 621 is 'invalid', selector 674 outputs the way set number, index bit 307, stored in storage entry 625. This output is combined with the block offset 306 plus '1' in the sum generated by adder 607 and the offset 303(BNY) is fed to the end entry in one track in the track table 110. When the block offset 306 in the base address is '01', '10', then the selector 674 selects the contents of the store table entries 622, 623 accordingly, and if the contents are not valid, then the table entry 625 is selected, as described above. When the block offset 306 in the base address is '11', selector 674 selects store entry 626 to read the way set number therein, and selects store entry 625 to read the index 307 therein. With the next block way group number of entry 626, index 307 in entry 625 plus '1', block offset 306, offset 303 in the sum generated by adder 607 is merged into the end track point entry in the track described above as the address of BN 2.
In this embodiment, the active table 104 may also be configured in a multi-port read-write manner, so as to implement simultaneous access of multiple branch target addresses to the active table.
FIG. 7 is a diagram of the memory and format used in a micro track table organized in a fully associative manner. Please refer to fig. 7A, which is a structure of one memory 820 in a fully-associative micro-track block. The memory 820 contains 6 entries corresponding to a secondary instruction block containing 4 primary instruction blocks. Wherein, the table entry 710 stores a primary instruction block number BN1X corresponding to a primary instruction block with an intra-block displacement of '00' in the secondary instruction block and its valid signal; the entries 711, 712, 713 respectively store the primary instruction block numbers of the primary instruction blocks with intra-block shifts of '01', '10', '11'. Entry 714 stores the way set number (Waynumber) and index address 307(index) of the present second level instruction block. Entry 715 stores the way set number of the next block of the second level cache block.
Please refer to fig. 8, which is an embodiment of a fully associative micro track table, wherein the module 110 is a track table and the module 808 is a scanner, which can be referred to as the scanner 108 in fig. 5. The functional block 801 is a functional block for independently decoding instructions and calculating a branch target address for a plurality of instructions in an instruction block passing through the scanner, similar to the instruction decoding and determining module 601 in the embodiment of fig. 6. The function block 801 adds the instruction Base Address (Base Address, as described in FIG. 6) of each decoded Branch instruction, which has the same high order bits but the lowest two bits of the Base Address vary with the position of the instruction within the instruction block, to the Branch Offset of the instruction, which is the Branch target Address, to control the selection of the contents of the mini-active block 881. Referring to fig. 7B, the branch target addresses may be divided into 4 parts, which are arranged in descending order from the upper bit to the lower bit, namely a micro Tag part (Tag)721, a micro Index (Index)722, a Block Offset (Block Offset)306 and an Offset 303. Micro tag 721, micro index 722 are different from tag 311, index 307 in other embodiments of the disclosure. The micro index 722 has only two bits, because each micro active block only has 4 active table rows corresponding to the secondary instruction blocks, there are 4 primary instruction blocks in a corresponding secondary instruction block, and the micro index value is equal to the lowest two bits of the active table index 307. So the other bits in active table index 307 are incorporated into micro tag 721. The addresses are the same, except that the present embodiment divides the tag and index on different bits of the address. Micro tag 721 includes tag 311 and the other bits of active table index 307 except for the least significant two bits.
Wherein the first 3 portions 721, 722 and 306 are provided to each of the micro-motion blocks (e.g., micro-motion block 881, 883) via buses 810, 811, 812, 813 for controlling the selectors thereof; the offset 303 is combined with the BNX output from the corresponding selector into the complete BN address to fill the entry in the track table 110. Returning to FIG. 8, the mini-initiative block 881 includes memories 820, 821, 822, 823 for storing track table entries and multiplexers 870, 871, 872, 873, 874. Wherein memories such as memory 820 are the structure of FIG. 7A.
Micro-activity block 881 has a micro tag register 851 that stores the base address of a contiguous instruction corresponding to the active table entry stored in micro-activity block 881. There are 4 more comparators 860, 861, 862, 863. One input of the 4 comparators is connected to the output of the register 851, and the other input is connected to the 4 branch target addresses 810, 811, 812, 813, respectively. The 4 branch target addresses 810, 811, 812, 813 are sent to mini-active blocks 881, 883 (constructed identically to mini-active block 881) respectively for comparison with the micro-tags in the micro-tag registers therein. In the mini-active block 881, let the micro tag portion 721 of the branch target address 810 compare with the comparator 860, which is the same as the micro tag in the micro register 851. Comparator 860 controls selector 870 with micro-index 307 and block offset 306 in branch target address 810. The micro index 307 selects one of the 4 memories, and when the micro index is '00', 820 is selected, and when the micro index is '01', '10', '11', the memories 821, 822, 823 are selected, respectively. And block offset 306 selects one of the 4 sets of BN1X addresses and valid bits from the selected memory; when the valid bit in the selected group is 'valid', selector 870 outputs the BN1X address in the selected group; when the valid bit in the selected set is 'invalid', selector 870 outputs the way set number stored in memory 820 entry 724, index bit 307, along with block offset 306 at branch target address 810. This output is OR' ed with the same output from the other micro-active block 883 and combined with the offset 303 from the adder 607 to the track table 110 for writing to the first entry in the track pointed to by the address bus 505.
In the micro-active block 881, assuming that the micro tag portion 721 of the branch target address 811 is compared by the comparator 861 and the result is different from the micro tag in the micro tag register 851, the comparator 861 sends a signal to control the selector 871 to output all '0' outputs so that it does not affect the corresponding outputs in other micro-active blocks (e.g., the micro-active block 883). If the micro tag 721 in the branch target address 811 does not match the micro tag comparisons stored in all the mini active blocks, the branch target 811 is sent to the active table 104 to read the entry pointed to by the branch target address 811 and to read the 2 nd entry on the track pointed to by the address bus 505 in the track table. Similarly, the remaining 2 branch target instruction addresses 812, 813 each control the selector 872, 873 to select 1 of the 16 BN 1; or way group number and index bit 307, along with block offset 306 at the target instruction address; or all '0' outputs. This output is combined with the corresponding BN1Y and ored with the corresponding output from mini-active block 883 before being sent to track table 110 for writing into the 3, 4 entry store of the track. If an instruction is not a branch instruction, instruction decode controls the instruction's corresponding comparator to not compare, e.g., if instruction 892 is not a branch instruction, the valid bit of branch target address 812 is ' invalid ', and therefore the corresponding comparator 862 in each mini-active block 881, 883 does not compare. The non-branch instruction type resulting from instruction decode is stored in entry 3 of the track in the track table 110.
The next block address stored in the end track point in the track is provided by a similar comparison, decoding and selecting function. The selector 874 is connected to the memory 820 or the like in a manner different from that of the selectors 870 to 873, and the selector 874 selects input of the address next in the order of 870 to 873 under the same address control. If the micro-index 722 of the address and the block offset 306 bits are '0000', the selectors 870-873 select the entry 710 in the memory 820, but select the entry 711 in the memory 820 according to the same address selector 874; if the micro-index 722 of the address and the block offset 306 are '0011', the selectors 870-873 select the entry 713 in the memory 820, but the selector 874 selects the entry 710 in the memory 821. More particularly, if the micro-index of the address and the block offset 306 are '1111', the selectors 870-873 select the entry 713 in the memory 823, but the selector 874 selects the way number in the entry 715 in the memory 823 and the secondary instruction block number plus '1' in the entry 714, along with the block offset 306 in the address, as the next block address. The micro tag 721 in the present block address 814 (i.e., the base address of the instruction block being processed) is sent to each mini-active block for comparison with the micro tags stored therein. Assuming that the comparator 864 in the microactuation block 881 compares the output of the micro tag register 851 with the pico on the block address 814, the comparator 864 controls the selector 874 with the index 722 and block offset 306 portions of the block address 814. When the valid bit of the selected primary instruction block address BN1X is 'valid', the entry is output. When the valid bit of the selected primary instruction block address BN1X is 'invalid', selector 874 selects the way group code in entry 724 in memory 823, and index address 307 is output along with block offset 306 at address 814. If the desired next block address is not in each mini-active block but in the active table 110, it is also filled into the end track point of the track in a similar manner. Thus completing the filling of the whole track. Please refer to fig. 7C, which is the address format in the track table. The address format 760 is a first-level cache address format, and comprises BN1X 761 and an offset BNY 303. The address format 780 is a second-level cache address format, and is composed of a way set number 781, an index address 307, a block offset 306, and an offset BNY 303.
Returning to FIG. 8, further, when the branch target 810, 811, 812, 813 and the micro tag in the local block address 814 do not match with all of the micro active blocks (e.g., the micro active blocks 881, 883) in the scanner 800, such as sending the branch target address 811 to the active table 104 to read an entry for filling the track table 110 in the present embodiment, the row pointed to by the branch target address 811 may be filled into the memory specified by the micro index bit 722 in the branch target 811 in one of the micro active blocks (e.g., the micro active block 883) specified by the replacement logic (e.g., LRU, not shown) in the scanner 800, replacing the original entry, such as replacing the memory 822 in the micro active block 883 when the micro index bit is '10'. The way is to fill the table entries 710, 711, 712, 713 with 4 BN1X in one row of the active table 104 pointed to by the branch target 811 and their valid signals in sequence; filling the way group number and index number 307 of the active table row into the table entry 714 as the second-level cache block number of the block; the table entry 715 is filled with the next block table entry in the active table row, e.g., the way set number in 503. In addition, the micro tag in the branch target 811 is stored in the micro tag register 851 of the micro active block 883; and the valid location in the memories 820, 821, 823 is 'invalid'. The entries in the memories 820, 821, 823 may then be updated during periods when the active tables are not being accessed.
The permutation logic may designate a mini-active block as the permutation object according to a particular algorithm. In the case of LRU, a count value having a reset bit is stored in each mini active block, and the lowest bit thereof is on the right. The count value is shifted left by 1 bit each time any one comparator in the block matches and the lowest bit is filled with a '1'. The replacement logic observes the count values in all the blocks, and if the lowest bit of any one of the count values is '0', the micro active block in which the count value is located is the object to be replaced. If all the count values are not '0', the replacement logic controls the count values in all the micro active blocks to shift right by one bit until the lowest bit of one count value is '0', namely the micro active block where the count value is located is taken as the replaced object.
the present invention may also support simultaneous address mapping of all instructions in an instruction block being scanned by the scanner 108 with a micro active block of a group-associative architecture. The mini-active block of the group-associative structure is structured like a reduced active table 104, such as the number of columns, the same table entries but only 8 rows, and 4 read ports corresponding to at most 4 instructions in an instruction block. Each read port corresponds to an entry in the track table 110. In addition, in fig. 5, the selectors 521 and 531, the comparator 420, the tri-state gate 541, and the like are 4 sets. The 4 branch addresses of the 4 branch instructions are used to address the mini active block of the group-connected fabric. Wherein 4 micro-indices (3 bits in this example) read 8 rows of content from each of 4 read ports in the two arrays of the 2-way set, respectively, and wherein the 8 sets of BN1X addresses each select one from each set by the block offset 306 in the 4 branch addresses; the 8 micro tags (longer than tag 311, including the bits in index 307 except the lowest 3 bits) are compared in 8 comparators with the micro tags in the 4 branch addresses, respectively. One of the two ways of the read port with the same comparison result drives the 3-state gate thereof to write the BN1X selected by 306 in the way of the read port into the table entry corresponding to the read port in the track table. Each of the 4 read ports writes one entry in the track.
It should be noted that all the technical solutions described in the present invention can also be extended to a cache system with more levels.
Any other suitable modifications can be made according to the technical scheme and the conception of the invention. All such alternatives, modifications and improvements as would be obvious to one skilled in the art are intended to be included within the scope of the invention as defined by the appended claims.

Claims (13)

1. A high performance instruction caching method, comprising:
A processor core to execute instructions;
A first memory for instructions required by the processor core and a second memory having a faster access speed than the first memory;
An active table for recording the location of the instruction block in the first memory in the second memory;
The first memory is formed in a group connection mode;
The active table is formed by group connection, and the table entries of the active table correspond to the instruction blocks in the first memory one by one;
The method is characterized in that:
Storing active table contents corresponding to the instruction block being filled from the first memory to the second memory in a mini active table; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction and the primary instruction block is valid in the corresponding primary block number in the micro active table, directly taking the primary block number read from the micro active table as the primary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction, but the primary instruction block has an invalid primary block number in the micro active table, the secondary block number of the branch instruction is directly used as the secondary block number of the branch target instruction.
2. the method of claim 1 wherein the branch target instruction address is bounded and different format addresses are given to branch target instructions located at different locations based on the bounding.
3. The method of claim 2, wherein the active table stores information of storage locations of a previous instruction block and a next instruction block of the secondary instruction block corresponding to the entry in the first memory.
4. Method according to claim 2, characterized in that when an instruction is located in a previous or subsequent instruction block of the current instruction block in the first memory, the instruction can be found directly in the first memory based on the location information of said previous or subsequent instruction block stored in the active table.
5. The method of claim 2 wherein if the branch target instruction address is located in an instruction block immediately preceding or subsequent to the instruction block in which the branch instruction is located in the second memory, then the instruction is directly found in the first memory using the second level block number of the instruction immediately preceding or subsequent to the instruction block in which the branch instruction is located as the second level block number of the branch instruction.
6. The method according to claim 1 or 2, wherein if the examination finds that the branch target instruction is located in the previous or next secondary instruction block of the branch instruction and the previous or next secondary instruction block has a valid secondary block number in the micro active table, the secondary block number directly read from the micro active table is used as the secondary block number of the branch target instruction.
7. The method according to claim 1 or 2, wherein a plurality of secondary block numbers and their corresponding contents in the active table are stored in the mini active table; if a branch instruction is found during examination, firstly matching a branch target instruction address in the micro active table, and if the matching is successful, directly taking a primary block number or a secondary block number read from the micro active table as the primary block number or the secondary block number of the branch target instruction; if the matching is not successful, the branch target instruction address is sent to the active table for matching.
8. A high performance instruction cache system, comprising:
the processor core is connected with a second memory with high access speed, the second memory is connected with a first memory with low access speed, and the first memory is formed by a group connection mode;
The active table is formed by group connection, the table entries of the active table correspond to the instruction blocks in the first memory one by one, and each table entry stores the address of the corresponding instruction block in the first memory in the second memory;
The method is characterized in that:
The system also includes a mini active meter; the micro active table is used for storing the content of the active table corresponding to the instruction block which is being filled from the first memory to the second memory; when the scanner examines and finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction, and the corresponding primary block number of the primary instruction block in the miniature active table is valid, the primary block number read out from the miniature active table is directly used as the primary block number of the branch target instruction; if the examination finds that the branch target instruction is positioned in different primary instruction blocks in the same secondary instruction block of the branch instruction, but the primary instruction block has an invalid primary block number in the micro active table, the secondary block number of the branch instruction is directly used as the secondary block number of the branch target instruction.
9. The system of claim 8, wherein the branch target instruction address is bounded and different format addresses are given to branch target instructions located at different locations based on the bounding.
10. The system of claim 8, wherein the active table stores a second level block number in the first memory of a previous instruction block and a next instruction block of a second level instruction block corresponding to an instruction block when the previous instruction block or the next instruction block of the sequential address corresponding to the instruction block in the first memory is already stored in the first memory.
11. The system of claim 9, wherein the system comprises a single or a plurality of adders; the adder is used for adding the low order bits of the branch instruction in the part outside the offset corresponding to the first memory and the corresponding bits in the branch transfer distance, and judging whether the branch target instruction is positioned in the previous or next instruction block of the instruction block sequence address of the branch instruction in the first memory; when the branch target instruction is located in a previous instruction block or a next instruction block of the current instruction block in the first memory, the instruction can be directly found in the first memory according to the location information of the previous instruction block or the next instruction block stored in the active table.
12. the system according to claim 8 or 9, wherein if the examination finds that the branch target instruction is located in the previous or next secondary instruction block of the branch instruction and the previous or next secondary instruction block has a valid secondary block number in the micro active table, the secondary block number directly read from the micro active table is used as the secondary block number of the branch target instruction.
13. The system of claim 8 or 9, further comprising a mini active meter; the micro active table is used for storing a plurality of secondary block numbers and the corresponding contents of the block numbers in the active table; when a scanner finds a branch instruction during examination, firstly, matching a branch landmark instruction address in the miniature active table, and if the matching is successful, directly taking a primary block number or a secondary block number read from the miniature active table as the primary block number or the secondary block number of the branch target instruction; if the matching is not successful, the branch target instruction address is sent to the active table for matching.
CN201410430931.5A 2013-08-23 2014-08-22 High performance instruction cache system and method Active CN104424132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410430931.5A CN104424132B (en) 2013-08-23 2014-08-22 High performance instruction cache system and method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2013103796579 2013-08-23
CN201310379657 2013-08-23
CN201410430931.5A CN104424132B (en) 2013-08-23 2014-08-22 High performance instruction cache system and method

Publications (2)

Publication Number Publication Date
CN104424132A CN104424132A (en) 2015-03-18
CN104424132B true CN104424132B (en) 2019-12-13

Family

ID=52483095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410430931.5A Active CN104424132B (en) 2013-08-23 2014-08-22 High performance instruction cache system and method

Country Status (2)

Country Link
CN (1) CN104424132B (en)
WO (1) WO2015024532A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294352B (en) * 2015-05-13 2019-10-25 姚猛 A kind of document handling method, device and file system
US9431070B1 (en) 2015-08-31 2016-08-30 National Tsing Hua University Memory apparatus
CN105389270B (en) * 2015-12-22 2019-01-25 上海爱信诺航芯电子科技有限公司 A kind of system and method improving system on chip instruction cache hits rate
CN112905528A (en) * 2021-02-09 2021-06-04 深圳市众芯诺科技有限公司 Intelligent household chip based on Internet of things

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897655A (en) * 1996-12-10 1999-04-27 International Business Machines Corporation System and method for cache replacement within a cache set based on valid, modified or least recently used status in order of preference
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101552032B (en) * 2008-12-12 2012-01-18 深圳市晶凯电子技术有限公司 Method and device for constructing a high-speed solid state memory disc by using higher-capacity DRAM to join in flash memory medium management
US8904156B2 (en) * 2009-10-14 2014-12-02 Oracle America, Inc. Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor
US8635408B2 (en) * 2011-01-04 2014-01-21 International Business Machines Corporation Controlling power of a cache based on predicting the instruction cache way for high power applications
US8756405B2 (en) * 2011-05-09 2014-06-17 Freescale Semiconductor, Inc. Selective routing of local memory accesses and device thereof
CN102841865B (en) * 2011-06-24 2016-02-10 上海芯豪微电子有限公司 High-performance cache system and method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897655A (en) * 1996-12-10 1999-04-27 International Business Machines Corporation System and method for cache replacement within a cache set based on valid, modified or least recently used status in order of preference
CN102110058A (en) * 2009-12-25 2011-06-29 上海芯豪微电子有限公司 Low-deficiency rate and low-deficiency punishment caching method and device

Also Published As

Publication number Publication date
CN104424132A (en) 2015-03-18
WO2015024532A9 (en) 2015-04-23
WO2015024532A1 (en) 2015-02-26

Similar Documents

Publication Publication Date Title
US20150186293A1 (en) High-performance cache system and method
CN107479860B (en) Processor chip and instruction cache prefetching method
US6425055B1 (en) Way-predicting cache memory
US6678815B1 (en) Apparatus and method for reducing power consumption due to cache and TLB accesses in a processor front-end
US7707397B2 (en) Variable group associativity branch target address cache delivering multiple target addresses per cache line
US5093778A (en) Integrated single structure branch prediction cache
US8370575B2 (en) Optimized software cache lookup for SIMD architectures
US9753855B2 (en) High-performance instruction cache system and method
TWI382426B (en) System and method for predicting cache access
KR102546238B1 (en) Multi-Table Branch Target Buffer
US20090006803A1 (en) L2 Cache/Nest Address Translation
US7680985B2 (en) Method and apparatus for accessing a split cache directory
US11593117B2 (en) Combining load or store instructions
US20150370569A1 (en) Instruction processing system and method
US10275358B2 (en) High-performance instruction cache system and method
CN102110058A (en) Low-deficiency rate and low-deficiency punishment caching method and device
US11775445B2 (en) Translation support for a virtual cache
TW201351145A (en) Instruction cache power reduction
US7937530B2 (en) Method and apparatus for accessing a cache with an effective address
US10810134B2 (en) Sharing virtual and real translations in a virtual cache
US11301250B2 (en) Data prefetching auxiliary circuit, data prefetching method, and microprocessor
CN104424128B (en) Variable length instruction word processor system and method
US10481912B2 (en) Variable branch target buffer (BTB) line size for compression
CN104424132B (en) High performance instruction cache system and method
US6240489B1 (en) Method for implementing a pseudo least recent used (LRU) mechanism in a four-way cache memory within a data processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP02 Change in the address of a patent holder
CP02 Change in the address of a patent holder

Address after: 201203 501, No. 14, Lane 328, Yuqing Road, Pudong New Area, Shanghai

Patentee after: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.

Address before: 200092, B, block 1398, Siping Road, Shanghai, Yangpu District 1202

Patentee before: SHANGHAI XINHAO MICROELECTRONICS Co.,Ltd.