US20050278517A1 - Systems and methods for performing branch prediction in a variable length instruction set microprocessor - Google Patents
Systems and methods for performing branch prediction in a variable length instruction set microprocessor Download PDFInfo
- Publication number
- US20050278517A1 US20050278517A1 US11/132,428 US13242805A US2005278517A1 US 20050278517 A1 US20050278517 A1 US 20050278517A1 US 13242805 A US13242805 A US 13242805A US 2005278517 A1 US2005278517 A1 US 2005278517A1
- Authority
- US
- United States
- Prior art keywords
- instruction
- branch
- address
- fetch
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
- 238000000034 method Methods 0.000 title claims abstract description 57
- 230000007246 mechanism Effects 0.000 abstract description 5
- 230000008901 benefit Effects 0.000 description 6
- 238000013461 design Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 239000000463 material Substances 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 229910052710 silicon Inorganic materials 0.000 description 1
- 239000010703 silicon Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/01—Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3648—Software debugging using additional hardware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30149—Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3846—Speculative instruction execution using static prediction, e.g. branch taken strategy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This invention relates generally to microprocessor architecture and more specifically to an improved architecture and mode of operation of a microprocessor for performing branch prediction.
- a typical component of a multistage microprocessor pipeline is the branch prediction unit (BPU).
- the branch prediction unit Usually located in or near a fetch stage of the pipelines the branch prediction unit increases effective processing speed by predicting whether a branch to a non-sequential instruction will be taken based upon past instruction processing history.
- the branch prediction unit contains a branch look-up or prediction table that stores the address of branch instructions, an indication as to whether the branch was taken and a speculative target address for a taken branch.
- next instruction is speculatively loaded into the pipeline. Whether or not the prediction will be correct, will not be known until a later stage of the pipeline. However, if the prediction is correct, clock cycles will be saved by not having to go back to get the next instruction address. Otherwise, the current pipeline behind the stage in which the actual address of the next instruction is determined must be flushed and the correct branch inserted back in the first stage. While this may seem like a harsh penalty for incorrect predictions, in applications where the instruction set is limited and small loops are repeated many times, such as, for example, applications typically implemented with embedded processors, branch prediction is usually accurate enough such that the benefits associated with correct predictions outweigh the cost of occasional incorrect predictions—i.e., pipeline flush. In these types of applications branch prediction can achieve accuracy over ninety percent of the time. Thus, the risk of predicting an incorrect branch resulting in a pipeline flush is outweighed by the benefit of saved clock cycles.
- look-up table is a comprised of entries associated with 32-bit wide fetch entities and instructions have lengths varying from 16 to 64-bits, a specific lookup table address entry may not be sufficient to reference a particular instruction.
- a microprocessor architecture in which branch prediction information is selectively ignored by the instruction pipeline in order to avoid injection of erroneous instructions into the pipeline. These embodiments are particularly useful for branch prediction schemes in which variable length instructions are predictively fetched.
- a 32-bit word is fetched based on the address in the branch prediction table.
- this address may reference a word comprising two 16-bit instruction words, or a 16-bit instruction word and an unaligned instruction word of larger length (32, 48 or 64 bits) or parts of two unaligned instruction words of such larger lengths.
- the branch prediction table may contain a tag coupled to the lower bits of a fetch instruction address. If the entry at the location specified by the branch prediction table contains more than one instruction, for example, two 16-bit instructions, or a 16-bit instruction and a portion of a 32, 48 or 64-bit instruction, a prediction may be made based on an instruction that will ultimately be discarded. Though the instruction aligner will discard the incorrect instruction, a predicted branch will already have been injected into the pipeline and will not be discovered until branch resolution in a later stage of the pipeline causing a pipeline flush.
- a prediction will be discarded beforehand if two conditions are satisfied.
- a prediction will be discarded if a branch prediction look-up is based on a non-sequential fetch to an unaligned address, and secondly, if the branch target alignment cache (BTAC) bit is equal to zero. This second condition will only be satisfied if the prediction is based on an instruction having an aligned instruction address.
- BTAC branch target alignment cache
- an alignment bit of zero will indicate that the prediction information is for an aligned branch. This will prevent the predictions based on incorrect instructions from being injected into the pipeline.
- a microprocessor architecture which utilizes dynamic branch prediction while removing the inherent latency involved in branch prediction.
- an instruction fetch address is used to look up in a BPU table recording historical program flow to predict when a non-sequential program flow is to occur.
- the address of the instruction prior to the branch instruction in the program flow is used to index the branch in the branch table.
- a delay slot instruction may be inserted after a conditional branch such that the conditional branch is not the last sequential instruction.
- the delay slot instruction is the actual sequential departure point, the instruction prior to the non-sequential program flow would actually be the branch instruction.
- the BPU would index such an entry by the address of the conditional branch instruction itself, since it would be the instruction prior to the non-sequential instruction.
- use of a delay slot instruction will also affect branch resolution in the selection stage.
- update of the BPU must be deferred for one execution cycle after the branch instruction. This process is further complicated by the use of variable length instructions. Performance of branch resolution after execution requires updating of the BPU table.
- the processor instruction set includes variable length instructions it becomes essential to determine the last fetch address of the current instruction as well as the update address, i.e., the fetch address prior to the sequential departure point.
- the current instruction is an aligned or non-aligned 16-bit or an aligned 32-bit instruction, the last fetch address will be the instruction fetch address of the current instruction.
- the update address of an aligned 16-bit or aligned 32-bit instruction will be the last fetch address of the prior instruction. For a non-aligned 16-bit instruction, if it was arrived at sequentially, the update address will be the update address of the prior instruction. Otherwise, the update address will be the last fetch address of the prior instruction.
- the last fetch address will simply be the address of the next instruction.
- the update address will be the current instruction address.
- the last fetch address of a non-aligned 48-bit instruction or an aligned 64-bit instruction will be the address of the next instruction minus one and the update address will be the current instruction address. If the current instruction is a non-aligned 64-bit, the last fetch address will be the same as the next instruction address and the update address will be the next instruction address minus one.
- a microprocessor architecture which employs dynamic branch prediction and zero overhead loops.
- the BPU is updated whenever the zero-overhead loop mechanism is updated.
- the BPU needs to store the last fetch address of the last instruction of the loop body. This allows the BPU to predictively re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body.
- the last fetch address of the loop body can be derived from the address of the first instruction after the end of the loop, despite the use of variable length instructions, by exploiting the fact that instructions are fetched in 32-bit word chunks and that instruction sizes are in general integer multiple of a 16-bits.
- the last instruction of the loop body has a last fetch address immediately preceding the address of the next instruction after the end of the loop body. Otherwise, if the next instruction after the end of the loop body has an unaligned address, the last instruction of the loop body has the same fetch address as the next instruction after the loop body.
- At least one exemplary embodiment of the invention provides a method of performing branch prediction in a microprocessor using variable length instructions.
- the method of performing branch prediction in a microprocessor using variable length instructions comprises fetching an instruction from memory based on a specified fetch address, making a branch prediction based on the address of the fetched instruction, and discarding the branch prediction if (1) the branch prediction look-up was based on a non-sequential fetch to an unaligned instruction address and (2) if a branch target alignment cache (BTAC) bit of the instruction is equal to zero.
- BTAC branch target alignment cache
- At least one additional exemplary embodiment provides a method of performing dynamic branch prediction in a microprocessor.
- the method of performing dynamic branch prediction in a microprocessor may comprise fetching the penultimate instruction word prior to a non-sequential program flow and a branch prediction unit look-up table entry containing prediction information for a next instruction word on a first clock cycle, fetching the last instruction word prior to a non-sequential program flow and making a prediction on non-sequential program flow based on information fetched in the previous cycle on a second clock cycle, and fetching the predicted target instruction on a third clock cycle.
- an additional exemplary embodiment provides a method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor.
- the method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor may comprise storing a last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table, and predictively re-directing an instruction fetch to the start of the loop body whenever an instruction fetch hits the end of a loop body, wherein the last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
- FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention
- FIG. 2 is a flow chart illustrating the steps of a method for selectively discarding branch predictions corresponding to aligned 16-bit instructions having the same fetch address as a non-aligned 16-bit target instruction in accordance with at least one exemplary embodiment of this invention
- FIG. 3 is a flow chart illustrating a prior art method of performing branch prediction by storing non-sequential branch instructions in a branch prediction unit table that is indexed by the fetch address of the non-sequential branch instruction;
- FIG. 4 is a flow chart illustrating a method for performing branch prediction by storing non-sequential branch instructions in a branch prediction table that is indexed by the fetch address of the instruction prior to the non-sequential branch instruction in accordance with at least one exemplary embodiment of this invention
- FIG. 5 is a diagram illustrating possible scenarios encountered during branch resolution when 32-bit words are fetched from memory in a system incorporating a variable length instruction architecture including instructions of 16-bits, 32-bits, 48-bits or 64-bits in length;
- FIGS. 6 and 7 are tables illustrating a method for computing the last instruction fetch address of a zero-overhead loop for dynamic branch prediction in a variable-length instruction set architecture processor
- FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention.
- FIG. 1 a sequence of 32-bit wide memory words are shown containing instructions instr_ 1 through instr_ 4 in sequential locations in memory.
- Instr_ 2 is the target of a non-sequential instruction fetch.
- the BPU stores prediction information in its tables based only on the 32-bit fetch address of the start of the instruction. There can be more than one instruction in any 32-bit word in memory, however, only one prediction can be made per 32-bit word. Thus, the performance problem can be seen by referring to FIG. 1 .
- the instruction address of instr_ 2 is actually 0x2, however, the fetch address is 0x0, and a fetch of this address will cause the entire 32-bit word comprised of 16-bits of instr_ 1 and 16-bits of instr_ 2 to be fetched.
- a branch prediction will be made for instr_ 1 based on the instruction fetch of the 32-bit word at address 0x0.
- the branch predictor does not take into account the fact that the instr_ 1 at 0x0 will be discarded by the aligner before it can be issued, however the prediction remains.
- the prediction would be correct if instr_ 1 is fetched as the result of a sequential fetch of 0x0, or if a branch was made to 0x0, but, in this case, where a branch is made to instr_ 2 at 0x2, the prediction is wrong. As a result, the prediction is wrong for instr_ 2 causing an incorrect instruction to hit the backstop and a pipeline flush, a severe performance penalty, to occur.
- FIG. 2 is a flow chart outlining the steps of a method for solving the aforementioned problem by selectively discarding branch prediction information in accordance with various embodiments of the invention. Operation of the method begins at step 200 and proceeds to step 205 where a 32-bit word is read from memory at the specified fetch address of the target instruction. Next, in step 210 , a prediction is made based on this fetched instruction. This prediction is based on the aligned instruction fetch location. Operation of the method then proceeds to step 215 where the first part of a two-part determination test is applied to whether the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address. In the context of FIG.
- instr_ 2 this condition would be satisfied by instr_ 2 because it is non-aligned (it does not start at the beginning of line 0x0, but rather after the first 16-bits).
- this condition alone is not sufficient because a valid branch prediction lookup can be based on a branch located at an unaligned instruction address. For example, if in FIG. 1 instr_ 1 is not a branch and instr_ 2 is a branch. If, in step 215 , it is determined that the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address, operation of the method proceeds to the next step of the test, step 220 . Otherwise, operation of the method jumps to step 225 , where the prediction is assumed valid and passed.
- step 220 in this step a second determination is made as to whether the branch target address cache (BTAC) alignment bit is 0, indicating that the prediction information is for an aligned branch.
- This bit will be 0 for all aligned branches and will be 1 for all unaligned branches because it is derived from the instruction address.
- the second bit of the instruction address will always be 0 for aligned branches (i.e., 0, 4, 8, f, etc.) and will always be 1 for unaligned branches (i.e., 2, 6, a, etc.). If, in step 220 , it is determined that the branch target address cache (BTAC) alignment bit is not 0, operation proceeds to step 225 where the prediction is passed.
- BTAC branch target address cache
- step 220 determines that the BTAC alignment bit is 0
- step 230 the prediction is discarded.
- the next fetch address is updated in step 235 based on whether a branch was predicted and returns to step 205 where the next fetch occurs.
- dynamic branch prediction is an effective technique to reduce branch penalty in a pipeline processor architecture.
- This technique uses the instruction fetch address to look up in internal tables recording program flow history to predict the target of a non-sequential program flow.
- branch prediction is complicated when a variable-length instruction architecture is used. In a variable-length instruction architecture, the instruction fetch address cannot be assumed to be identical to the actual instruction address. This makes it difficult for the branch prediction algorithm to guarantee sufficient instruction words are fetched and at the same time minimize unnecessary fetches.
- FIG. 3 illustrates such a conventional indexing method in which two instructions are sequentially fetched, the first instruction being a branch instruction, and the second being the next sequential instruction word.
- the branch instruction is fetched with the associated BPU table entry.
- this instruction is propagated in the pipeline to the next stage where it is detected as a predicted branch while the next instruction is fetched.
- the target instruction is fetched based on the branch prediction made in the last cycle.
- a latency is introduced because three steps are required to fetch the branch instruction, make a prediction and fetch the target instruction. If the instruction word fetched in 305 is not part of the branch nor of its delay slot, then the word is discarded and as a result a “bubble” is injected into the pipeline.
- FIG. 4 illustrates a novel and improved method for making a branch prediction in accordance with various embodiments of the invention.
- the method depicted in FIG. 4 is characterized in that the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself.
- the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself.
- the method begins in step 400 where the instruction prior to the branch instruction is fetched together with the BPU entry containing prediction information of the next instruction.
- the branch instruction is fetched while, concurrently, a prediction on this branch can be made based on information fetched in the previous cycle.
- the target instruction is fetched. As illustrated, no extra instruction word is fetched between the branch and the target instructions. Hence, no bubble will be injected into the pipeline and overall performance of the processor is improved.
- the branch instruction may not be the departure point (the instruction prior to non-sequential flow). Rather another instruction may appear after the branch instruction. Therefore, though the non-sequential jump is dictated by the branch instruction, the last instruction to be executed may not be the branch instruction, but may rather be the delay slot instruction.
- a delay slot is used in some processor architectures with short pipelines to hide branch resolution latency. Processors with dynamic branch prediction might still have to support the concept of delay slots to be compatible with legacy code.
- FIG. 5 illustrates five potential scenarios encountered when performing branch resolution. These scenarios may be grouped into two groups by the way in which they are handled. Group one comprises a non-aligned 16-bit instruction and an aligned 16 or 32-bit instruction. Group two comprises one of three scenarios: a non-aligned 32 or 48-bit instruction, a non-aligned 48-bit or an aligned 64-bit instruction, and a non-aligned 64-bit instruction.
- L 0 is simply the 30 most significant bits of the fetch address denoted as instr_addr[31:2].
- the update address U 0 depends on whether these instructions were arrived at sequentially or as the result of a non-sequential instruction.
- the last fetch address of the prior instruction also known as L ⁇ 1 . This information is stored internally and is available as a variable to the current instruction in the select stage of the pipeline.
- the update address will be the last fetch address of the prior non-sequential instruction L ⁇ 1 .
- the update address of a 16 or 32-bit aligned instruction U 0 will be the last fetch address of the prior instruction L ⁇ 1 , irrespective of whether the prior instruction was sequential or not.
- Scenarios 3-5 can be handled in the same manner by taking advantage of the fact that each instruction fetch fetches a contiguous 32-bit word. Therefore, when the instruction is sufficiently long and/or unaligned to span two or more consecutive fetched instruction words in memory, we know with certainty that L 0 , the last fetch address, can be derived from the instruction address of the next sequential instruction, denoted as next_addr[31:2] in FIG. 5 . In scenarios 3 and 5, covering non-aligned 32-bit, aligned 48-bit and non-aligned 64-bit instructions, the last portion of the current instruction share the same fetch address with the start of the next sequential instruction. Hence L 0 will be next_addr[31:2].
- the fetch address of the last portion of the current instruction is one less than the start address of the next sequential instruction.
- L 0 next_addr[31:2] ⁇ 1.
- the current instruction spans two consecutive 32-bit fetched instruction words. The fetch address prior to the last portion of the current instruction is always the fetch address of the start of the instruction. Therefore, U 0 will be inst_addr[31:2].
- the last portion of the current instruction shares the same fetch address as the start of the next sequential instruction. Hence, U 0 will be next_addr[31:2] ⁇ 1.
- the update address U 0 and last fetch address L 0 are computed based on 4 values that are provided to the selection stage as early arriving signals directly from registers. These signals are namely inst_addr, next_addr, L ⁇ 1 and U ⁇ 1 . Only one multiplexer is required to compute U 0 in scenario 1, and one decrementer is required to compute L 0 in scenario 4 and U 0 in scenario 5. The overall complexity of the novel and improved branch prediction method being disclosed is only marginally increased comparing with traditional methods.
- a method and apparatus are provided for computing the last instruction fetch of a zero-overhead loop for dynamic branch prediction in a variable length instruction set microprocessor.
- Zero-overhead loops as well as the previously discussed dynamic branch prediction, are both powerful techniques for improving effective processor performance.
- the BPU has to be updated whenever the zero-overhead loop mechanism is updated.
- the BPU needs the last instruction fetch address of the loop body. This allows the BPU to re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body.
- determining the last fetch address of a loop body is not trivial.
- a processor with a variable-length instruction set only keeps track of the first address an instruction is fetched from.
- the last fetch address of a loop body is the fetch address of the last portion of the last instruction of the loop body and is not readily available.
- a zero-overhead loop mechanism requires an address related to the end of the loop body to be stored as part of the architectural state.
- this address can be denoted as LP_END.
- LP_END is assigned the address of the next instruction after the last instruction of the loop body
- the last fetch address of the loop body designated in various exemplary embodiments as LP_LAST, can be derived by exploiting two facts. Firstly, despite the variable length nature of the instruction set, instructions are fetched in fixed size chunks, namely 32-bit words. The BPU works only with the fetch address of theses fixed size chunks. Secondly, instruction sizes of variable-length are usually an integer multiple of a fixed size, namely 16-bits.
- LP_END is both non-aligned and aligned.
- the instruction “sub” is the last instruction of the loop body.
- LP_END is located at 0xA.
- LP_END is unaligned and LP_END[1] is 1, thus, the inversion of LP_END[1] is 0 and the last fetch address of the loop body, LP_LAST, is LP_END[32:2] which is 0x8.
- LP_END is aligned and located at 0x18.
- LP_END[1] is 0, as with all aligned instructions, thus, the inversion of LP_END is 1 and LP 13 LAST is LP_END[31:2] ⁇ 1 or the line above LP_LAST, line 0x14. Note that in the above calculations least significant bits of addresses that are known to be zero are ignored for the sake of simplifying the description.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Debugging And Monitoring (AREA)
Abstract
A method of performing branch prediction in a microprocessor using variable length instructions is provided. An instruction is fetched from memory based on a specified fetch address and a branch prediction is made based on the address. The prediction is selectively discarded if the look-up was based on a non-sequential fetch to an unaligned instruction address and a branch target alignment cache (BTAC) bit of the instruction is equal to zero. In order to remove the inherent latency of branch prediction, an instruction prior to a branch instruction may be fetched concurrently with a branch prediction unit look-up table entry containing prediction information for a next instruction word. Then, the branch instruction is fetched and a prediction is made on this branch instruction based on information fetched in the previous cycle. The predicted target instruction is fetched on the next clock cycle. If zero overhead loops are used, a look-up table of a branch prediction unit is updated whenever the zero-overhead loop mechanism is updated. A last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table is stored. Then, whenever an instruction fetch hits the end of a loop body, predictively re-directing an instruction fetch to the start of the loop body. The last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
Description
- This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
- This invention relates generally to microprocessor architecture and more specifically to an improved architecture and mode of operation of a microprocessor for performing branch prediction.
- A typical component of a multistage microprocessor pipeline is the branch prediction unit (BPU). Usually located in or near a fetch stage of the pipelines the branch prediction unit increases effective processing speed by predicting whether a branch to a non-sequential instruction will be taken based upon past instruction processing history. The branch prediction unit contains a branch look-up or prediction table that stores the address of branch instructions, an indication as to whether the branch was taken and a speculative target address for a taken branch. When an instruction is fetched, if the instruction is a conditional branch, the result of the conditional branch is speculatively predicted based on past branch history. This speculative or predictive result is injected into the pipeline. Thus, referencing a branch history table, the next instruction is speculatively loaded into the pipeline. Whether or not the prediction will be correct, will not be known until a later stage of the pipeline. However, if the prediction is correct, clock cycles will be saved by not having to go back to get the next instruction address. Otherwise, the current pipeline behind the stage in which the actual address of the next instruction is determined must be flushed and the correct branch inserted back in the first stage. While this may seem like a harsh penalty for incorrect predictions, in applications where the instruction set is limited and small loops are repeated many times, such as, for example, applications typically implemented with embedded processors, branch prediction is usually accurate enough such that the benefits associated with correct predictions outweigh the cost of occasional incorrect predictions—i.e., pipeline flush. In these types of applications branch prediction can achieve accuracy over ninety percent of the time. Thus, the risk of predicting an incorrect branch resulting in a pipeline flush is outweighed by the benefit of saved clock cycles.
- While branch prediction is effective at increasing effective processing speed, problems may arise that reduce or eliminate these efficiency gains when dealing with a variable length microprocessor instruction set. For example, if the look-up table is a comprised of entries associated with 32-bit wide fetch entities and instructions have lengths varying from 16 to 64-bits, a specific lookup table address entry may not be sufficient to reference a particular instruction.
- The description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
- As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
- Thus, there exists a need for microprocessor architecture with reduced power consumption, improved performance, reduction of silicon footprint and improved branch prediction as compared with state of the art microprocessors.
- In various embodiments of this invention, a microprocessor architecture is disclosed in which branch prediction information is selectively ignored by the instruction pipeline in order to avoid injection of erroneous instructions into the pipeline. These embodiments are particularly useful for branch prediction schemes in which variable length instructions are predictively fetched. In various exemplary embodiments, a 32-bit word is fetched based on the address in the branch prediction table. However, in branch prediction systems based on addresses of 32-bit fetch objects, because the instruction memory is comprised of 32-bit entries, regardless of instruction length, this address may reference a word comprising two 16-bit instruction words, or a 16-bit instruction word and an unaligned instruction word of larger length (32, 48 or 64 bits) or parts of two unaligned instruction words of such larger lengths.
- In various embodiments, the branch prediction table may contain a tag coupled to the lower bits of a fetch instruction address. If the entry at the location specified by the branch prediction table contains more than one instruction, for example, two 16-bit instructions, or a 16-bit instruction and a portion of a 32, 48 or 64-bit instruction, a prediction may be made based on an instruction that will ultimately be discarded. Though the instruction aligner will discard the incorrect instruction, a predicted branch will already have been injected into the pipeline and will not be discovered until branch resolution in a later stage of the pipeline causing a pipeline flush.
- Thus, in various exemplary embodiments, to prevent such an incorrect prediction from being made, a prediction will be discarded beforehand if two conditions are satisfied. In various embodiments, a prediction will be discarded if a branch prediction look-up is based on a non-sequential fetch to an unaligned address, and secondly, if the branch target alignment cache (BTAC) bit is equal to zero. This second condition will only be satisfied if the prediction is based on an instruction having an aligned instruction address. In various exemplary embodiments, an alignment bit of zero will indicate that the prediction information is for an aligned branch. This will prevent the predictions based on incorrect instructions from being injected into the pipeline.
- In various embodiments of this invention, a microprocessor architecture is disclosed which utilizes dynamic branch prediction while removing the inherent latency involved in branch prediction. In this embodiment, an instruction fetch address is used to look up in a BPU table recording historical program flow to predict when a non-sequential program flow is to occur. However, instead of using the instruction address of the branch instruction to index the branch table, the address of the instruction prior to the branch instruction in the program flow is used to index the branch in the branch table. Thus, fetching the instruction prior to the branch instruction will cause a prediction to be made and eliminate the inherent one step latency in the process of dynamic branch prediction caused by the fetching the address of the branch instruction itself. In the above embodiment, it should be noted that in some cases, a delay slot instruction may be inserted after a conditional branch such that the conditional branch is not the last sequential instruction. In such a case, because the delay slot instruction is the actual sequential departure point, the instruction prior to the non-sequential program flow would actually be the branch instruction. Thus, the BPU would index such an entry by the address of the conditional branch instruction itself, since it would be the instruction prior to the non-sequential instruction.
- In various embodiments, use of a delay slot instruction will also affect branch resolution in the selection stage. In various exemplary embodiments, if a delay slot instruction is utilized, update of the BPU must be deferred for one execution cycle after the branch instruction. This process is further complicated by the use of variable length instructions. Performance of branch resolution after execution requires updating of the BPU table. However, when the processor instruction set includes variable length instructions it becomes essential to determine the last fetch address of the current instruction as well as the update address, i.e., the fetch address prior to the sequential departure point. In various exemplary embodiments, if the current instruction is an aligned or non-aligned 16-bit or an aligned 32-bit instruction, the last fetch address will be the instruction fetch address of the current instruction. The update address of an aligned 16-bit or aligned 32-bit instruction will be the last fetch address of the prior instruction. For a non-aligned 16-bit instruction, if it was arrived at sequentially, the update address will be the update address of the prior instruction. Otherwise, the update address will be the last fetch address of the prior instruction.
- In the same embodiment, if the current instruction is non-aligned 32-bit or an aligned 48-bit instruction, the last fetch address will simply be the address of the next instruction. The update address will be the current instruction address. The last fetch address of a non-aligned 48-bit instruction or an aligned 64-bit instruction will be the address of the next instruction minus one and the update address will be the current instruction address. If the current instruction is a non-aligned 64-bit, the last fetch address will be the same as the next instruction address and the update address will be the next instruction address minus one.
- In exemplary embodiments of this invention, a microprocessor architecture is disclosed which employs dynamic branch prediction and zero overhead loops. In such a processor, the BPU is updated whenever the zero-overhead loop mechanism is updated. Specifically, the BPU needs to store the last fetch address of the last instruction of the loop body. This allows the BPU to predictively re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body. In this embodiment, the last fetch address of the loop body can be derived from the address of the first instruction after the end of the loop, despite the use of variable length instructions, by exploiting the fact that instructions are fetched in 32-bit word chunks and that instruction sizes are in general integer multiple of a 16-bits. Therefore, in this embodiment, if the next instruction after the end of the loop body has an aligned address, the last instruction of the loop body has a last fetch address immediately preceding the address of the next instruction after the end of the loop body. Otherwise, if the next instruction after the end of the loop body has an unaligned address, the last instruction of the loop body has the same fetch address as the next instruction after the loop body.
- At least one exemplary embodiment of the invention provides a method of performing branch prediction in a microprocessor using variable length instructions. The method of performing branch prediction in a microprocessor using variable length instructions according to this embodiment comprises fetching an instruction from memory based on a specified fetch address, making a branch prediction based on the address of the fetched instruction, and discarding the branch prediction if (1) the branch prediction look-up was based on a non-sequential fetch to an unaligned instruction address and (2) if a branch target alignment cache (BTAC) bit of the instruction is equal to zero.
- At least one additional exemplary embodiment provides a method of performing dynamic branch prediction in a microprocessor. The method of performing dynamic branch prediction in a microprocessor according to this embodiment may comprise fetching the penultimate instruction word prior to a non-sequential program flow and a branch prediction unit look-up table entry containing prediction information for a next instruction word on a first clock cycle, fetching the last instruction word prior to a non-sequential program flow and making a prediction on non-sequential program flow based on information fetched in the previous cycle on a second clock cycle, and fetching the predicted target instruction on a third clock cycle.
- Yet an additional exemplary embodiment provides a method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor. The method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor may comprise storing a last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table, and predictively re-directing an instruction fetch to the start of the loop body whenever an instruction fetch hits the end of a loop body, wherein the last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
- Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
-
FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention; -
FIG. 2 is a flow chart illustrating the steps of a method for selectively discarding branch predictions corresponding to aligned 16-bit instructions having the same fetch address as a non-aligned 16-bit target instruction in accordance with at least one exemplary embodiment of this invention; -
FIG. 3 is a flow chart illustrating a prior art method of performing branch prediction by storing non-sequential branch instructions in a branch prediction unit table that is indexed by the fetch address of the non-sequential branch instruction; -
FIG. 4 is a flow chart illustrating a method for performing branch prediction by storing non-sequential branch instructions in a branch prediction table that is indexed by the fetch address of the instruction prior to the non-sequential branch instruction in accordance with at least one exemplary embodiment of this invention; -
FIG. 5 is a diagram illustrating possible scenarios encountered during branch resolution when 32-bit words are fetched from memory in a system incorporating a variable length instruction architecture including instructions of 16-bits, 32-bits, 48-bits or 64-bits in length; and -
FIGS. 6 and 7 are tables illustrating a method for computing the last instruction fetch address of a zero-overhead loop for dynamic branch prediction in a variable-length instruction set architecture processor; - The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
-
FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention. When branch prediction is done in a microprocessor employing a variable length instruction set, a performance problem is created when a branch is made to an unaligned target address that is packed with an aligned instruction in the same 32-bit word that is predicted to be a branch. - In
FIG. 1 a sequence of 32-bit wide memory words are shown containing instructions instr_1 through instr_4 in sequential locations in memory. Instr_2, is the target of a non-sequential instruction fetch. The BPU stores prediction information in its tables based only on the 32-bit fetch address of the start of the instruction. There can be more than one instruction in any 32-bit word in memory, however, only one prediction can be made per 32-bit word. Thus, the performance problem can be seen by referring toFIG. 1 . The instruction address of instr_2 is actually 0x2, however, the fetch address is 0x0, and a fetch of this address will cause the entire 32-bit word comprised of 16-bits of instr_1 and 16-bits of instr_2 to be fetched. Under a simple BPU configuration, a branch prediction will be made for instr_1 based on the instruction fetch of the 32-bit word at address 0x0. The branch predictor does not take into account the fact that the instr_1 at 0x0 will be discarded by the aligner before it can be issued, however the prediction remains. The prediction would be correct if instr_1 is fetched as the result of a sequential fetch of 0x0, or if a branch was made to 0x0, but, in this case, where a branch is made to instr_2 at 0x2, the prediction is wrong. As a result, the prediction is wrong for instr_2 causing an incorrect instruction to hit the backstop and a pipeline flush, a severe performance penalty, to occur. -
FIG. 2 is a flow chart outlining the steps of a method for solving the aforementioned problem by selectively discarding branch prediction information in accordance with various embodiments of the invention. Operation of the method begins atstep 200 and proceeds to step 205 where a 32-bit word is read from memory at the specified fetch address of the target instruction. Next, instep 210, a prediction is made based on this fetched instruction. This prediction is based on the aligned instruction fetch location. Operation of the method then proceeds to step 215 where the first part of a two-part determination test is applied to whether the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address. In the context ofFIG. 1 , this condition would be satisfied by instr_2 because it is non-aligned (it does not start at the beginning of line 0x0, but rather after the first 16-bits). However, this condition alone is not sufficient because a valid branch prediction lookup can be based on a branch located at an unaligned instruction address. For example, if inFIG. 1 instr_1 is not a branch and instr_2 is a branch. If, instep 215, it is determined that the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address, operation of the method proceeds to the next step of the test,step 220. Otherwise, operation of the method jumps to step 225, where the prediction is assumed valid and passed. - Returning to step 220, in this step a second determination is made as to whether the branch target address cache (BTAC) alignment bit is 0, indicating that the prediction information is for an aligned branch. This bit will be 0 for all aligned branches and will be 1 for all unaligned branches because it is derived from the instruction address. The second bit of the instruction address will always be 0 for aligned branches (i.e., 0, 4, 8, f, etc.) and will always be 1 for unaligned branches (i.e., 2, 6, a, etc.). If, in
step 220, it is determined that the branch target address cache (BTAC) alignment bit is not 0, operation proceeds to step 225 where the prediction is passed. Otherwise, if instep 220 it is determined that the BTAC alignment bit is 0, operation of the method proceeds to step 230, where the prediction is discarded. Thus, rather than causing an incorrect instruction to be injected into the pipeline which will ultimately cause a pipeline flush, the next sequential instruction will be correctly fetched. Afterstep 230, operation of the method is the same as afterstep 225, where the next fetch address is updated instep 235 based on whether a branch was predicted and returns to step 205 where the next fetch occurs. - As discussed above, dynamic branch prediction is an effective technique to reduce branch penalty in a pipeline processor architecture. This technique uses the instruction fetch address to look up in internal tables recording program flow history to predict the target of a non-sequential program flow. Also, discussed above, branch prediction is complicated when a variable-length instruction architecture is used. In a variable-length instruction architecture, the instruction fetch address cannot be assumed to be identical to the actual instruction address. This makes it difficult for the branch prediction algorithm to guarantee sufficient instruction words are fetched and at the same time minimize unnecessary fetches.
- One known method of ameliorating this problem is to add extra pipeline stages to the front of the processor pipeline to perform branch prediction prior to the instruction fetch to allow more time for the prediction mechanism to make a better decision. A negative consequence of this approach is that extra pipeline stages increase the penalty to correct an incorrect prediction. Alternatively, the extra pipeline stages would not be needed if prediction could be performed concurrent to instruction fetch. However, such a design has an inherent latency in which extra instructions are already fetched by the time a prediction is made.
- Traditional branch prediction schemes use the instruction address of a branch instruction (non-sequential program instruction) to index its internal tables.
FIG. 3 illustrates such a conventional indexing method in which two instructions are sequentially fetched, the first instruction being a branch instruction, and the second being the next sequential instruction word. Instep 300, the branch instruction is fetched with the associated BPU table entry. In the next clock cycle, instep 305, this instruction is propagated in the pipeline to the next stage where it is detected as a predicted branch while the next instruction is fetched. Then, atstep 310, in the next clock cycle, the target instruction is fetched based on the branch prediction made in the last cycle. Thus, a latency is introduced because three steps are required to fetch the branch instruction, make a prediction and fetch the target instruction. If the instruction word fetched in 305 is not part of the branch nor of its delay slot, then the word is discarded and as a result a “bubble” is injected into the pipeline. -
FIG. 4 illustrates a novel and improved method for making a branch prediction in accordance with various embodiments of the invention. The method depicted inFIG. 4 is characterized in that the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself. As a result, by fetching the instruction just prior to the branch instruction, a prediction can be made from the address of this instruction while the branch instruction itself is being fetched. - Referring specifically to
FIG. 4 , the method begins instep 400 where the instruction prior to the branch instruction is fetched together with the BPU entry containing prediction information of the next instruction. Next, instep 405, the branch instruction is fetched while, concurrently, a prediction on this branch can be made based on information fetched in the previous cycle. Then, in step 410, in the next clock cycle, the target instruction is fetched. As illustrated, no extra instruction word is fetched between the branch and the target instructions. Hence, no bubble will be injected into the pipeline and overall performance of the processor is improved. - It should be noted that in some cases, due to the use of delay slot instructions, the branch instruction may not be the departure point (the instruction prior to non-sequential flow). Rather another instruction may appear after the branch instruction. Therefore, though the non-sequential jump is dictated by the branch instruction, the last instruction to be executed may not be the branch instruction, but may rather be the delay slot instruction. A delay slot is used in some processor architectures with short pipelines to hide branch resolution latency. Processors with dynamic branch prediction might still have to support the concept of delay slots to be compatible with legacy code. Where a delay slot instruction is used after the branch instruction, utilizing the above branch prediction scheme will cause the instruction address of the branch instruction, not the instruction before the branch instruction, to be used to index the BPU tables, because this instruction is actually the instruction before the last instruction. This fact has significant consequences for branch resolution as will be discussed below. Namely, in order to effectively perform branch resolution, we must know the last fetch address of the previous instruction.
- As stated above, branch resolution occurs in the selection stage of the pipeline and causes the BPU to be updated to reflect the outcome of the conditional branch during the write-back stage. Referring to
FIG. 5 ,FIG. 5 illustrates five potential scenarios encountered when performing branch resolution. These scenarios may be grouped into two groups by the way in which they are handled. Group one comprises a non-aligned 16-bit instruction and an aligned 16 or 32-bit instruction. Group two comprises one of three scenarios: a non-aligned 32 or 48-bit instruction, a non-aligned 48-bit or an aligned 64-bit instruction, and a non-aligned 64-bit instruction. - Two pieces of information need to be computed for every instruction under this scheme: namely the last fetch address of the current instruction, L0 and the update address of the current instruction U0. In the case of the scenarios of group one, it is also necessary to know L−1, the last fetch address of the previous instruction, and U−1, the update address of the previous instruction. Looking at both the first scenario and second scenarios, a non-aligned 16 bit instruction, and an aligned 16 or 32 bit instruction respectively, L0 is simply the 30 most significant bits of the fetch address denoted as instr_addr[31:2]. However, because in both of the scenarios, the instruction address spans only one fetch address line, the update address U0 depends on whether these instructions were arrived at sequentially or as the result of a non-sequential instruction. However, in keeping with the method discussed in the context of
FIG. 4 , we know the last fetch address of the prior instruction , also known as L−1. This information is stored internally and is available as a variable to the current instruction in the select stage of the pipeline. In the first scenario, if the current instruction is arrived at through sequential program flow, it has the same departure address as the prior instruction and hence U0 will be U−1. Otherwise, the update address will be the last fetch address of the prior non-sequential instruction L−1. In the second scenario, the update address of a 16 or 32-bit aligned instruction, U0 will be the last fetch address of the prior instruction L−1, irrespective of whether the prior instruction was sequential or not. - Scenarios 3-5 can be handled in the same manner by taking advantage of the fact that each instruction fetch fetches a contiguous 32-bit word. Therefore, when the instruction is sufficiently long and/or unaligned to span two or more consecutive fetched instruction words in memory, we know with certainty that L0, the last fetch address, can be derived from the instruction address of the next sequential instruction, denoted as next_addr[31:2] in
FIG. 5 . Inscenarios scenario 4, covering non-aligned 48-bit or aligned 64-bit instructions, the fetch address of the last portion of the current instruction is one less than the start address of the next sequential instruction. Hence, L0=next_addr[31:2]−1. On the other hand, inscenario scenario 5, the last portion of the current instruction shares the same fetch address as the start of the next sequential instruction. Hence, U0 will be next_addr[31:2]−1. In the scheme just described, the update address U0 and last fetch address L0 are computed based on 4 values that are provided to the selection stage as early arriving signals directly from registers. These signals are namely inst_addr, next_addr, L−1 and U−1. Only one multiplexer is required to compute U0 inscenario 1, and one decrementer is required to compute L0 inscenario 4 and U0 inscenario 5. The overall complexity of the novel and improved branch prediction method being disclosed is only marginally increased comparing with traditional methods. - In yet another embodiment of the invention, a method and apparatus are provided for computing the last instruction fetch of a zero-overhead loop for dynamic branch prediction in a variable length instruction set microprocessor. Zero-overhead loops, as well as the previously discussed dynamic branch prediction, are both powerful techniques for improving effective processor performance. In a microprocessor employing both techniques, the BPU has to be updated whenever the zero-overhead loop mechanism is updated. In particular, the BPU needs the last instruction fetch address of the loop body. This allows the BPU to re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body. However, in a variable-length instruction architecture, determining the last fetch address of a loop body is not trivial. Typically, a processor with a variable-length instruction set only keeps track of the first address an instruction is fetched from. However, the last fetch address of a loop body is the fetch address of the last portion of the last instruction of the loop body and is not readily available.
- Typically, a zero-overhead loop mechanism requires an address related to the end of the loop body to be stored as part of the architectural state. In various exemplary embodiments, this address can be denoted as LP_END. If LP_END is assigned the address of the next instruction after the last instruction of the loop body, the last fetch address of the loop body, designated in various exemplary embodiments as LP_LAST, can be derived by exploiting two facts. Firstly, despite the variable length nature of the instruction set, instructions are fetched in fixed size chunks, namely 32-bit words. The BPU works only with the fetch address of theses fixed size chunks. Secondly, instruction sizes of variable-length are usually an integer multiple of a fixed size, namely 16-bits. Based on these facts, an instruction can be classified as aligned if the start address of the instruction is the same as the fetch address. If LP_END is an aligned address, LP_LAST must be the fetch address that precedes that of LP_END. If LP_END is non aligned, LP_LAST is the fetch address of LP_END. Thus, the equation LP_LAST=LP_END[31:2]−(˜LP_END[1]) can be used to derive the LP_LAST whether or not LP_END is aligned.
- Referring to
FIGS. 6 and 7 , two examples are illustrated in which LP_END is both non-aligned and aligned. In both cases, the instruction “sub” is the last instruction of the loop body. In the first case, LP_END is located at 0xA. In this case, LP_END is unaligned and LP_END[1] is 1, thus, the inversion of LP_END[1] is 0 and the last fetch address of the loop body, LP_LAST, is LP_END[32:2] which is 0x8. In the second case, LP_END is aligned and located at 0x18. LP_END[1] is 0, as with all aligned instructions, thus, the inversion of LP_END is 1 and LP13LAST is LP_END[31:2]−1 or the line above LP_LAST, line 0x14. Note that in the above calculations least significant bits of addresses that are known to be zero are ignored for the sake of simplifying the description. - While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only, and are not to be interpreted as limitations of the present invention. Many modifications to the embodiments described above can be made without departing from the spirit and scope of the invention.
Claims (12)
1. In a microprocessor, a method of performing branch prediction using variable length instructions, the method comprising:
fetching an instruction from memory based on a specified fetch address;
making a branch prediction based on the address of the fetched instruction; and
discarding the branch prediction if:
(1) a branch prediction look-up was based on a non-sequential fetch to an unaligned instruction address; and
(2) a branch target alignment cache (BTAC) bit of the instruction is equal to a predefined value.
2. The method according to claim 1 , further comprising passing a predicted instruction associated with the branch prediction if either (1) or (2) is false.
3. The method according to claim 2 , further comprising updating a next fetch address if a branch prediction is incorrect.
4. The method according to claim 3 , wherein the microprocessor comprises an instruction for pipeline having a select stage, and updating comprises after resolving a branch in the select stage, updating the branch prediction unit (BPU) with the address of the next instruction resulting from that branch.
5. The method according to claim 1 , wherein making a branch prediction comprises parsing a branch look-up table of a branch prediction unit (BPU) that indexes non-sequential branch instructions by their addresses in association with the next instruction taken.
6. The method according to claim 1 , wherein an instruction is determined to be unaligned if it does not start at the beginning of a memory address line.
7. The method according to claim 1 , wherein a BTAC alignment bit will be one of a 0 or a 1 for an aligned branch instruction and the other of a 0 or a 1 for an unaligned branch instruction.
8. In a microprocessor, a method of performing dynamic branch prediction comprising:
fetching the penultimate instruction word prior to a non-sequential program flow and a branch prediction unit look-up table entry containing prediction information for a next instruction on a first clock cycle;
fetching the last instruction word prior to a non-sequential program flow and making a prediction on this non-sequential program flow based on information fetched in the previous cycle on a second clock cycle; and
fetching the predicted target instruction on a third clock cycle.
9. The method according to claim 8 , wherein fetching an instruction prior to a branch instruction and a branch prediction look-up table entry comprises using the instruction address of the instruction just prior to the branch instruction in the program flow index the branch in the branch table.
10. The method according to claim 9 , wherein if a delay slot instruction appears after the branch instruction, fetching an instruction prior to a branch instruction and a branch prediction look-up table entry comprises using the instruction address of the branch instruction, not the instruction before the branch instruction, to index the BPU tables.
11. A method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor, the method comprising:
storing a last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table; and
predictively re-directing an instruction fetch to the start of the loop body whenever an instruction fetch hits the end of a loop body, wherein the last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
12. The method according to claim 11 , wherein storing comprises, if the next instruction after the end of the loop body has an aligned address, the last instruction of the loop body has a last fetch address immediately preceding the address of the next instruction after the end of the loop body, otherwise, if the next instruction after the end of the loop body has an unaligned address, the last instruction of the loop body has a last fetch address the same as the address of the next instruction after the loop body.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/132,428 US20050278517A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US57223804P | 2004-05-19 | 2004-05-19 | |
US11/132,428 US20050278517A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050278517A1 true US20050278517A1 (en) | 2005-12-15 |
Family
ID=35429033
Family Applications (7)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/132,428 Abandoned US20050278517A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
US11/132,448 Abandoned US20050289323A1 (en) | 2004-05-19 | 2005-05-19 | Barrel shifter for a microprocessor |
US11/132,432 Abandoned US20050273559A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including unified cache debug unit |
US11/132,423 Abandoned US20050278513A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods of dynamic branch prediction in a microprocessor |
US11/132,447 Abandoned US20050278505A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
US11/132,424 Active 2031-02-12 US8719837B2 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture having extendible logic |
US14/222,194 Active US9003422B2 (en) | 2004-05-19 | 2014-03-21 | Microprocessor architecture having extendible logic |
Family Applications After (6)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/132,448 Abandoned US20050289323A1 (en) | 2004-05-19 | 2005-05-19 | Barrel shifter for a microprocessor |
US11/132,432 Abandoned US20050273559A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including unified cache debug unit |
US11/132,423 Abandoned US20050278513A1 (en) | 2004-05-19 | 2005-05-19 | Systems and methods of dynamic branch prediction in a microprocessor |
US11/132,447 Abandoned US20050278505A1 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
US11/132,424 Active 2031-02-12 US8719837B2 (en) | 2004-05-19 | 2005-05-19 | Microprocessor architecture having extendible logic |
US14/222,194 Active US9003422B2 (en) | 2004-05-19 | 2014-03-21 | Microprocessor architecture having extendible logic |
Country Status (5)
Country | Link |
---|---|
US (7) | US20050278517A1 (en) |
CN (1) | CN101002169A (en) |
GB (1) | GB2428842A (en) |
TW (1) | TW200602974A (en) |
WO (1) | WO2005114441A2 (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050278505A1 (en) * | 2004-05-19 | 2005-12-15 | Lim Seow C | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
WO2008039975A1 (en) * | 2006-09-29 | 2008-04-03 | Qualcomm Incorporated | Effective use of a bht in processor having variable length instruction set execution modes |
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
WO2013101152A1 (en) * | 2011-12-30 | 2013-07-04 | Intel Corporation | Embedded branch prediction unit |
GB2545796A (en) * | 2015-11-09 | 2017-06-28 | Imagination Tech Ltd | Fetch ahead branch target buffer |
CN107179895A (en) * | 2017-05-17 | 2017-09-19 | 北京中科睿芯科技有限公司 | A kind of method that application compound instruction accelerates instruction execution speed in data flow architecture |
WO2018009277A1 (en) * | 2016-07-07 | 2018-01-11 | Intel Corporation | Graphics command parsing mechanism |
CN110442382A (en) * | 2019-07-31 | 2019-11-12 | 西安芯海微电子科技有限公司 | Prefetch buffer control method, device, chip and computer readable storage medium |
CN110727463A (en) * | 2019-09-12 | 2020-01-24 | 无锡江南计算技术研究所 | Zero-level instruction circular buffer prefetching method and device based on dynamic credit |
US20200065112A1 (en) * | 2018-08-22 | 2020-02-27 | Qualcomm Incorporated | Asymmetric speculative/nonspeculative conditional branching |
CN112015490A (en) * | 2020-11-02 | 2020-12-01 | 鹏城实验室 | Method, apparatus and medium for programmable device implementing and testing reduced instruction set |
US11182166B2 (en) | 2019-05-23 | 2021-11-23 | Samsung Electronics Co., Ltd. | Branch prediction throughput by skipping over cachelines without branches |
US11663007B2 (en) * | 2021-10-01 | 2023-05-30 | Arm Limited | Control of branch prediction for zero-overhead loop |
Families Citing this family (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7577795B2 (en) * | 2006-01-25 | 2009-08-18 | International Business Machines Corporation | Disowning cache entries on aging out of the entry |
US20070260862A1 (en) * | 2006-05-03 | 2007-11-08 | Mcfarling Scott | Providing storage in a memory hierarchy for prediction information |
US7752468B2 (en) | 2006-06-06 | 2010-07-06 | Intel Corporation | Predict computing platform memory power utilization |
US7555605B2 (en) * | 2006-09-28 | 2009-06-30 | Freescale Semiconductor, Inc. | Data processing system having cache memory debugging support and method therefor |
US7529909B2 (en) * | 2006-12-28 | 2009-05-05 | Microsoft Corporation | Security verified reconfiguration of execution datapath in extensible microcomputer |
US7779241B1 (en) * | 2007-04-10 | 2010-08-17 | Dunn David A | History based pipelined branch prediction |
US8209488B2 (en) * | 2008-02-01 | 2012-06-26 | International Business Machines Corporation | Techniques for prediction-based indirect data prefetching |
US8166277B2 (en) * | 2008-02-01 | 2012-04-24 | International Business Machines Corporation | Data prefetching using indirect addressing |
US9519480B2 (en) * | 2008-02-11 | 2016-12-13 | International Business Machines Corporation | Branch target preloading using a multiplexer and hash circuit to reduce incorrect branch predictions |
US9201655B2 (en) * | 2008-03-19 | 2015-12-01 | International Business Machines Corporation | Method, computer program product, and hardware product for eliminating or reducing operand line crossing penalty |
US8181003B2 (en) * | 2008-05-29 | 2012-05-15 | Axis Semiconductor, Inc. | Instruction set design, control and communication in programmable microprocessor cores and the like |
US8131982B2 (en) * | 2008-06-13 | 2012-03-06 | International Business Machines Corporation | Branch prediction instructions having mask values involving unloading and loading branch history data |
US8225069B2 (en) * | 2009-03-31 | 2012-07-17 | Intel Corporation | Control of on-die system fabric blocks |
US10338923B2 (en) * | 2009-05-05 | 2019-07-02 | International Business Machines Corporation | Branch prediction path wrong guess instruction |
JP5423156B2 (en) * | 2009-06-01 | 2014-02-19 | 富士通株式会社 | Information processing apparatus and branch prediction method |
US8954714B2 (en) * | 2010-02-01 | 2015-02-10 | Altera Corporation | Processor with cycle offsets and delay lines to allow scheduling of instructions through time |
US8521999B2 (en) * | 2010-03-11 | 2013-08-27 | International Business Machines Corporation | Executing touchBHT instruction to pre-fetch information to prediction mechanism for branch with taken history |
US8495287B2 (en) * | 2010-06-24 | 2013-07-23 | International Business Machines Corporation | Clock-based debugging for embedded dynamic random access memory element in a processor core |
US10866807B2 (en) | 2011-12-22 | 2020-12-15 | Intel Corporation | Processors, methods, systems, and instructions to generate sequences of integers in numerical order that differ by a constant stride |
WO2013095563A1 (en) | 2011-12-22 | 2013-06-27 | Intel Corporation | Packed data rearrangement control indexes precursors generation processors, methods, systems, and instructions |
US10565283B2 (en) | 2011-12-22 | 2020-02-18 | Intel Corporation | Processors, methods, systems, and instructions to generate sequences of consecutive integers in numerical order |
US10223111B2 (en) | 2011-12-22 | 2019-03-05 | Intel Corporation | Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset |
US9851973B2 (en) * | 2012-03-30 | 2017-12-26 | Intel Corporation | Dynamic branch hints using branches-to-nowhere conditional branch |
US9152424B2 (en) | 2012-06-14 | 2015-10-06 | International Business Machines Corporation | Mitigating instruction prediction latency with independently filtered presence predictors |
US9135012B2 (en) | 2012-06-14 | 2015-09-15 | International Business Machines Corporation | Instruction filtering |
KR101996351B1 (en) * | 2012-06-15 | 2019-07-05 | 인텔 코포레이션 | A virtual load store queue having a dynamic dispatch window with a unified structure |
US9378017B2 (en) * | 2012-12-29 | 2016-06-28 | Intel Corporation | Apparatus and method of efficient vector roll operation |
CN103425498B (en) * | 2013-08-20 | 2018-07-24 | 复旦大学 | A kind of long instruction words command memory of low-power consumption and its method for optimizing power consumption |
US10372590B2 (en) * | 2013-11-22 | 2019-08-06 | International Business Corporation | Determining instruction execution history in a debugger |
US9870226B2 (en) * | 2014-07-03 | 2018-01-16 | The Regents Of The University Of Michigan | Control of switching between executed mechanisms |
US9910670B2 (en) * | 2014-07-09 | 2018-03-06 | Intel Corporation | Instruction set for eliminating misaligned memory accesses during processing of an array having misaligned data rows |
US9740607B2 (en) | 2014-09-03 | 2017-08-22 | Micron Technology, Inc. | Swap operations in memory |
TWI569207B (en) * | 2014-10-28 | 2017-02-01 | 上海兆芯集成電路有限公司 | Fractional use of prediction history storage for operating system routines |
US9665374B2 (en) * | 2014-12-18 | 2017-05-30 | Intel Corporation | Binary translation mechanism |
US9792116B2 (en) * | 2015-04-24 | 2017-10-17 | Optimum Semiconductor Technologies, Inc. | Computer processor that implements pre-translation of virtual addresses with target registers |
US10346168B2 (en) | 2015-06-26 | 2019-07-09 | Microsoft Technology Licensing, Llc | Decoupled processor instruction window and operand buffer |
US10776115B2 (en) * | 2015-09-19 | 2020-09-15 | Microsoft Technology Licensing, Llc | Debug support for block-based processor |
GB2548601B (en) * | 2016-03-23 | 2019-02-13 | Advanced Risc Mach Ltd | Processing vector instructions |
US10599428B2 (en) | 2016-03-23 | 2020-03-24 | Arm Limited | Relaxed execution of overlapping mixed-scalar-vector instructions |
CN109690536B (en) * | 2017-02-16 | 2021-03-23 | 华为技术有限公司 | Method and system for fetching multi-core instruction traces from virtual platform simulator to performance simulation model |
US9959247B1 (en) | 2017-02-17 | 2018-05-01 | Google Llc | Permuting in a matrix-vector processor |
US10902348B2 (en) | 2017-05-19 | 2021-01-26 | International Business Machines Corporation | Computerized branch predictions and decisions |
GB2564390B (en) * | 2017-07-04 | 2019-10-02 | Advanced Risc Mach Ltd | An apparatus and method for controlling use of a register cache |
US11868804B1 (en) | 2019-11-18 | 2024-01-09 | Groq, Inc. | Processor instruction dispatch configuration |
US11360934B1 (en) | 2017-09-15 | 2022-06-14 | Groq, Inc. | Tensor streaming processor architecture |
US11114138B2 (en) | 2017-09-15 | 2021-09-07 | Groq, Inc. | Data structures with multiple read ports |
US11243880B1 (en) | 2017-09-15 | 2022-02-08 | Groq, Inc. | Processor architecture |
US10372459B2 (en) | 2017-09-21 | 2019-08-06 | Qualcomm Incorporated | Training and utilization of neural branch predictor |
US11170307B1 (en) | 2017-09-21 | 2021-11-09 | Groq, Inc. | Predictive model compiler for generating a statically scheduled binary with known resource constraints |
US11537687B2 (en) | 2018-11-19 | 2022-12-27 | Groq, Inc. | Spatial locality transform of matrices |
US11163577B2 (en) | 2018-11-26 | 2021-11-02 | International Business Machines Corporation | Selectively supporting static branch prediction settings only in association with processor-designated types of instructions |
US11086631B2 (en) | 2018-11-30 | 2021-08-10 | Western Digital Technologies, Inc. | Illegal instruction exception handling |
CN109783384A (en) * | 2019-01-10 | 2019-05-21 | 未来电视有限公司 | Log use-case test method, log use-case test device and electronic equipment |
CN114930351A (en) | 2019-11-26 | 2022-08-19 | 格罗克公司 | Loading operands from a multidimensional array and outputting results using only a single side |
CN113076277B (en) * | 2021-03-26 | 2024-05-03 | 大唐微电子技术有限公司 | Method, device, computer storage medium and terminal for realizing pipeline scheduling |
US12067395B2 (en) | 2021-08-12 | 2024-08-20 | Tenstorrent Inc. | Pre-staged instruction registers for variable length instruction set machine |
US11599358B1 (en) | 2021-08-12 | 2023-03-07 | Tenstorrent Inc. | Pre-staged instruction registers for variable length instruction set machine |
CN115495155B (en) * | 2022-11-18 | 2023-03-24 | 北京数渡信息科技有限公司 | Hardware circulation processing device suitable for general processor |
CN117193861B (en) * | 2023-11-07 | 2024-03-15 | 芯来智融半导体科技(上海)有限公司 | Instruction processing method, apparatus, computer device and storage medium |
Citations (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5155843A (en) * | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
US5423011A (en) * | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5493687A (en) * | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5530825A (en) * | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5752014A (en) * | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5778423A (en) * | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
US5805876A (en) * | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5808876A (en) * | 1997-06-20 | 1998-09-15 | International Business Machines Corporation | Multi-function power distribution system |
US5920711A (en) * | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US5978909A (en) * | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6076158A (en) * | 1990-06-29 | 2000-06-13 | Digital Equipment Corporation | Branch prediction in high-performance processor |
US6151672A (en) * | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US6189091B1 (en) * | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US6253287B1 (en) * | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US20010016903A1 (en) * | 1998-12-03 | 2001-08-23 | Marc Tremblay | Software branch prediction filtering for a microprocessor |
US20010021974A1 (en) * | 2000-02-01 | 2001-09-13 | Samsung Electronics Co., Ltd. | Branch predictor suitable for multi-processing microprocessor |
US6292879B1 (en) * | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US20010032309A1 (en) * | 1999-03-18 | 2001-10-18 | Henry G. Glenn | Static branch prediction mechanism for conditional branch instructions |
US20010040686A1 (en) * | 1998-06-26 | 2001-11-15 | Heidi M. Schoolcraft | Streamlined tetrahedral interpolation |
US20010044892A1 (en) * | 1997-09-10 | 2001-11-22 | Shinichi Yamaura | Method and system for high performance implementation of microprocessors |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US20020066006A1 (en) * | 2000-11-29 | 2002-05-30 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
US20020069351A1 (en) * | 2000-12-05 | 2002-06-06 | Shyh-An Chi | Memory data access structure and method suitable for use in a processor |
US20020073301A1 (en) * | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US20020078332A1 (en) * | 2000-12-19 | 2002-06-20 | Seznec Andre C. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US20020083312A1 (en) * | 2000-12-27 | 2002-06-27 | Balaram Sinharoy | Branch Prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US20020087852A1 (en) * | 2000-12-28 | 2002-07-04 | Jourdan Stephan J. | Method and apparatus for predicting branches using a meta predictor |
US20020087851A1 (en) * | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US20020138236A1 (en) * | 2001-03-21 | 2002-09-26 | Akihiro Takamura | Processor having execution result prediction function for instruction |
US20020157000A1 (en) * | 2001-03-01 | 2002-10-24 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
US20020188833A1 (en) * | 2001-05-04 | 2002-12-12 | Ip First Llc | Dual call/return stack branch prediction system |
US20020194462A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US20020194461A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US20020194464A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache with selective override by seconday predictor based on branch instruction type |
US20020194463A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc, | Speculative hybrid branch direction predictor |
US20020199092A1 (en) * | 1999-11-05 | 2002-12-26 | Ip-First Llc | Split history tables for branch prediction |
US20030023838A1 (en) * | 2001-07-27 | 2003-01-30 | Karim Faraydon O. | Novel fetch branch architecture for reducing branch penalty without branch prediction |
US6550056B1 (en) * | 1999-07-19 | 2003-04-15 | Mitsubishi Denki Kabushiki Kaisha | Source level debugger for debugging source programs |
US6609194B1 (en) * | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US6622240B1 (en) * | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US20030204705A1 (en) * | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
US20040015683A1 (en) * | 2002-07-18 | 2004-01-22 | International Business Machines Corporation | Two dimensional branch history table prefetching mechanism |
US20040049660A1 (en) * | 2002-09-06 | 2004-03-11 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
US20040068643A1 (en) * | 1997-08-01 | 2004-04-08 | Dowling Eric M. | Method and apparatus for high performance branching in pipelined microsystems |
US6774832B1 (en) * | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US20040172524A1 (en) * | 2001-06-29 | 2004-09-02 | Jan Hoogerbrugge | Method, apparatus and compiler for predicting indirect branch target addresses |
US20040186985A1 (en) * | 2003-03-21 | 2004-09-23 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets |
US20040193843A1 (en) * | 2003-03-31 | 2004-09-30 | Eran Altshuler | System and method for early branch prediction |
US20040193855A1 (en) * | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US20040225871A1 (en) * | 1999-10-01 | 2004-11-11 | Naohiko Irie | Branch control memory |
US20040225870A1 (en) * | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US20040225872A1 (en) * | 2002-06-04 | 2004-11-11 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US20040230782A1 (en) * | 2003-05-12 | 2004-11-18 | International Business Machines Corporation | Method and system for processing loop branch instructions |
US6823444B1 (en) * | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US20040255104A1 (en) * | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US20040268102A1 (en) * | 2003-06-30 | 2004-12-30 | Combs Jonathan D. | Mechanism to remove stale branch predictions at a microprocessor |
US20050027974A1 (en) * | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US20050050309A1 (en) * | 2003-08-29 | 2005-03-03 | Renesas Technology Corp. | Data processor |
US20050066305A1 (en) * | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
US20050076193A1 (en) * | 2003-09-08 | 2005-04-07 | Ip-First, Llc. | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050091479A1 (en) * | 2003-10-24 | 2005-04-28 | Sung-Woo Chung | Branch predictor, system and method of branch prediction |
US20050125613A1 (en) * | 2003-12-03 | 2005-06-09 | Sangwook Kim | Reconfigurable trace cache |
US20050125632A1 (en) * | 2003-12-03 | 2005-06-09 | Advanced Micro Devices, Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US20050125634A1 (en) * | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US20050154867A1 (en) * | 2004-01-14 | 2005-07-14 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to improve branch predictions |
US20050172277A1 (en) * | 2004-02-04 | 2005-08-04 | Saurabh Chheda | Energy-focused compiler-assisted branch prediction |
US20050216703A1 (en) * | 2004-03-26 | 2005-09-29 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050216713A1 (en) * | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US20050223202A1 (en) * | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
US6963554B1 (en) * | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US20060015706A1 (en) * | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
US20060036836A1 (en) * | 1998-12-31 | 2006-02-16 | Metaflow Technologies, Inc. | Block-based branch target buffer |
US20060041868A1 (en) * | 2004-08-23 | 2006-02-23 | Cheng-Yen Huang | Method for verifying branch prediction mechanism and accessible recording medium for storing program thereof |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
Family Cites Families (148)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4342082A (en) | 1977-01-13 | 1982-07-27 | International Business Machines Corp. | Program instruction mechanism for shortened recursive handling of interruptions |
US4216539A (en) | 1978-05-05 | 1980-08-05 | Zehntel, Inc. | In-circuit digital tester |
US4400773A (en) | 1980-12-31 | 1983-08-23 | International Business Machines Corp. | Independent handling of I/O interrupt requests and associated status information transfers |
US4594659A (en) * | 1982-10-13 | 1986-06-10 | Honeywell Information Systems Inc. | Method and apparatus for prefetching instructions for a central execution pipeline unit |
JPS63225822A (en) | 1986-08-11 | 1988-09-20 | Toshiba Corp | Barrel shifter |
US4905178A (en) | 1986-09-19 | 1990-02-27 | Performance Semiconductor Corporation | Fast shifter method and structure |
JPS6398729A (en) * | 1986-10-15 | 1988-04-30 | Fujitsu Ltd | Barrel shifter |
US4914622A (en) | 1987-04-17 | 1990-04-03 | Advanced Micro Devices, Inc. | Array-organized bit map with a barrel shifter |
US4962500A (en) | 1987-08-28 | 1990-10-09 | Nec Corporation | Data processor including testing structure for a barrel shifter |
KR970005453B1 (en) | 1987-12-25 | 1997-04-16 | 가부시기가이샤 히다찌세이사꾸쇼 | Data processing apparatus for high speed processing |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
JPH01263820A (en) | 1988-04-15 | 1989-10-20 | Hitachi Ltd | Microprocessor |
DE3886739D1 (en) | 1988-06-02 | 1994-02-10 | Itt Ind Gmbh Deutsche | Device for digital signal processing. |
GB2229832B (en) | 1989-03-30 | 1993-04-07 | Intel Corp | Byte swap instruction for memory format conversion within a microprocessor |
EP0415648B1 (en) * | 1989-08-31 | 1998-05-20 | Canon Kabushiki Kaisha | Image processing apparatus |
JPH03185530A (en) | 1989-12-14 | 1991-08-13 | Mitsubishi Electric Corp | Data processor |
EP0436341B1 (en) | 1990-01-02 | 1997-05-07 | Motorola, Inc. | Sequential prefetch method for 1, 2 or 3 word instructions |
JPH03248226A (en) | 1990-02-26 | 1991-11-06 | Nec Corp | Microprocessor |
JP2560889B2 (en) * | 1990-05-22 | 1996-12-04 | 日本電気株式会社 | Microprocessor |
JP2556612B2 (en) * | 1990-08-29 | 1996-11-20 | 日本電気アイシーマイコンシステム株式会社 | Barrel shifter circuit |
US5636363A (en) * | 1991-06-14 | 1997-06-03 | Integrated Device Technology, Inc. | Hardware control structure and method for off-chip monitoring entries of an on-chip cache |
DE69229084T2 (en) * | 1991-07-08 | 1999-10-21 | Canon K.K., Tokio/Tokyo | Color imaging process, color image reader and color image processing apparatus |
US5539911A (en) * | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5450586A (en) * | 1991-08-14 | 1995-09-12 | Hewlett-Packard Company | System for analyzing and debugging embedded software through dynamic and interactive use of code markers |
CA2073516A1 (en) | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
US5485625A (en) | 1992-06-29 | 1996-01-16 | Ford Motor Company | Method and apparatus for monitoring external events during a microprocessor's sleep mode |
US5274770A (en) | 1992-07-29 | 1993-12-28 | Tritech Microelectronics International Pte Ltd. | Flexible register-based I/O microcontroller with single cycle instruction execution |
US5294928A (en) | 1992-08-31 | 1994-03-15 | Microchip Technology Incorporated | A/D converter with zero power mode |
US5333119A (en) | 1992-09-30 | 1994-07-26 | Regents Of The University Of Minnesota | Digital signal processor with delayed-evaluation array multipliers and low-power memory addressing |
US5542074A (en) | 1992-10-22 | 1996-07-30 | Maspar Computer Corporation | Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount |
US5696958A (en) | 1993-01-11 | 1997-12-09 | Silicon Graphics, Inc. | Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline |
GB2275119B (en) * | 1993-02-03 | 1997-05-14 | Motorola Inc | A cached processor |
US5577217A (en) * | 1993-05-14 | 1996-11-19 | Intel Corporation | Method and apparatus for a branch target buffer with shared branch pattern tables for associated branch predictions |
JPH06332693A (en) | 1993-05-27 | 1994-12-02 | Hitachi Ltd | Issuing system of suspending instruction with time-out function |
US5454117A (en) | 1993-08-25 | 1995-09-26 | Nexgen, Inc. | Configurable branch prediction for a processor performing speculative execution |
US5584031A (en) | 1993-11-09 | 1996-12-10 | Motorola Inc. | System and method for executing a low power delay instruction |
JP2801135B2 (en) * | 1993-11-26 | 1998-09-21 | 富士通株式会社 | Instruction reading method and instruction reading device for pipeline processor |
US6116768A (en) | 1993-11-30 | 2000-09-12 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5590350A (en) | 1993-11-30 | 1996-12-31 | Texas Instruments Incorporated | Three input arithmetic logic unit with mask generator |
US5509129A (en) | 1993-11-30 | 1996-04-16 | Guttag; Karl M. | Long instruction word controlling plural independent processor operations |
US5590351A (en) | 1994-01-21 | 1996-12-31 | Advanced Micro Devices, Inc. | Superscalar execution unit for sequential instruction pointer updates and segment limit checks |
TW253946B (en) * | 1994-02-04 | 1995-08-11 | Ibm | Data processor with branch prediction and method of operation |
JPH07253922A (en) | 1994-03-14 | 1995-10-03 | Texas Instr Japan Ltd | Address generating circuit |
US5517436A (en) | 1994-06-07 | 1996-05-14 | Andreas; David C. | Digital signal processor for audio applications |
US5809293A (en) | 1994-07-29 | 1998-09-15 | International Business Machines Corporation | System and method for program execution tracing within an integrated processor |
US5566357A (en) | 1994-10-06 | 1996-10-15 | Qualcomm Incorporated | Power reduction in a cellular radiotelephone |
US5692168A (en) * | 1994-10-18 | 1997-11-25 | Cyrix Corporation | Prefetch buffer using flow control bit to identify changes of flow within the code stream |
JPH08202469A (en) | 1995-01-30 | 1996-08-09 | Fujitsu Ltd | Microcontroller unit equipped with universal asychronous transmitting and receiving circuit |
US5600674A (en) | 1995-03-02 | 1997-02-04 | Motorola Inc. | Method and apparatus of an enhanced digital signal processor |
US5655122A (en) | 1995-04-05 | 1997-08-05 | Sequent Computer Systems, Inc. | Optimizing compiler with static prediction of branch probability, branch frequency and function frequency |
US5835753A (en) | 1995-04-12 | 1998-11-10 | Advanced Micro Devices, Inc. | Microprocessor with dynamically extendable pipeline stages and a classifying circuit |
US5659752A (en) * | 1995-06-30 | 1997-08-19 | International Business Machines Corporation | System and method for improving branch prediction in compiled program code |
US5842004A (en) | 1995-08-04 | 1998-11-24 | Sun Microsystems, Inc. | Method and apparatus for decompression of compressed geometric three-dimensional graphics data |
US5768602A (en) | 1995-08-04 | 1998-06-16 | Apple Computer, Inc. | Sleep mode controller for power management |
US5727211A (en) * | 1995-11-09 | 1998-03-10 | Chromatic Research, Inc. | System and method for fast context switching between tasks |
US5774709A (en) | 1995-12-06 | 1998-06-30 | Lsi Logic Corporation | Enhanced branch delay slot handling with single exception program counter |
US5778438A (en) | 1995-12-06 | 1998-07-07 | Intel Corporation | Method and apparatus for maintaining cache coherency in a computer system with a highly pipelined bus and multiple conflicting snoop requests |
US5996071A (en) * | 1995-12-15 | 1999-11-30 | Via-Cyrix, Inc. | Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address |
JP3663710B2 (en) * | 1996-01-17 | 2005-06-22 | ヤマハ株式会社 | Program generation method and processor interrupt control method |
US5896305A (en) * | 1996-02-08 | 1999-04-20 | Texas Instruments Incorporated | Shifter circuit for an arithmetic logic unit in a microprocessor |
JPH09261490A (en) * | 1996-03-22 | 1997-10-03 | Minolta Co Ltd | Image forming device |
US5784636A (en) | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US20010025337A1 (en) | 1996-06-10 | 2001-09-27 | Frank Worrell | Microprocessor including a mode detector for setting compression mode |
US5826079A (en) | 1996-07-05 | 1998-10-20 | Ncr Corporation | Method for improving the execution efficiency of frequently communicating processes utilizing affinity process scheduling by identifying and assigning the frequently communicating processes to the same processor |
US5964884A (en) * | 1996-09-30 | 1999-10-12 | Advanced Micro Devices, Inc. | Self-timed pulse control circuit |
US5848264A (en) | 1996-10-25 | 1998-12-08 | S3 Incorporated | Debug and video queue for multi-processor chip |
US6058142A (en) | 1996-11-29 | 2000-05-02 | Sony Corporation | Image processing apparatus |
US6061521A (en) | 1996-12-02 | 2000-05-09 | Compaq Computer Corp. | Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle |
US5909572A (en) | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
EP0855645A3 (en) * | 1996-12-31 | 2000-05-24 | Texas Instruments Incorporated | System and method for speculative execution of instructions with data prefetch |
KR100236533B1 (en) | 1997-01-16 | 2000-01-15 | 윤종용 | Digital signal processor |
EP0855718A1 (en) | 1997-01-28 | 1998-07-29 | Hewlett-Packard Company | Memory low power mode control |
US6185732B1 (en) * | 1997-04-08 | 2001-02-06 | Advanced Micro Devices, Inc. | Software debug port for a microprocessor |
US6154857A (en) * | 1997-04-08 | 2000-11-28 | Advanced Micro Devices, Inc. | Microprocessor-based device incorporating a cache for capturing software performance profiling data |
US6584525B1 (en) | 1998-11-19 | 2003-06-24 | Edwin E. Klingman | Adaptation of standard microprocessor architectures via an interface to a configurable subsystem |
US6021500A (en) | 1997-05-07 | 2000-02-01 | Intel Corporation | Processor with sleep and deep sleep modes |
US5950120A (en) | 1997-06-17 | 1999-09-07 | Lsi Logic Corporation | Apparatus and method for shutdown of wireless communications mobile station with multiple clocks |
US5931950A (en) | 1997-06-17 | 1999-08-03 | Pc-Tel, Inc. | Wake-up-on-ring power conservation for host signal processing communication system |
US6035374A (en) | 1997-06-25 | 2000-03-07 | Sun Microsystems, Inc. | Method of executing coded instructions in a multiprocessor having shared execution resources including active, nap, and sleep states in accordance with cache miss latency |
US6088786A (en) | 1997-06-27 | 2000-07-11 | Sun Microsystems, Inc. | Method and system for coupling a stack based processor to register based functional unit |
US5878264A (en) | 1997-07-17 | 1999-03-02 | Sun Microsystems, Inc. | Power sequence controller with wakeup logic for enabling a wakeup interrupt handler procedure |
US6026478A (en) | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
US6226738B1 (en) | 1997-08-01 | 2001-05-01 | Micron Technology, Inc. | Split embedded DRAM processor |
US6760833B1 (en) | 1997-08-01 | 2004-07-06 | Micron Technology, Inc. | Split embedded DRAM processor |
JPH11143571A (en) | 1997-11-05 | 1999-05-28 | Mitsubishi Electric Corp | Data processor |
US6044458A (en) | 1997-12-12 | 2000-03-28 | Motorola, Inc. | System for monitoring program flow utilizing fixwords stored sequentially to opcodes |
US6014743A (en) | 1998-02-05 | 2000-01-11 | Intergrated Device Technology, Inc. | Apparatus and method for recording a floating point error pointer in zero cycles |
US6374349B2 (en) | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6289417B1 (en) | 1998-05-18 | 2001-09-11 | Arm Limited | Operand supply to an execution unit |
US6308279B1 (en) | 1998-05-22 | 2001-10-23 | Intel Corporation | Method and apparatus for power mode transition in a multi-thread processor |
JPH11353225A (en) | 1998-05-26 | 1999-12-24 | Internatl Business Mach Corp <Ibm> | Memory that processor addressing gray code system in sequential execution style accesses and method for storing code and data in memory |
US20020053015A1 (en) | 1998-07-14 | 2002-05-02 | Sony Corporation And Sony Electronics Inc. | Digital signal processor particularly suited for decoding digital audio |
US6327651B1 (en) | 1998-09-08 | 2001-12-04 | International Business Machines Corporation | Wide shifting in the vector permute unit |
US6240521B1 (en) | 1998-09-10 | 2001-05-29 | International Business Machines Corp. | Sleep mode transition between processors sharing an instruction set and an address space |
US6347379B1 (en) | 1998-09-25 | 2002-02-12 | Intel Corporation | Reducing power consumption of an electronic device |
US6862563B1 (en) | 1998-10-14 | 2005-03-01 | Arc International | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US6671743B1 (en) * | 1998-11-13 | 2003-12-30 | Creative Technology, Ltd. | Method and system for exposing proprietary APIs in a privileged device driver to an application |
DE69910826T2 (en) * | 1998-11-20 | 2004-06-17 | Altera Corp., San Jose | COMPUTER SYSTEM WITH RECONFIGURABLE PROGRAMMABLE LOGIC DEVICE |
US6763452B1 (en) * | 1999-01-28 | 2004-07-13 | Ati International Srl | Modifying program execution based on profiling |
US6477683B1 (en) * | 1999-02-05 | 2002-11-05 | Tensilica, Inc. | Automated processor generation system for designing a configurable processor and method for the same |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6427206B1 (en) | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
US6560754B1 (en) * | 1999-05-13 | 2003-05-06 | Arc International Plc | Method and apparatus for jump control in a pipelined processor |
US6438700B1 (en) | 1999-05-18 | 2002-08-20 | Koninklijke Philips Electronics N.V. | System and method to reduce power consumption in advanced RISC machine (ARM) based systems |
US6571333B1 (en) | 1999-11-05 | 2003-05-27 | Intel Corporation | Initializing a memory controller by executing software in second memory to wakeup a system |
US6909744B2 (en) | 1999-12-09 | 2005-06-21 | Redrock Semiconductor, Inc. | Processor architecture for compression and decompression of video and images |
US6412038B1 (en) | 2000-02-14 | 2002-06-25 | Intel Corporation | Integral modular cache for a processor |
JP2001282548A (en) | 2000-03-29 | 2001-10-12 | Matsushita Electric Ind Co Ltd | Communication equipment and communication method |
US6519696B1 (en) | 2000-03-30 | 2003-02-11 | I.P. First, Llc | Paired register exchange using renaming register map |
US6681295B1 (en) * | 2000-08-31 | 2004-01-20 | Hewlett-Packard Development Company, L.P. | Fast lane prefetching |
US6718460B1 (en) | 2000-09-05 | 2004-04-06 | Sun Microsystems, Inc. | Mechanism for error handling in a computer system |
US20030070013A1 (en) | 2000-10-27 | 2003-04-10 | Daniel Hansson | Method and apparatus for reducing power consumption in a digital processor |
US7039901B2 (en) | 2001-01-24 | 2006-05-02 | Texas Instruments Incorporated | Software shared memory bus |
US6925634B2 (en) * | 2001-01-24 | 2005-08-02 | Texas Instruments Incorporated | Method for maintaining cache coherency in software in a shared memory system |
EP1384160A2 (en) | 2001-03-02 | 2004-01-28 | Atsana Semiconductor Corp. | Apparatus for variable word length computing in an array processor |
US7010558B2 (en) | 2001-04-19 | 2006-03-07 | Arc International | Data processor with enhanced instruction execution and method |
US7165168B2 (en) | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
GB0112275D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for normalization of floating point significands in a simd array mpp |
GB0112269D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for alignment of floating point significands in a simd array mpp |
US7191445B2 (en) * | 2001-08-31 | 2007-03-13 | Texas Instruments Incorporated | Method using embedded real-time analysis components with corresponding real-time operating system software objects |
US6751331B2 (en) | 2001-10-11 | 2004-06-15 | United Global Sourcing Incorporated | Communication headset |
JP2003131902A (en) * | 2001-10-24 | 2003-05-09 | Toshiba Corp | Software debugger, system-level debugger, debug method and debug program |
US7051239B2 (en) * | 2001-12-28 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip |
US20030225998A1 (en) | 2002-01-31 | 2003-12-04 | Khan Mohammed Noshad | Configurable data processor with multi-length instruction set architecture |
US7168067B2 (en) * | 2002-02-08 | 2007-01-23 | Agere Systems Inc. | Multiprocessor system with cache-based software breakpoints |
US7529912B2 (en) | 2002-02-12 | 2009-05-05 | Via Technologies, Inc. | Apparatus and method for instruction-level specification of floating point format |
US7181596B2 (en) | 2002-02-12 | 2007-02-20 | Ip-First, Llc | Apparatus and method for extending a microprocessor instruction set |
US7328328B2 (en) | 2002-02-19 | 2008-02-05 | Ip-First, Llc | Non-temporal memory reference control mechanism |
US7315921B2 (en) | 2002-02-19 | 2008-01-01 | Ip-First, Llc | Apparatus and method for selective memory attribute control |
US7395412B2 (en) | 2002-03-08 | 2008-07-01 | Ip-First, Llc | Apparatus and method for extending data modes in a microprocessor |
US7546446B2 (en) | 2002-03-08 | 2009-06-09 | Ip-First, Llc | Selective interrupt suppression |
US7302551B2 (en) | 2002-04-02 | 2007-11-27 | Ip-First, Llc | Suppression of store checking |
US7380103B2 (en) | 2002-04-02 | 2008-05-27 | Ip-First, Llc | Apparatus and method for selective control of results write back |
US7155598B2 (en) | 2002-04-02 | 2006-12-26 | Ip-First, Llc | Apparatus and method for conditional instruction execution |
US7185180B2 (en) | 2002-04-02 | 2007-02-27 | Ip-First, Llc | Apparatus and method for selective control of condition code write back |
US7373483B2 (en) | 2002-04-02 | 2008-05-13 | Ip-First, Llc | Mechanism for extending the number of registers in a microprocessor |
US7380109B2 (en) | 2002-04-15 | 2008-05-27 | Ip-First, Llc | Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor |
KR100450753B1 (en) | 2002-05-17 | 2004-10-01 | 한국전자통신연구원 | Programmable variable length decoder including interface of CPU processor |
US6718504B1 (en) | 2002-06-05 | 2004-04-06 | Arc International | Method and apparatus for implementing a data processor adapted for turbo decoding |
US6968444B1 (en) | 2002-11-04 | 2005-11-22 | Advanced Micro Devices, Inc. | Microprocessor employing a fixed position dispatch unit |
US7590829B2 (en) | 2003-03-31 | 2009-09-15 | Stretch, Inc. | Extension adapter |
US7668897B2 (en) | 2003-06-16 | 2010-02-23 | Arm Limited | Result partitioning within SIMD data processing systems |
US7373642B2 (en) | 2003-07-29 | 2008-05-13 | Stretch, Inc. | Defining instruction extensions in a standard programming language |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
US7363544B2 (en) * | 2003-10-30 | 2008-04-22 | International Business Machines Corporation | Program debug method and apparatus |
US7401328B2 (en) | 2003-12-18 | 2008-07-15 | Lsi Corporation | Software-implemented grouping techniques for use in a superscalar data processing system |
US7613911B2 (en) | 2004-03-12 | 2009-11-03 | Arm Limited | Prefetching exception vectors by early lookup exception vectors within a cache memory |
US20050278517A1 (en) * | 2004-05-19 | 2005-12-15 | Kar-Lik Wong | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
-
2005
- 2005-05-19 US US11/132,428 patent/US20050278517A1/en not_active Abandoned
- 2005-05-19 US US11/132,448 patent/US20050289323A1/en not_active Abandoned
- 2005-05-19 US US11/132,432 patent/US20050273559A1/en not_active Abandoned
- 2005-05-19 TW TW094116302A patent/TW200602974A/en unknown
- 2005-05-19 WO PCT/US2005/017586 patent/WO2005114441A2/en active Application Filing
- 2005-05-19 US US11/132,423 patent/US20050278513A1/en not_active Abandoned
- 2005-05-19 GB GB0622477A patent/GB2428842A/en not_active Withdrawn
- 2005-05-19 US US11/132,447 patent/US20050278505A1/en not_active Abandoned
- 2005-05-19 US US11/132,424 patent/US8719837B2/en active Active
- 2005-05-19 CN CNA2005800215322A patent/CN101002169A/en active Pending
-
2014
- 2014-03-21 US US14/222,194 patent/US9003422B2/en active Active
Patent Citations (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6076158A (en) * | 1990-06-29 | 2000-06-13 | Digital Equipment Corporation | Branch prediction in high-performance processor |
US5155843A (en) * | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
US5778423A (en) * | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
US5493687A (en) * | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5423011A (en) * | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5530825A (en) * | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5920711A (en) * | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US6292879B1 (en) * | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US5752014A (en) * | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5805876A (en) * | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5808876A (en) * | 1997-06-20 | 1998-09-15 | International Business Machines Corporation | Multi-function power distribution system |
US20040068643A1 (en) * | 1997-08-01 | 2004-04-08 | Dowling Eric M. | Method and apparatus for high performance branching in pipelined microsystems |
US20010044892A1 (en) * | 1997-09-10 | 2001-11-22 | Shinichi Yamaura | Method and system for high performance implementation of microprocessors |
US5978909A (en) * | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6353882B1 (en) * | 1998-02-23 | 2002-03-05 | Hewlett-Packard Company | Reducing branch prediction interference of opposite well behaved branches sharing history entry by static prediction correctness based updating |
US6151672A (en) * | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US20010040686A1 (en) * | 1998-06-26 | 2001-11-15 | Heidi M. Schoolcraft | Streamlined tetrahedral interpolation |
US6253287B1 (en) * | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6526502B1 (en) * | 1998-12-02 | 2003-02-25 | Ip-First Llc | Apparatus and method for speculatively updating global branch history with branch prediction prior to resolution of branch outcome |
US6189091B1 (en) * | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US20010016903A1 (en) * | 1998-12-03 | 2001-08-23 | Marc Tremblay | Software branch prediction filtering for a microprocessor |
US20060036836A1 (en) * | 1998-12-31 | 2006-02-16 | Metaflow Technologies, Inc. | Block-based branch target buffer |
US6571331B2 (en) * | 1999-03-18 | 2003-05-27 | Ip-First, Llc | Static branch prediction mechanism for conditional branch instructions |
US6499101B1 (en) * | 1999-03-18 | 2002-12-24 | I.P. First L.L.C. | Static branch prediction mechanism for conditional branch instructions |
US20010032309A1 (en) * | 1999-03-18 | 2001-10-18 | Henry G. Glenn | Static branch prediction mechanism for conditional branch instructions |
US6622240B1 (en) * | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US6550056B1 (en) * | 1999-07-19 | 2003-04-15 | Mitsubishi Denki Kabushiki Kaisha | Source level debugger for debugging source programs |
US20040225871A1 (en) * | 1999-10-01 | 2004-11-11 | Naohiko Irie | Branch control memory |
US20020199092A1 (en) * | 1999-11-05 | 2002-12-26 | Ip-First Llc | Split history tables for branch prediction |
US6609194B1 (en) * | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US20010021974A1 (en) * | 2000-02-01 | 2001-09-13 | Samsung Electronics Co., Ltd. | Branch predictor suitable for multi-processing microprocessor |
US20020066006A1 (en) * | 2000-11-29 | 2002-05-30 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
US20020069351A1 (en) * | 2000-12-05 | 2002-06-06 | Shyh-An Chi | Memory data access structure and method suitable for use in a processor |
US20020073301A1 (en) * | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US20020078332A1 (en) * | 2000-12-19 | 2002-06-20 | Seznec Andre C. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US20020083312A1 (en) * | 2000-12-27 | 2002-06-27 | Balaram Sinharoy | Branch Prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US6963554B1 (en) * | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US20020087852A1 (en) * | 2000-12-28 | 2002-07-04 | Jourdan Stephan J. | Method and apparatus for predicting branches using a meta predictor |
US20020087851A1 (en) * | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US20020157000A1 (en) * | 2001-03-01 | 2002-10-24 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
US20020138236A1 (en) * | 2001-03-21 | 2002-09-26 | Akihiro Takamura | Processor having execution result prediction function for instruction |
US20020194462A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US20020188833A1 (en) * | 2001-05-04 | 2002-12-12 | Ip First Llc | Dual call/return stack branch prediction system |
US20050132175A1 (en) * | 2001-05-04 | 2005-06-16 | Ip-First, Llc. | Speculative hybrid branch direction predictor |
US20020194464A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache with selective override by seconday predictor based on branch instruction type |
US20020194461A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US20020194463A1 (en) * | 2001-05-04 | 2002-12-19 | Ip First Llc, | Speculative hybrid branch direction predictor |
US6886093B2 (en) * | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
US20040172524A1 (en) * | 2001-06-29 | 2004-09-02 | Jan Hoogerbrugge | Method, apparatus and compiler for predicting indirect branch target addresses |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
US6823444B1 (en) * | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US20030023838A1 (en) * | 2001-07-27 | 2003-01-30 | Karim Faraydon O. | Novel fetch branch architecture for reducing branch penalty without branch prediction |
US20030204705A1 (en) * | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
US20040225872A1 (en) * | 2002-06-04 | 2004-11-11 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US20040015683A1 (en) * | 2002-07-18 | 2004-01-22 | International Business Machines Corporation | Two dimensional branch history table prefetching mechanism |
US20040049660A1 (en) * | 2002-09-06 | 2004-03-11 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
US20050125634A1 (en) * | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US20040186985A1 (en) * | 2003-03-21 | 2004-09-23 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets |
US6774832B1 (en) * | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US20040193843A1 (en) * | 2003-03-31 | 2004-09-30 | Eran Altshuler | System and method for early branch prediction |
US20040193855A1 (en) * | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US20040225870A1 (en) * | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US20040230782A1 (en) * | 2003-05-12 | 2004-11-18 | International Business Machines Corporation | Method and system for processing loop branch instructions |
US20040255104A1 (en) * | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US20040268102A1 (en) * | 2003-06-30 | 2004-12-30 | Combs Jonathan D. | Mechanism to remove stale branch predictions at a microprocessor |
US20050027974A1 (en) * | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US20050050309A1 (en) * | 2003-08-29 | 2005-03-03 | Renesas Technology Corp. | Data processor |
US20050076193A1 (en) * | 2003-09-08 | 2005-04-07 | Ip-First, Llc. | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050066305A1 (en) * | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
US20050091479A1 (en) * | 2003-10-24 | 2005-04-28 | Sung-Woo Chung | Branch predictor, system and method of branch prediction |
US20050125632A1 (en) * | 2003-12-03 | 2005-06-09 | Advanced Micro Devices, Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US20050125613A1 (en) * | 2003-12-03 | 2005-06-09 | Sangwook Kim | Reconfigurable trace cache |
US20050154867A1 (en) * | 2004-01-14 | 2005-07-14 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to improve branch predictions |
US20050172277A1 (en) * | 2004-02-04 | 2005-08-04 | Saurabh Chheda | Energy-focused compiler-assisted branch prediction |
US20050216713A1 (en) * | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US20050216703A1 (en) * | 2004-03-26 | 2005-09-29 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050223202A1 (en) * | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
US20060015706A1 (en) * | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
US20060041868A1 (en) * | 2004-08-23 | 2006-02-23 | Cheng-Yen Huang | Method for verifying branch prediction mechanism and accessible recording medium for storing program thereof |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8719837B2 (en) | 2004-05-19 | 2014-05-06 | Synopsys, Inc. | Microprocessor architecture having extendible logic |
US20050289323A1 (en) * | 2004-05-19 | 2005-12-29 | Kar-Lik Wong | Barrel shifter for a microprocessor |
US20050278505A1 (en) * | 2004-05-19 | 2005-12-15 | Lim Seow C | Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory |
US9003422B2 (en) | 2004-05-19 | 2015-04-07 | Synopsys, Inc. | Microprocessor architecture having extendible logic |
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US7716460B2 (en) | 2006-09-29 | 2010-05-11 | Qualcomm Incorporated | Effective use of a BHT in processor having variable length instruction set execution modes |
US8185725B2 (en) | 2006-09-29 | 2012-05-22 | Qualcomm Incorporated | Selective powering of a BHT in a processor having variable length instructions |
US20100058032A1 (en) * | 2006-09-29 | 2010-03-04 | Qualcomm Incorporated | Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes |
US20080082807A1 (en) * | 2006-09-29 | 2008-04-03 | Brian Michael Stempel | Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes |
WO2008039975A1 (en) * | 2006-09-29 | 2008-04-03 | Qualcomm Incorporated | Effective use of a bht in processor having variable length instruction set execution modes |
US9753732B2 (en) | 2011-12-30 | 2017-09-05 | Intel Corporation | Embedded branch prediction unit |
WO2013101152A1 (en) * | 2011-12-30 | 2013-07-04 | Intel Corporation | Embedded branch prediction unit |
US9395994B2 (en) | 2011-12-30 | 2016-07-19 | Intel Corporation | Embedded branch prediction unit |
GB2545796A (en) * | 2015-11-09 | 2017-06-28 | Imagination Tech Ltd | Fetch ahead branch target buffer |
GB2545796B (en) * | 2015-11-09 | 2019-01-30 | Mips Tech Llc | Fetch ahead branch target buffer |
US10664280B2 (en) | 2015-11-09 | 2020-05-26 | MIPS Tech, LLC | Fetch ahead branch target buffer |
WO2018009277A1 (en) * | 2016-07-07 | 2018-01-11 | Intel Corporation | Graphics command parsing mechanism |
US10192281B2 (en) | 2016-07-07 | 2019-01-29 | Intel Corporation | Graphics command parsing mechanism |
CN107179895A (en) * | 2017-05-17 | 2017-09-19 | 北京中科睿芯科技有限公司 | A kind of method that application compound instruction accelerates instruction execution speed in data flow architecture |
US20200065112A1 (en) * | 2018-08-22 | 2020-02-27 | Qualcomm Incorporated | Asymmetric speculative/nonspeculative conditional branching |
US11182166B2 (en) | 2019-05-23 | 2021-11-23 | Samsung Electronics Co., Ltd. | Branch prediction throughput by skipping over cachelines without branches |
CN110442382A (en) * | 2019-07-31 | 2019-11-12 | 西安芯海微电子科技有限公司 | Prefetch buffer control method, device, chip and computer readable storage medium |
CN110727463A (en) * | 2019-09-12 | 2020-01-24 | 无锡江南计算技术研究所 | Zero-level instruction circular buffer prefetching method and device based on dynamic credit |
CN112015490A (en) * | 2020-11-02 | 2020-12-01 | 鹏城实验室 | Method, apparatus and medium for programmable device implementing and testing reduced instruction set |
US11663007B2 (en) * | 2021-10-01 | 2023-05-30 | Arm Limited | Control of branch prediction for zero-overhead loop |
Also Published As
Publication number | Publication date |
---|---|
US9003422B2 (en) | 2015-04-07 |
GB2428842A (en) | 2007-02-07 |
GB0622477D0 (en) | 2006-12-20 |
WO2005114441A2 (en) | 2005-12-01 |
CN101002169A (en) | 2007-07-18 |
WO2005114441A3 (en) | 2007-01-18 |
US20140208087A1 (en) | 2014-07-24 |
US20050273559A1 (en) | 2005-12-08 |
US20050289323A1 (en) | 2005-12-29 |
US20050289321A1 (en) | 2005-12-29 |
TW200602974A (en) | 2006-01-16 |
US8719837B2 (en) | 2014-05-06 |
US20050278505A1 (en) | 2005-12-15 |
US20050278513A1 (en) | 2005-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20050278517A1 (en) | Systems and methods for performing branch prediction in a variable length instruction set microprocessor | |
KR101059335B1 (en) | Efficient Use of JHT in Processors with Variable Length Instruction Set Execution Modes | |
EP1851620B1 (en) | Suppressing update of a branch history register by loop-ending branches | |
US7278012B2 (en) | Method and apparatus for efficiently accessing first and second branch history tables to predict branch instructions | |
US7237098B2 (en) | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence | |
JP5384323B2 (en) | Representing a loop branch in a branch history register with multiple bits | |
JP5255367B2 (en) | Processor with branch destination address cache and method of processing data | |
US7516312B2 (en) | Presbyopic branch target prefetch method and apparatus | |
US7444501B2 (en) | Methods and apparatus for recognizing a subroutine call | |
JP2004533695A (en) | Method, processor, and compiler for predicting branch target | |
JP5209633B2 (en) | System and method with working global history register | |
JP2011100466A5 (en) | ||
JP2008532142A5 (en) | ||
US7143269B2 (en) | Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor | |
JP2009536770A (en) | Branch address cache based on block | |
KR101048258B1 (en) | Association of cached branch information with the final granularity of branch instructions in a variable-length instruction set | |
WO2004072848A2 (en) | Method and apparatus for hazard detection and management in a pipelined digital processor | |
US7234046B2 (en) | Branch prediction using precedent instruction address of relative offset determined based on branch type and enabling skipping | |
US20050144427A1 (en) | Processor including branch prediction mechanism for far jump and far call instructions | |
US8578134B1 (en) | System and method for aligning change-of-flow instructions in an instruction buffer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ARC INTERNATIONAL, UNITED KINGDOM Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WONG, KAR-LIK;HAKEWILL, JAMES;TOPHAM, NIGEL;AND OTHERS;REEL/FRAME:016933/0516;SIGNING DATES FROM 20050714 TO 20050809 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |