
US20050278517A1 - Systems and methods for performing branch prediction in a variable length instruction set microprocessor - Google Patents

Systems and methods for performing branch prediction in a variable length instruction set microprocessor

Info

Publication number
US20050278517A1
Authority
US
United States
Prior art keywords
instruction
branch
address
fetch
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/132,428
Inventor
Kar-Lik Wong
James Hakewill
Nigel Topham
Rich Fuhler
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARC International
Original Assignee
ARC International
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARC International filed Critical ARC International
Priority to US11/132,428
Assigned to ARC INTERNATIONAL (assignment of assignors interest; see document for details). Assignors: FUHLER, RICH; HAKEWILL, JAMES; WONG, KAR-LIK; TOPHAM, NIGEL
Publication of US20050278517A1

Classifications

    • G06F 5/01: Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
    • G06F 9/30032: Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G06F 9/30036: Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F 9/30145: Instruction analysis, e.g. decoding, instruction word fields
    • G06F 9/30149: Instruction analysis of variable length instructions
    • G06F 9/30181: Instruction operation extension or modification
    • G06F 9/32: Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F 9/322: Address formation of the next instruction for a non-sequential address
    • G06F 9/325: Address formation of the next instruction for loops, e.g. loop detection or loop counter
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3806: Instruction prefetching for branches using address prediction, e.g. return stack, branch history buffer
    • G06F 9/3816: Instruction alignment, e.g. cache line crossing
    • G06F 9/3844: Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G06F 9/3846: Speculative instruction execution using static prediction, e.g. branch taken strategy
    • G06F 9/3861: Recovery, e.g. branch miss-prediction, exception handling
    • G06F 9/3885: Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F 9/3893: Parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F 9/3895: Parallel functional units controlled in tandem for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F 9/3897: Parallel functional units controlled in tandem for complex operations, with adaptable data path
    • G06F 11/3648: Software debugging using additional hardware
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • dynamic branch prediction is an effective technique to reduce branch penalty in a pipeline processor architecture.
  • This technique uses the instruction fetch address to look up in internal tables recording program flow history to predict the target of a non-sequential program flow.
  • branch prediction is complicated when a variable-length instruction architecture is used. In a variable-length instruction architecture, the instruction fetch address cannot be assumed to be identical to the actual instruction address. This makes it difficult for the branch prediction algorithm to guarantee sufficient instruction words are fetched and at the same time minimize unnecessary fetches.
  • FIG. 3 illustrates such a conventional indexing method in which two instructions are sequentially fetched, the first instruction being a branch instruction, and the second being the next sequential instruction word.
  • the branch instruction is fetched with the associated BPU table entry.
  • this instruction is propagated in the pipeline to the next stage where it is detected as a predicted branch while the next instruction is fetched.
  • the target instruction is fetched based on the branch prediction made in the last cycle.
  • a latency is introduced because three steps are required to fetch the branch instruction, make a prediction and fetch the target instruction. If the instruction word fetched in 305 is not part of the branch nor of its delay slot, then the word is discarded and as a result a “bubble” is injected into the pipeline.
  • FIG. 4 illustrates a novel and improved method for making a branch prediction in accordance with various embodiments of the invention.
  • the method depicted in FIG. 4 is characterized in that the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself.
  • the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself.
  • the method begins in step 400 where the instruction prior to the branch instruction is fetched together with the BPU entry containing prediction information of the next instruction.
  • the branch instruction is fetched while, concurrently, a prediction on this branch can be made based on information fetched in the previous cycle.
  • the target instruction is fetched. As illustrated, no extra instruction word is fetched between the branch and the target instructions. Hence, no bubble will be injected into the pipeline and overall performance of the processor is improved.
  • the branch instruction may not be the departure point (the instruction prior to non-sequential flow). Rather another instruction may appear after the branch instruction. Therefore, though the non-sequential jump is dictated by the branch instruction, the last instruction to be executed may not be the branch instruction, but may rather be the delay slot instruction.
  • a delay slot is used in some processor architectures with short pipelines to hide branch resolution latency. Processors with dynamic branch prediction might still have to support the concept of delay slots to be compatible with legacy code.
  • FIG. 5 illustrates five potential scenarios encountered when performing branch resolution. These scenarios may be grouped into two groups by the way in which they are handled. Group one comprises a non-aligned 16-bit instruction and an aligned 16 or 32-bit instruction. Group two comprises one of three scenarios: a non-aligned 32-bit or an aligned 48-bit instruction, a non-aligned 48-bit or an aligned 64-bit instruction, and a non-aligned 64-bit instruction.
  • L0 is simply the 30 most significant bits of the instruction fetch address, denoted as inst_addr[31:2].
  • the update address U0 depends on whether these instructions were arrived at sequentially or as the result of a non-sequential instruction.
  • the last fetch address of the prior instruction, also known as L−1. This information is stored internally and is available as a variable to the current instruction in the select stage of the pipeline.
  • the update address will be the last fetch address of the prior instruction, L−1.
  • the update address U0 of a 16 or 32-bit aligned instruction will be the last fetch address of the prior instruction, L−1, irrespective of whether the prior instruction was sequential or not.
  • Scenarios 3-5 can be handled in the same manner by taking advantage of the fact that each instruction fetch fetches a contiguous 32-bit word. Therefore, when the instruction is sufficiently long and/or unaligned to span two or more consecutive fetched instruction words in memory, we know with certainty that L0, the last fetch address, can be derived from the instruction address of the next sequential instruction, denoted as next_addr[31:2] in FIG. 5. In scenarios 3 and 5, covering non-aligned 32-bit, aligned 48-bit and non-aligned 64-bit instructions, the last portion of the current instruction shares the same fetch address with the start of the next sequential instruction. Hence L0 will be next_addr[31:2].
  • in scenario 4, the fetch address of the last portion of the current instruction is one less than the start address of the next sequential instruction.
  • hence, L0 will be next_addr[31:2] − 1.
  • in scenarios 3 and 4, the current instruction spans two consecutive 32-bit fetched instruction words. The fetch address prior to the last portion of the current instruction is always the fetch address of the start of the instruction. Therefore, U0 will be inst_addr[31:2].
  • in scenario 5, the last portion of the current instruction shares the same fetch address as the start of the next sequential instruction. Hence, U0 will be next_addr[31:2] − 1.
  • the update address U0 and last fetch address L0 are computed based on four values that are provided to the selection stage as early-arriving signals directly from registers, namely inst_addr, next_addr, L−1 and U−1. Only one multiplexer is required to compute U0 in scenario 1, and one decrementer is required to compute L0 in scenario 4 and U0 in scenario 5. The overall complexity of the novel and improved branch prediction method being disclosed is only marginally increased compared with traditional methods.
  • a method and apparatus are provided for computing the last instruction fetch of a zero-overhead loop for dynamic branch prediction in a variable length instruction set microprocessor.
  • Zero-overhead loops as well as the previously discussed dynamic branch prediction, are both powerful techniques for improving effective processor performance.
  • the BPU has to be updated whenever the zero-overhead loop mechanism is updated.
  • the BPU needs the last instruction fetch address of the loop body. This allows the BPU to re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body.
  • determining the last fetch address of a loop body is not trivial.
  • a processor with a variable-length instruction set only keeps track of the first address an instruction is fetched from.
  • the last fetch address of a loop body is the fetch address of the last portion of the last instruction of the loop body and is not readily available.
  • a zero-overhead loop mechanism requires an address related to the end of the loop body to be stored as part of the architectural state.
  • this address can be denoted as LP_END.
  • LP_END is assigned the address of the next instruction after the last instruction of the loop body
  • the last fetch address of the loop body, designated in various exemplary embodiments as LP_LAST, can be derived by exploiting two facts. Firstly, despite the variable-length nature of the instruction set, instructions are fetched in fixed-size chunks, namely 32-bit words. The BPU works only with the fetch address of these fixed-size chunks. Secondly, variable-length instruction sizes are usually an integer multiple of a fixed size, namely 16 bits.
  • two cases are illustrated: one in which LP_END is non-aligned and one in which it is aligned.
  • the instruction “sub” is the last instruction of the loop body.
  • LP_END is located at 0xA.
  • LP_END is unaligned and LP_END[1] is 1; thus, the inversion of LP_END[1] is 0 and the last fetch address of the loop body, LP_LAST, is LP_END[31:2], which is 0x8.
  • LP_END is aligned and located at 0x18.
  • LP_END[1] is 0, as with all aligned instructions; thus, the inversion of LP_END[1] is 1 and LP_LAST is LP_END[31:2] − 1, i.e., the line above LP_END, line 0x14. Note that in the above calculations, least significant bits of addresses that are known to be zero are ignored for the sake of simplifying the description. A short sketch of this calculation follows this list.
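  • As a minimal sketch of the derivation above (Python is used purely for illustration; the function name and the byte-address representation are assumptions made for the example, not anything defined by the patent), LP_LAST can be computed from LP_END as follows:

```python
def lp_last_fetch_addr(lp_end):
    """Derive LP_LAST, the last fetch address of a zero-overhead loop body,
    from LP_END, the address of the first instruction after the loop body.

    Addresses are byte addresses; the result is the byte address of the
    32-bit fetch word containing the last portion of the loop body.
    """
    lp_end_word = lp_end >> 2            # LP_END[31:2]
    lp_end_bit1 = (lp_end >> 1) & 0x1    # LP_END[1]: 1 = unaligned, 0 = aligned
    # Unaligned LP_END shares its fetch word with the end of the loop body;
    # aligned LP_END means the loop body ends in the previous fetch word.
    lp_last_word = lp_end_word if lp_end_bit1 else lp_end_word - 1
    return lp_last_word << 2

# Worked examples from the text:
assert lp_last_fetch_addr(0x0A) == 0x08   # unaligned LP_END at 0xA
assert lp_last_fetch_addr(0x18) == 0x14   # aligned LP_END at 0x18
```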

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method of performing branch prediction in a microprocessor using variable length instructions is provided. An instruction is fetched from memory based on a specified fetch address and a branch prediction is made based on the address. The prediction is selectively discarded if the look-up was based on a non-sequential fetch to an unaligned instruction address and a branch target alignment cache (BTAC) bit of the instruction is equal to zero. In order to remove the inherent latency of branch prediction, an instruction prior to a branch instruction may be fetched concurrently with a branch prediction unit look-up table entry containing prediction information for a next instruction word. The branch instruction is then fetched, and a prediction is made on this branch instruction based on the information fetched in the previous cycle. The predicted target instruction is fetched on the next clock cycle. If zero overhead loops are used, a look-up table of the branch prediction unit is updated whenever the zero-overhead loop mechanism is updated. The last fetch address of the last instruction of the loop body of a zero overhead loop is stored in the branch prediction look-up table. Then, whenever an instruction fetch hits the end of the loop body, the instruction fetch is predictively re-directed to the start of the loop body. The last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.

Description

    CROSS REFERENCE TO RELATED APPLICATION(S)
  • This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
  • FIELD OF THE INVENTION
  • This invention relates generally to microprocessor architecture and more specifically to an improved architecture and mode of operation of a microprocessor for performing branch prediction.
  • BACKGROUND OF THE INVENTION
  • A typical component of a multistage microprocessor pipeline is the branch prediction unit (BPU). Usually located in or near a fetch stage of the pipeline, the branch prediction unit increases effective processing speed by predicting whether a branch to a non-sequential instruction will be taken, based upon past instruction processing history. The branch prediction unit contains a branch look-up or prediction table that stores the address of branch instructions, an indication as to whether the branch was taken, and a speculative target address for a taken branch. When an instruction is fetched, if the instruction is a conditional branch, the result of the conditional branch is speculatively predicted based on past branch history. This speculative or predictive result is injected into the pipeline. Thus, by referencing a branch history table, the next instruction is speculatively loaded into the pipeline. Whether or not the prediction is correct will not be known until a later stage of the pipeline. However, if the prediction is correct, clock cycles will be saved by not having to go back to get the next instruction address. Otherwise, the portion of the pipeline behind the stage in which the actual address of the next instruction is determined must be flushed and the correct branch inserted back in the first stage. While this may seem like a harsh penalty for incorrect predictions, in applications where the instruction set is limited and small loops are repeated many times, such as, for example, applications typically implemented with embedded processors, branch prediction is usually accurate enough that the benefits associated with correct predictions outweigh the cost of occasional incorrect predictions, i.e., pipeline flushes. In these types of applications, branch prediction can achieve accuracy of over ninety percent. Thus, the risk of predicting an incorrect branch and incurring a pipeline flush is outweighed by the benefit of saved clock cycles.
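  • The table just described can be illustrated with a minimal sketch. It is a sketch only, written in Python for clarity: the entry fields mirror what the paragraph lists (the branch address, an indication of whether the branch was taken, and a speculative target), while the class and method names are illustrative assumptions rather than anything defined by the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class BPUEntry:
    branch_addr: int   # address of the branch instruction
    taken: bool        # indication of whether the branch was taken
    target: int        # speculative target address for a taken branch

class BranchPredictionTable:
    """Minimal branch look-up / prediction table, keyed by fetch address."""

    def __init__(self) -> None:
        self.entries: Dict[int, BPUEntry] = {}

    def update(self, fetch_addr: int, taken: bool, target: int) -> None:
        # Record the outcome observed at branch resolution.
        self.entries[fetch_addr] = BPUEntry(fetch_addr, taken, target)

    def predict(self, fetch_addr: int) -> Optional[int]:
        """Return a predicted target address, or None to fetch sequentially."""
        entry = self.entries.get(fetch_addr)
        if entry is not None and entry.taken:
            return entry.target
        return None
```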
  • While branch prediction is effective at increasing effective processing speed, problems may arise that reduce or eliminate these efficiency gains when dealing with a variable length microprocessor instruction set. For example, if the look-up table is comprised of entries associated with 32-bit wide fetch entities and instructions have lengths varying from 16 to 64 bits, a specific look-up table address entry may not be sufficient to reference a particular instruction.
  • The description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
  • As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
  • SUMMARY OF THE INVENTION
  • Thus, there exists a need for a microprocessor architecture with reduced power consumption, improved performance, a reduced silicon footprint, and improved branch prediction as compared with state-of-the-art microprocessors.
  • In various embodiments of this invention, a microprocessor architecture is disclosed in which branch prediction information is selectively ignored by the instruction pipeline in order to avoid injection of erroneous instructions into the pipeline. These embodiments are particularly useful for branch prediction schemes in which variable length instructions are predictively fetched. In various exemplary embodiments, a 32-bit word is fetched based on the address in the branch prediction table. However, in branch prediction systems based on addresses of 32-bit fetch objects, because the instruction memory is comprised of 32-bit entries, regardless of instruction length, this address may reference a word comprising two 16-bit instruction words, or a 16-bit instruction word and an unaligned instruction word of larger length (32, 48 or 64 bits) or parts of two unaligned instruction words of such larger lengths.
  • In various embodiments, the branch prediction table may contain a tag coupled to the lower bits of a fetch instruction address. If the entry at the location specified by the branch prediction table contains more than one instruction, for example, two 16-bit instructions, or a 16-bit instruction and a portion of a 32, 48 or 64-bit instruction, a prediction may be made based on an instruction that will ultimately be discarded. Though the instruction aligner will discard the incorrect instruction, a predicted branch will already have been injected into the pipeline and will not be discovered until branch resolution in a later stage of the pipeline causing a pipeline flush.
  • Thus, in various exemplary embodiments, to prevent such an incorrect prediction from being made, a prediction will be discarded beforehand if two conditions are satisfied. In various embodiments, a prediction will be discarded if a branch prediction look-up is based on a non-sequential fetch to an unaligned address, and secondly, if the branch target alignment cache (BTAC) bit is equal to zero. This second condition will only be satisfied if the prediction is based on an instruction having an aligned instruction address. In various exemplary embodiments, an alignment bit of zero will indicate that the prediction information is for an aligned branch. This will prevent the predictions based on incorrect instructions from being injected into the pipeline.
  • In various embodiments of this invention, a microprocessor architecture is disclosed which utilizes dynamic branch prediction while removing the inherent latency involved in branch prediction. In this embodiment, an instruction fetch address is used to look up a BPU table recording historical program flow in order to predict when a non-sequential program flow is to occur. However, instead of using the instruction address of the branch instruction to index the branch table, the address of the instruction prior to the branch instruction in the program flow is used to index the branch in the branch table. Thus, fetching the instruction prior to the branch instruction will cause a prediction to be made and eliminate the inherent one-step latency in the process of dynamic branch prediction caused by fetching the address of the branch instruction itself. In the above embodiment, it should be noted that in some cases a delay slot instruction may be inserted after a conditional branch such that the conditional branch is not the last sequential instruction. In such a case, because the delay slot instruction is the actual sequential departure point, the instruction prior to the non-sequential program flow would actually be the branch instruction. Thus, the BPU would index such an entry by the address of the conditional branch instruction itself, since it would be the instruction prior to the non-sequential instruction.
  • In various embodiments, use of a delay slot instruction will also affect branch resolution in the selection stage. In various exemplary embodiments, if a delay slot instruction is utilized, update of the BPU must be deferred for one execution cycle after the branch instruction. This process is further complicated by the use of variable length instructions. Performance of branch resolution after execution requires updating of the BPU table. However, when the processor instruction set includes variable length instructions it becomes essential to determine the last fetch address of the current instruction as well as the update address, i.e., the fetch address prior to the sequential departure point. In various exemplary embodiments, if the current instruction is an aligned or non-aligned 16-bit or an aligned 32-bit instruction, the last fetch address will be the instruction fetch address of the current instruction. The update address of an aligned 16-bit or aligned 32-bit instruction will be the last fetch address of the prior instruction. For a non-aligned 16-bit instruction, if it was arrived at sequentially, the update address will be the update address of the prior instruction. Otherwise, the update address will be the last fetch address of the prior instruction.
  • In the same embodiment, if the current instruction is a non-aligned 32-bit or an aligned 48-bit instruction, the last fetch address will simply be the address of the next instruction. The update address will be the current instruction address. The last fetch address of a non-aligned 48-bit instruction or an aligned 64-bit instruction will be the address of the next instruction minus one, and the update address will be the current instruction address. If the current instruction is a non-aligned 64-bit instruction, the last fetch address will be the same as the next instruction address and the update address will be the next instruction address minus one.
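  • The rules in the two preceding paragraphs can be collected into a single routine. The following Python sketch is illustrative only: the function name, the argument encoding (byte addresses, an instruction size in bits, and the prior instruction's L−1 and U−1 values), and the use of addr >> 2 as the fetch-word address are assumptions made for the example, not the patent's signal-level implementation.

```python
def branch_resolution_addresses(inst_addr, next_addr, size_bits,
                                l_prev, u_prev, arrived_sequentially):
    """Compute the last fetch address (L0) and update address (U0) of the
    current instruction, following the scenarios described above.

    inst_addr / next_addr are byte addresses of the current and next
    instructions; fetch addresses are expressed as 32-bit word indices
    (addr[31:2]); l_prev / u_prev are the prior instruction's L-1 / U-1.
    """
    fetch = lambda addr: addr >> 2          # addr[31:2]
    aligned = (inst_addr & 0x3) == 0        # starts on a 32-bit boundary

    if size_bits == 16 or (size_bits == 32 and aligned):
        # Scenarios 1-2: the instruction ends in its first fetch word.
        l0 = fetch(inst_addr)
        if size_bits == 16 and not aligned and arrived_sequentially:
            u0 = u_prev                     # non-aligned 16-bit, sequential
        else:
            u0 = l_prev                     # otherwise last fetch of prior instruction
    elif (size_bits == 32 and not aligned) or (size_bits == 48 and aligned):
        # Scenario 3: last portion shares a fetch word with the next instruction.
        l0, u0 = fetch(next_addr), fetch(inst_addr)
    elif (size_bits == 48 and not aligned) or (size_bits == 64 and aligned):
        # Scenario 4: last portion is one fetch word before the next instruction.
        l0, u0 = fetch(next_addr) - 1, fetch(inst_addr)
    else:
        # Scenario 5: non-aligned 64-bit instruction spanning three fetch words.
        l0, u0 = fetch(next_addr), fetch(next_addr) - 1
    return l0, u0
```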
  • In exemplary embodiments of this invention, a microprocessor architecture is disclosed which employs dynamic branch prediction and zero overhead loops. In such a processor, the BPU is updated whenever the zero-overhead loop mechanism is updated. Specifically, the BPU needs to store the last fetch address of the last instruction of the loop body. This allows the BPU to predictively re-direct instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body. In this embodiment, the last fetch address of the loop body can be derived from the address of the first instruction after the end of the loop, despite the use of variable length instructions, by exploiting the fact that instructions are fetched in 32-bit word chunks and that instruction sizes are in general an integer multiple of 16 bits. Therefore, in this embodiment, if the next instruction after the end of the loop body has an aligned address, the last instruction of the loop body has a last fetch address immediately preceding the address of the next instruction after the end of the loop body. Otherwise, if the next instruction after the end of the loop body has an unaligned address, the last instruction of the loop body has the same fetch address as the next instruction after the loop body.
  • At least one exemplary embodiment of the invention provides a method of performing branch prediction in a microprocessor using variable length instructions. The method of performing branch prediction in a microprocessor using variable length instructions according to this embodiment comprises fetching an instruction from memory based on a specified fetch address, making a branch prediction based on the address of the fetched instruction, and discarding the branch prediction if (1) the branch prediction look-up was based on a non-sequential fetch to an unaligned instruction address and (2) a branch target alignment cache (BTAC) bit of the instruction is equal to zero.
  • At least one additional exemplary embodiment provides a method of performing dynamic branch prediction in a microprocessor. The method of performing dynamic branch prediction in a microprocessor according to this embodiment may comprise fetching the penultimate instruction word prior to a non-sequential program flow and a branch prediction unit look-up table entry containing prediction information for a next instruction word on a first clock cycle, fetching the last instruction word prior to a non-sequential program flow and making a prediction on non-sequential program flow based on information fetched in the previous cycle on a second clock cycle, and fetching the predicted target instruction on a third clock cycle.
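  • The three-cycle behaviour described in this embodiment can be sketched as follows. The sketch assumes an object with a predict() method like the BranchPredictionTable example given earlier, and word-addressed fetches that advance by 4 bytes; the function name and the returned schedule format are illustrative assumptions only.

```python
def predicted_fetch_sequence(penultimate_fetch_addr, branch_fetch_addr, bpu):
    """Illustrate the three-cycle schedule when the BPU is indexed by the
    fetch address of the instruction *preceding* the branch (or by the
    branch itself when it is followed by a delay slot instruction)."""
    schedule = []

    # Cycle 1: fetch the word before the non-sequential departure point and,
    # in parallel, read the BPU entry holding prediction information for the
    # next instruction word.
    prediction = bpu.predict(penultimate_fetch_addr)
    schedule.append((1, penultimate_fetch_addr, "fetch + BPU look-up"))

    # Cycle 2: fetch the last instruction word before the non-sequential flow;
    # the prediction looked up in cycle 1 is applied now, so no extra cycle is
    # spent waiting for it.
    schedule.append((2, branch_fetch_addr, "fetch + apply prediction"))

    # Cycle 3: fetch the predicted target, or fall through sequentially if no
    # taken branch was predicted.
    next_fetch = prediction if prediction is not None else branch_fetch_addr + 4
    schedule.append((3, next_fetch, "fetch predicted target"))
    return schedule
```

  • For instance, after bpu.update(0x100, True, 0x200), calling predicted_fetch_sequence(0x100, 0x104, bpu) yields a third-cycle fetch of 0x200, with no wasted fetch between the branch word and its target.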
  • Yet an additional exemplary embodiment provides a method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor. The method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor may comprise storing a last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table, and predictively re-directing an instruction fetch to the start of the loop body whenever an instruction fetch hits the end of a loop body, wherein the last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
  • Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention;
  • FIG. 2 is a flow chart illustrating the steps of a method for selectively discarding branch predictions corresponding to aligned 16-bit instructions having the same fetch address as a non-aligned 16-bit target instruction in accordance with at least one exemplary embodiment of this invention;
  • FIG. 3 is a flow chart illustrating a prior art method of performing branch prediction by storing non-sequential branch instructions in a branch prediction unit table that is indexed by the fetch address of the non-sequential branch instruction;
  • FIG. 4 is a flow chart illustrating a method for performing branch prediction by storing non-sequential branch instructions in a branch prediction table that is indexed by the fetch address of the instruction prior to the non-sequential branch instruction in accordance with at least one exemplary embodiment of this invention;
  • FIG. 5 is a diagram illustrating possible scenarios encountered during branch resolution when 32-bit words are fetched from memory in a system incorporating a variable length instruction architecture including instructions of 16-bits, 32-bits, 48-bits or 64-bits in length; and
  • FIGS. 6 and 7 are tables illustrating a method for computing the last instruction fetch address of a zero-overhead loop for dynamic branch prediction in a variable-length instruction set architecture processor;
  • DETAILED DESCRIPTION OF THE DISCLOSURE
  • The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
  • FIG. 1 is a diagram illustrating the contents of a 32-bit instruction memory and a corresponding table illustrating the location of particular instructions within the instruction memory in connection with a technique for selectively ignoring branch prediction information in accordance with at least one exemplary embodiment of this invention. When branch prediction is done in a microprocessor employing a variable length instruction set, a performance problem is created when a branch is made to an unaligned target address that is packed with an aligned instruction in the same 32-bit word that is predicted to be a branch.
  • In FIG. 1, a sequence of 32-bit wide memory words is shown containing instructions instr_1 through instr_4 in sequential locations in memory. Instr_2 is the target of a non-sequential instruction fetch. The BPU stores prediction information in its tables based only on the 32-bit fetch address of the start of the instruction. There can be more than one instruction in any 32-bit word in memory; however, only one prediction can be made per 32-bit word. Thus, the performance problem can be seen by referring to FIG. 1. The instruction address of instr_2 is actually 0x2, but the fetch address is 0x0, and a fetch of this address will cause the entire 32-bit word, comprised of 16 bits of instr_1 and 16 bits of instr_2, to be fetched. Under a simple BPU configuration, a branch prediction will be made for instr_1 based on the instruction fetch of the 32-bit word at address 0x0. The branch predictor does not take into account the fact that instr_1 at 0x0 will be discarded by the aligner before it can be issued; the prediction nevertheless remains. The prediction would be correct if instr_1 were fetched as the result of a sequential fetch of 0x0, or if a branch were made to 0x0; but in this case, where a branch is made to instr_2 at 0x2, the prediction is wrong. As a result, the prediction is wrong for instr_2, causing an incorrect instruction to hit the backstop and a pipeline flush, a severe performance penalty, to occur.
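To make the fetch-address aliasing concrete, the short C sketch below models how a word-aligned fetch address can be derived from an instruction address. It is a minimal illustration only; the helper name fetch_addr and the example program are assumptions for explanation and are not part of the disclosed hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* A 32-bit instruction fetch always uses the word-aligned fetch address,
 * so a 16-bit instruction packed in the upper half of a word shares its
 * fetch address with the aligned instruction in the lower half. */
static uint32_t fetch_addr(uint32_t instr_addr)
{
    return instr_addr & ~0x3u;          /* clear the two low address bits */
}

int main(void)
{
    uint32_t instr_1 = 0x0;             /* aligned 16-bit instruction        */
    uint32_t instr_2 = 0x2;             /* unaligned branch target of FIG. 1 */

    /* Both instructions map to the same 32-bit fetch address 0x0, so a BPU
     * indexed only by fetch address cannot tell them apart. */
    printf("instr_1 fetch %#x, instr_2 fetch %#x\n",
           fetch_addr(instr_1), fetch_addr(instr_2));
    return 0;
}
```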
  • FIG. 2 is a flow chart outlining the steps of a method for solving the aforementioned problem by selectively discarding branch prediction information in accordance with various embodiments of the invention. Operation of the method begins at step 200 and proceeds to step 205, where a 32-bit word is read from memory at the specified fetch address of the target instruction. Next, in step 210, a prediction is made based on this fetched instruction; the prediction is based on the aligned instruction fetch location. Operation of the method then proceeds to step 215, where the first part of a two-part test is applied to determine whether the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address. In the context of FIG. 1, this condition would be satisfied by instr_2 because it is non-aligned (it does not start at the beginning of line 0x0, but rather after the first 16 bits). However, this condition alone is not sufficient, because a valid branch prediction lookup can be based on a branch located at an unaligned instruction address; this would be the case in FIG. 1, for example, if instr_1 were not a branch and instr_2 were a branch. If, in step 215, it is determined that the branch prediction lookup is based on a non-sequential fetch to an unaligned instruction address, operation of the method proceeds to the next step of the test, step 220. Otherwise, operation of the method jumps to step 225, where the prediction is assumed valid and passed.
  • Turning to step 220, in this step a second determination is made as to whether the branch target address cache (BTAC) alignment bit is 0, indicating that the prediction information is for an aligned branch. This bit will be 0 for all aligned branches and 1 for all unaligned branches because it is derived from the instruction address: the second bit of the instruction address will always be 0 for aligned branches (i.e., 0x0, 0x4, 0x8, 0xc, etc.) and will always be 1 for unaligned branches (i.e., 0x2, 0x6, 0xa, etc.). If, in step 220, it is determined that the branch target address cache (BTAC) alignment bit is not 0, operation proceeds to step 225 where the prediction is passed. Otherwise, if in step 220 it is determined that the BTAC alignment bit is 0, operation of the method proceeds to step 230, where the prediction is discarded. Thus, rather than an incorrect instruction being injected into the pipeline, which would ultimately cause a pipeline flush, the next sequential instruction will be correctly fetched. After step 230, operation of the method continues as after step 225: the next fetch address is updated in step 235 based on whether a branch was predicted, and operation returns to step 205 where the next fetch occurs.
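The discard decision of steps 215-230 in FIG. 2 can be sketched in C as follows. The structure and field names (prediction, btac_align_bit, and so on) are assumptions made purely for illustration and do not reflect the actual BPU storage format or logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative prediction record.  btac_align_bit mirrors bit 1 of the
 * predicted branch's instruction address: 0 for an aligned branch,
 * 1 for an unaligned branch. */
struct prediction {
    unsigned btac_align_bit;   /* 0 = aligned branch, 1 = unaligned branch */
    uint32_t target;           /* predicted target address                 */
};

/* Two-part test of steps 215 and 220: the prediction is discarded only
 * when the lookup was caused by a non-sequential fetch to an unaligned
 * target address while the stored prediction belongs to an aligned
 * branch sharing the same 32-bit fetch word. */
static bool discard_prediction(const struct prediction *p,
                               bool non_sequential_fetch,
                               uint32_t target_instr_addr)
{
    bool unaligned_target = (target_instr_addr & 0x2u) != 0;

    if (non_sequential_fetch && unaligned_target && p->btac_align_bit == 0)
        return true;    /* step 230: discard the prediction               */
    return false;       /* step 225: pass the prediction to the pipeline  */
}
```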
  • As discussed above, dynamic branch prediction is an effective technique for reducing the branch penalty in a pipelined processor architecture. This technique uses the instruction fetch address to look up internal tables that record program flow history in order to predict the target of a non-sequential program flow. As also discussed above, branch prediction is complicated when a variable-length instruction architecture is used. In a variable-length instruction architecture, the instruction fetch address cannot be assumed to be identical to the actual instruction address. This makes it difficult for the branch prediction algorithm to guarantee that sufficient instruction words are fetched while at the same time minimizing unnecessary fetches.
  • One known method of ameliorating this problem is to add extra pipeline stages to the front of the processor pipeline to perform branch prediction prior to the instruction fetch, allowing more time for the prediction mechanism to make a better decision. A negative consequence of this approach is that the extra pipeline stages increase the penalty for correcting an incorrect prediction. Alternatively, the extra pipeline stages would not be needed if prediction could be performed concurrently with instruction fetch. However, such a design has an inherent latency in which extra instructions have already been fetched by the time a prediction is made.
  • Traditional branch prediction schemes use the instruction address of a branch instruction (a non-sequential program instruction) to index their internal tables. FIG. 3 illustrates such a conventional indexing method in which two instructions are sequentially fetched, the first instruction being a branch instruction and the second being the next sequential instruction word. In step 300, the branch instruction is fetched along with the associated BPU table entry. In the next clock cycle, in step 305, this instruction is propagated in the pipeline to the next stage, where it is detected as a predicted branch, while the next instruction is fetched. Then, at step 310, in the next clock cycle, the target instruction is fetched based on the branch prediction made in the last cycle. Thus, a latency is introduced because three steps are required to fetch the branch instruction, make a prediction, and fetch the target instruction. If the instruction word fetched in step 305 is neither part of the branch nor of its delay slot, the word is discarded and, as a result, a "bubble" is injected into the pipeline.
  • FIG. 4 illustrates a novel and improved method for making a branch prediction in accordance with various embodiments of the invention. The method depicted in FIG. 4 is characterized in that the instruction address of the instruction preceding the branch instruction is used to index the BPU table rather than the instruction address of the branch instruction itself. As a result, by fetching the instruction just prior to the branch instruction, a prediction can be made from the address of this instruction while the branch instruction itself is being fetched.
  • Referring specifically to FIG. 4, the method begins in step 400 where the instruction prior to the branch instruction is fetched together with the BPU entry containing prediction information of the next instruction. Next, in step 405, the branch instruction is fetched while, concurrently, a prediction on this branch can be made based on information fetched in the previous cycle. Then, in step 410, in the next clock cycle, the target instruction is fetched. As illustrated, no extra instruction word is fetched between the branch and the target instructions. Hence, no bubble will be injected into the pipeline and overall performance of the processor is improved.
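A minimal software model of a BPU table indexed by the fetch address of the instruction preceding the branch is sketched below. The table size, tagging scheme, and all identifiers are illustrative assumptions rather than the disclosed hardware design; the sketch only shows that the lookup key is the predecessor's fetch address, so the prediction is available while the branch itself is being fetched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BPU_ENTRIES 256u                /* table size is an assumption */

struct bpu_entry {
    bool     valid;
    uint32_t tag;                       /* fetch address of the instruction
                                           preceding the predicted branch  */
    uint32_t predicted_target;
};

static struct bpu_entry bpu_table[BPU_ENTRIES];

/* Index the table with the fetch address of the instruction *before* the
 * branch (steps 400-410 of FIG. 4), so the prediction is produced in the
 * same cycle in which the branch instruction is fetched. */
static struct bpu_entry *bpu_lookup(uint32_t prev_fetch_addr)
{
    uint32_t idx = (prev_fetch_addr >> 2) & (BPU_ENTRIES - 1);
    struct bpu_entry *e = &bpu_table[idx];

    return (e->valid && e->tag == prev_fetch_addr) ? e : NULL;
}
```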
  • It should be noted that in some cases, due to the use of delay slot instructions, the branch instruction may not be the departure point (the instruction prior to non-sequential flow). Rather, another instruction may appear after the branch instruction. Therefore, although the non-sequential jump is dictated by the branch instruction, the last instruction to be executed may not be the branch instruction, but may instead be the delay slot instruction. A delay slot is used in some processor architectures with short pipelines to hide branch resolution latency. Processors with dynamic branch prediction might still have to support the concept of delay slots to be compatible with legacy code. Where a delay slot instruction is used after the branch instruction, utilizing the above branch prediction scheme will cause the instruction address of the branch instruction, not the instruction before the branch instruction, to be used to index the BPU tables, because the branch instruction is then the instruction before the last instruction. This fact has significant consequences for branch resolution, as will be discussed below. Namely, in order to perform branch resolution effectively, the last fetch address of the previous instruction must be known.
  • As stated above, branch resolution occurs in the selection stage of the pipeline and causes the BPU to be updated to reflect the outcome of the conditional branch during the write-back stage. Referring to FIG. 5, five potential scenarios encountered when performing branch resolution are illustrated. These scenarios may be grouped into two groups by the way in which they are handled. Group one comprises a non-aligned 16-bit instruction and an aligned 16- or 32-bit instruction. Group two comprises three scenarios: a non-aligned 32-bit or aligned 48-bit instruction, a non-aligned 48-bit or aligned 64-bit instruction, and a non-aligned 64-bit instruction.
  • Two pieces of information need to be computed for every instruction under this scheme: the last fetch address of the current instruction, L0, and the update address of the current instruction, U0. In the case of the scenarios of group one, it is also necessary to know L−1, the last fetch address of the previous instruction, and U−1, the update address of the previous instruction. Looking at the first and second scenarios, a non-aligned 16-bit instruction and an aligned 16- or 32-bit instruction respectively, L0 is simply the 30 most significant bits of the fetch address, denoted as instr_addr[31:2]. However, because in both of these scenarios the instruction spans only one fetch address line, the update address U0 depends on whether the instruction was arrived at sequentially or as the result of a non-sequential instruction. In keeping with the method discussed in the context of FIG. 4, the last fetch address of the prior instruction, L−1, is known. This information is stored internally and is available as a variable to the current instruction in the select stage of the pipeline. In the first scenario, if the current instruction is arrived at through sequential program flow, it has the same departure address as the prior instruction and hence U0 will be U−1. Otherwise, the update address will be the last fetch address of the prior non-sequential instruction, L−1. In the second scenario, the update address U0 of a 16- or 32-bit aligned instruction will be the last fetch address of the prior instruction, L−1, irrespective of whether the prior instruction was sequential or not.
  • Scenarios 3-5 can be handled in the same manner by taking advantage of the fact that each instruction fetch retrieves a contiguous 32-bit word. Therefore, when the instruction is sufficiently long and/or unaligned to span two or more consecutive fetched instruction words in memory, it is known with certainty that L0, the last fetch address, can be derived from the instruction address of the next sequential instruction, denoted as next_addr[31:2] in FIG. 5. In scenarios 3 and 5, covering non-aligned 32-bit, aligned 48-bit and non-aligned 64-bit instructions, the last portion of the current instruction shares the same fetch address with the start of the next sequential instruction. Hence L0 will be next_addr[31:2]. In scenario 4, covering non-aligned 48-bit or aligned 64-bit instructions, the fetch address of the last portion of the current instruction is one less than the start address of the next sequential instruction. Hence, L0=next_addr[31:2]−1. On the other hand, in scenarios 3 and 4, the current instruction spans two consecutive 32-bit fetched instruction words, and the fetch address prior to the last portion of the current instruction is always the fetch address of the start of the instruction. Therefore, U0 will be inst_addr[31:2]. In scenario 5, the last portion of the current instruction shares the same fetch address as the start of the next sequential instruction. Hence, U0 will be next_addr[31:2]−1. In the scheme just described, the update address U0 and the last fetch address L0 are computed based on four values that are provided to the selection stage as early-arriving signals directly from registers, namely inst_addr, next_addr, L−1 and U−1. Only one multiplexer is required to compute U0 in scenario 1, and one decrementer is required to compute L0 in scenario 4 and U0 in scenario 5. The overall complexity of the novel and improved branch prediction method being disclosed is thus only marginally increased compared with traditional methods.
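The selection of L0 and U0 across the five scenarios of FIG. 5 can be summarized by the C sketch below. The scenario classification inputs (instruction size, alignment, whether the previous instruction was reached sequentially) and every identifier are illustrative assumptions; as noted above, the hardware itself realizes this selection with little more than a multiplexer and a decrementer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word-granular fetch addresses, i.e. addr[31:2], as used in FIG. 5. */
struct resolve_info {
    uint32_t inst_addr_w;          /* inst_addr[31:2]: first fetch word   */
    uint32_t next_addr_w;          /* next_addr[31:2]: next instruction   */
    bool     arrived_sequentially; /* current instruction reached in order */
    uint32_t L_prev;               /* L-1: last fetch addr of prior insn  */
    uint32_t U_prev;               /* U-1: update addr of prior insn      */
};

static void compute_L0_U0(const struct resolve_info *r,
                          bool aligned, unsigned size_bits,
                          uint32_t *L0, uint32_t *U0)
{
    bool spans = !((aligned && size_bits <= 32) ||
                   (!aligned && size_bits == 16));

    if (!spans) {
        /* Group one: scenarios 1 and 2. */
        *L0 = r->inst_addr_w;
        if (!aligned)                                     /* scenario 1 */
            *U0 = r->arrived_sequentially ? r->U_prev : r->L_prev;
        else                                              /* scenario 2 */
            *U0 = r->L_prev;
    } else if ((!aligned && size_bits == 48) ||
               (aligned && size_bits == 64)) {            /* scenario 4 */
        *L0 = r->next_addr_w - 1;
        *U0 = r->inst_addr_w;
    } else if (!aligned && size_bits == 64) {             /* scenario 5 */
        *L0 = r->next_addr_w;
        *U0 = r->next_addr_w - 1;
    } else {                                              /* scenario 3 */
        *L0 = r->next_addr_w;
        *U0 = r->inst_addr_w;
    }
}
```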
  • In yet another embodiment of the invention, a method and apparatus are provided for computing the last instruction fetch address of a zero-overhead loop for dynamic branch prediction in a variable length instruction set microprocessor. Zero-overhead loops and the previously discussed dynamic branch prediction are both powerful techniques for improving effective processor performance. In a microprocessor employing both techniques, the BPU has to be updated whenever the zero-overhead loop mechanism is updated. In particular, the BPU needs the last instruction fetch address of the loop body. This allows the BPU to re-direct the instruction fetch to the start of the loop body whenever an instruction fetch hits the end of the loop body. However, in a variable-length instruction architecture, determining the last fetch address of a loop body is not trivial. Typically, a processor with a variable-length instruction set only keeps track of the first address an instruction is fetched from. The last fetch address of a loop body, however, is the fetch address of the last portion of the last instruction of the loop body and is not readily available.
  • Typically, a zero-overhead loop mechanism requires an address related to the end of the loop body to be stored as part of the architectural state. In various exemplary embodiments, this address can be denoted as LP_END. If LP_END is assigned the address of the next instruction after the last instruction of the loop body, the last fetch address of the loop body, designated in various exemplary embodiments as LP_LAST, can be derived by exploiting two facts. Firstly, despite the variable-length nature of the instruction set, instructions are fetched in fixed-size chunks, namely 32-bit words; the BPU works only with the fetch address of these fixed-size chunks. Secondly, instruction sizes in a variable-length instruction set are usually an integer multiple of a fixed size, namely 16 bits. Based on these facts, an instruction can be classified as aligned if the start address of the instruction is the same as the fetch address. If LP_END is an aligned address, LP_LAST must be the fetch address that precedes that of LP_END. If LP_END is non-aligned, LP_LAST is the fetch address of LP_END. Thus, the equation LP_LAST=LP_END[31:2]−(˜LP_END[1]) can be used to derive LP_LAST whether or not LP_END is aligned.
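The LP_LAST derivation can be expressed compactly in C as below. The function name and the word-granular return value are illustrative assumptions; the two assertions reproduce the worked examples of FIGS. 6 and 7 discussed next.

```c
#include <assert.h>
#include <stdint.h>

/* LP_LAST = LP_END[31:2] - (~LP_END[1]): if the first instruction after
 * the loop is word-aligned, the loop body ends in the preceding fetch
 * word; otherwise it ends in LP_END's own fetch word.  The return value
 * is a word-granular fetch address (addr[31:2]). */
static uint32_t lp_last(uint32_t lp_end)
{
    uint32_t aligned_bit = (lp_end >> 1) & 0x1u;   /* LP_END[1]         */
    return (lp_end >> 2) - (aligned_bit ^ 0x1u);   /* minus ~LP_END[1]  */
}

int main(void)
{
    /* Worked examples of FIGS. 6 and 7, shifted back to byte addresses
     * for readability. */
    assert((lp_last(0x0A) << 2) == 0x08);   /* unaligned LP_END at 0xA */
    assert((lp_last(0x18) << 2) == 0x14);   /* aligned LP_END at 0x18  */
    return 0;
}
```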
  • Referring to FIGS. 6 and 7, two examples are illustrated in which LP_END is non-aligned and aligned, respectively. In both cases, the instruction "sub" is the last instruction of the loop body. In the first case, LP_END is located at 0xA. In this case, LP_END is unaligned and LP_END[1] is 1; thus, the inversion of LP_END[1] is 0 and the last fetch address of the loop body, LP_LAST, is LP_END[31:2], which is 0x8. In the second case, LP_END is aligned and located at 0x18. LP_END[1] is 0, as for all aligned instructions; thus, the inversion of LP_END[1] is 1 and LP_LAST is LP_END[31:2]−1, i.e., the line above LP_END, line 0x14. Note that in the above calculations the least significant bits of addresses that are known to be zero are ignored for the sake of simplifying the description.
  • While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only, and are not to be interpreted as limitations of the present invention. Many modifications to the embodiments described above can be made without departing from the spirit and scope of the invention.

Claims (12)

1. In a microprocessor, a method of performing branch prediction using variable length instructions, the method comprising:
fetching an instruction from memory based on a specified fetch address;
making a branch prediction based on the address of the fetched instruction; and
discarding the branch prediction if:
(1) a branch prediction look-up was based on a non-sequential fetch to an unaligned instruction address; and
(2) a branch target address cache (BTAC) alignment bit of the instruction is equal to a predefined value.
2. The method according to claim 1, further comprising passing a predicted instruction associated with the branch prediction if either (1) or (2) is false.
3. The method according to claim 2, further comprising updating a next fetch address if a branch prediction is incorrect.
4. The method according to claim 3, wherein the microprocessor comprises an instruction pipeline having a select stage, and updating comprises, after resolving a branch in the select stage, updating the branch prediction unit (BPU) with the address of the next instruction resulting from that branch.
5. The method according to claim 1, wherein making a branch prediction comprises parsing a branch look-up table of a branch prediction unit (BPU) that indexes non-sequential branch instructions by their addresses in association with the next instruction taken.
6. The method according to claim 1, wherein an instruction is determined to be unaligned if it does not start at the beginning of a memory address line.
7. The method according to claim 1, wherein a BTAC alignment bit will be one of a 0 or a 1 for an aligned branch instruction and the other of a 0 or a 1 for an unaligned branch instruction.
8. In a microprocessor, a method of performing dynamic branch prediction comprising:
fetching the penultimate instruction word prior to a non-sequential program flow and a branch prediction unit look-up table entry containing prediction information for a next instruction on a first clock cycle;
fetching the last instruction word prior to a non-sequential program flow and making a prediction on this non-sequential program flow based on information fetched in the previous cycle on a second clock cycle; and
fetching the predicted target instruction on a third clock cycle.
9. The method according to claim 8, wherein fetching an instruction prior to a branch instruction and a branch prediction look-up table entry comprises using the instruction address of the instruction just prior to the branch instruction in the program flow to index the branch in the branch table.
10. The method according to claim 9, wherein if a delay slot instruction appears after the branch instruction, fetching an instruction prior to a branch instruction and a branch prediction look-up table entry comprises using the instruction address of the branch instruction, not the instruction before the branch instruction, to index the BPU tables.
11. A method of updating a look-up table of a branch prediction unit in a variable length instruction set microprocessor, the method comprising:
storing a last fetch address of a last instruction of a loop body of a zero overhead loop in the branch prediction look-up table; and
predictively re-directing an instruction fetch to the start of the loop body whenever an instruction fetch hits the end of a loop body, wherein the last fetch address of the loop body is derived from the address of the first instruction after the end of the loop.
12. The method according to claim 11, wherein storing comprises: if the next instruction after the end of the loop body has an aligned address, storing as the last fetch address the fetch address immediately preceding the address of the next instruction after the end of the loop body; otherwise, if the next instruction after the end of the loop body has an unaligned address, storing as the last fetch address the fetch address of the next instruction after the loop body.
US11/132,428 2004-05-19 2005-05-19 Systems and methods for performing branch prediction in a variable length instruction set microprocessor Abandoned US20050278517A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/132,428 US20050278517A1 (en) 2004-05-19 2005-05-19 Systems and methods for performing branch prediction in a variable length instruction set microprocessor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57223804P 2004-05-19 2004-05-19
US11/132,428 US20050278517A1 (en) 2004-05-19 2005-05-19 Systems and methods for performing branch prediction in a variable length instruction set microprocessor

Publications (1)

Publication Number Publication Date
US20050278517A1 true US20050278517A1 (en) 2005-12-15

Family

ID=35429033

Family Applications (7)

Application Number Title Priority Date Filing Date
US11/132,428 Abandoned US20050278517A1 (en) 2004-05-19 2005-05-19 Systems and methods for performing branch prediction in a variable length instruction set microprocessor
US11/132,448 Abandoned US20050289323A1 (en) 2004-05-19 2005-05-19 Barrel shifter for a microprocessor
US11/132,432 Abandoned US20050273559A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit
US11/132,423 Abandoned US20050278513A1 (en) 2004-05-19 2005-05-19 Systems and methods of dynamic branch prediction in a microprocessor
US11/132,447 Abandoned US20050278505A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
US11/132,424 Active 2031-02-12 US8719837B2 (en) 2004-05-19 2005-05-19 Microprocessor architecture having extendible logic
US14/222,194 Active US9003422B2 (en) 2004-05-19 2014-03-21 Microprocessor architecture having extendible logic

Family Applications After (6)

Application Number Title Priority Date Filing Date
US11/132,448 Abandoned US20050289323A1 (en) 2004-05-19 2005-05-19 Barrel shifter for a microprocessor
US11/132,432 Abandoned US20050273559A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit
US11/132,423 Abandoned US20050278513A1 (en) 2004-05-19 2005-05-19 Systems and methods of dynamic branch prediction in a microprocessor
US11/132,447 Abandoned US20050278505A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
US11/132,424 Active 2031-02-12 US8719837B2 (en) 2004-05-19 2005-05-19 Microprocessor architecture having extendible logic
US14/222,194 Active US9003422B2 (en) 2004-05-19 2014-03-21 Microprocessor architecture having extendible logic

Country Status (5)

Country Link
US (7) US20050278517A1 (en)
CN (1) CN101002169A (en)
GB (1) GB2428842A (en)
TW (1) TW200602974A (en)
WO (1) WO2005114441A2 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050278505A1 (en) * 2004-05-19 2005-12-15 Lim Seow C Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
WO2008039975A1 (en) * 2006-09-29 2008-04-03 Qualcomm Incorporated Effective use of a bht in processor having variable length instruction set execution modes
US7971042B2 (en) 2005-09-28 2011-06-28 Synopsys, Inc. Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline
WO2013101152A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Embedded branch prediction unit
GB2545796A (en) * 2015-11-09 2017-06-28 Imagination Tech Ltd Fetch ahead branch target buffer
CN107179895A (en) * 2017-05-17 2017-09-19 北京中科睿芯科技有限公司 A kind of method that application compound instruction accelerates instruction execution speed in data flow architecture
WO2018009277A1 (en) * 2016-07-07 2018-01-11 Intel Corporation Graphics command parsing mechanism
CN110442382A (en) * 2019-07-31 2019-11-12 西安芯海微电子科技有限公司 Prefetch buffer control method, device, chip and computer readable storage medium
CN110727463A (en) * 2019-09-12 2020-01-24 无锡江南计算技术研究所 Zero-level instruction circular buffer prefetching method and device based on dynamic credit
US20200065112A1 (en) * 2018-08-22 2020-02-27 Qualcomm Incorporated Asymmetric speculative/nonspeculative conditional branching
CN112015490A (en) * 2020-11-02 2020-12-01 鹏城实验室 Method, apparatus and medium for programmable device implementing and testing reduced instruction set
US11182166B2 (en) 2019-05-23 2021-11-23 Samsung Electronics Co., Ltd. Branch prediction throughput by skipping over cachelines without branches
US11663007B2 (en) * 2021-10-01 2023-05-30 Arm Limited Control of branch prediction for zero-overhead loop

Families Citing this family (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7577795B2 (en) * 2006-01-25 2009-08-18 International Business Machines Corporation Disowning cache entries on aging out of the entry
US20070260862A1 (en) * 2006-05-03 2007-11-08 Mcfarling Scott Providing storage in a memory hierarchy for prediction information
US7752468B2 (en) 2006-06-06 2010-07-06 Intel Corporation Predict computing platform memory power utilization
US7555605B2 (en) * 2006-09-28 2009-06-30 Freescale Semiconductor, Inc. Data processing system having cache memory debugging support and method therefor
US7529909B2 (en) * 2006-12-28 2009-05-05 Microsoft Corporation Security verified reconfiguration of execution datapath in extensible microcomputer
US7779241B1 (en) * 2007-04-10 2010-08-17 Dunn David A History based pipelined branch prediction
US8209488B2 (en) * 2008-02-01 2012-06-26 International Business Machines Corporation Techniques for prediction-based indirect data prefetching
US8166277B2 (en) * 2008-02-01 2012-04-24 International Business Machines Corporation Data prefetching using indirect addressing
US9519480B2 (en) * 2008-02-11 2016-12-13 International Business Machines Corporation Branch target preloading using a multiplexer and hash circuit to reduce incorrect branch predictions
US9201655B2 (en) * 2008-03-19 2015-12-01 International Business Machines Corporation Method, computer program product, and hardware product for eliminating or reducing operand line crossing penalty
US8181003B2 (en) * 2008-05-29 2012-05-15 Axis Semiconductor, Inc. Instruction set design, control and communication in programmable microprocessor cores and the like
US8131982B2 (en) * 2008-06-13 2012-03-06 International Business Machines Corporation Branch prediction instructions having mask values involving unloading and loading branch history data
US8225069B2 (en) * 2009-03-31 2012-07-17 Intel Corporation Control of on-die system fabric blocks
US10338923B2 (en) * 2009-05-05 2019-07-02 International Business Machines Corporation Branch prediction path wrong guess instruction
JP5423156B2 (en) * 2009-06-01 2014-02-19 富士通株式会社 Information processing apparatus and branch prediction method
US8954714B2 (en) * 2010-02-01 2015-02-10 Altera Corporation Processor with cycle offsets and delay lines to allow scheduling of instructions through time
US8521999B2 (en) * 2010-03-11 2013-08-27 International Business Machines Corporation Executing touchBHT instruction to pre-fetch information to prediction mechanism for branch with taken history
US8495287B2 (en) * 2010-06-24 2013-07-23 International Business Machines Corporation Clock-based debugging for embedded dynamic random access memory element in a processor core
US10866807B2 (en) 2011-12-22 2020-12-15 Intel Corporation Processors, methods, systems, and instructions to generate sequences of integers in numerical order that differ by a constant stride
WO2013095563A1 (en) 2011-12-22 2013-06-27 Intel Corporation Packed data rearrangement control indexes precursors generation processors, methods, systems, and instructions
US10565283B2 (en) 2011-12-22 2020-02-18 Intel Corporation Processors, methods, systems, and instructions to generate sequences of consecutive integers in numerical order
US10223111B2 (en) 2011-12-22 2019-03-05 Intel Corporation Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset
US9851973B2 (en) * 2012-03-30 2017-12-26 Intel Corporation Dynamic branch hints using branches-to-nowhere conditional branch
US9152424B2 (en) 2012-06-14 2015-10-06 International Business Machines Corporation Mitigating instruction prediction latency with independently filtered presence predictors
US9135012B2 (en) 2012-06-14 2015-09-15 International Business Machines Corporation Instruction filtering
KR101996351B1 (en) * 2012-06-15 2019-07-05 인텔 코포레이션 A virtual load store queue having a dynamic dispatch window with a unified structure
US9378017B2 (en) * 2012-12-29 2016-06-28 Intel Corporation Apparatus and method of efficient vector roll operation
CN103425498B (en) * 2013-08-20 2018-07-24 复旦大学 A kind of long instruction words command memory of low-power consumption and its method for optimizing power consumption
US10372590B2 (en) * 2013-11-22 2019-08-06 International Business Corporation Determining instruction execution history in a debugger
US9870226B2 (en) * 2014-07-03 2018-01-16 The Regents Of The University Of Michigan Control of switching between executed mechanisms
US9910670B2 (en) * 2014-07-09 2018-03-06 Intel Corporation Instruction set for eliminating misaligned memory accesses during processing of an array having misaligned data rows
US9740607B2 (en) 2014-09-03 2017-08-22 Micron Technology, Inc. Swap operations in memory
TWI569207B (en) * 2014-10-28 2017-02-01 上海兆芯集成電路有限公司 Fractional use of prediction history storage for operating system routines
US9665374B2 (en) * 2014-12-18 2017-05-30 Intel Corporation Binary translation mechanism
US9792116B2 (en) * 2015-04-24 2017-10-17 Optimum Semiconductor Technologies, Inc. Computer processor that implements pre-translation of virtual addresses with target registers
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US10776115B2 (en) * 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
GB2548601B (en) * 2016-03-23 2019-02-13 Advanced Risc Mach Ltd Processing vector instructions
US10599428B2 (en) 2016-03-23 2020-03-24 Arm Limited Relaxed execution of overlapping mixed-scalar-vector instructions
CN109690536B (en) * 2017-02-16 2021-03-23 华为技术有限公司 Method and system for fetching multi-core instruction traces from virtual platform simulator to performance simulation model
US9959247B1 (en) 2017-02-17 2018-05-01 Google Llc Permuting in a matrix-vector processor
US10902348B2 (en) 2017-05-19 2021-01-26 International Business Machines Corporation Computerized branch predictions and decisions
GB2564390B (en) * 2017-07-04 2019-10-02 Advanced Risc Mach Ltd An apparatus and method for controlling use of a register cache
US11868804B1 (en) 2019-11-18 2024-01-09 Groq, Inc. Processor instruction dispatch configuration
US11360934B1 (en) 2017-09-15 2022-06-14 Groq, Inc. Tensor streaming processor architecture
US11114138B2 (en) 2017-09-15 2021-09-07 Groq, Inc. Data structures with multiple read ports
US11243880B1 (en) 2017-09-15 2022-02-08 Groq, Inc. Processor architecture
US10372459B2 (en) 2017-09-21 2019-08-06 Qualcomm Incorporated Training and utilization of neural branch predictor
US11170307B1 (en) 2017-09-21 2021-11-09 Groq, Inc. Predictive model compiler for generating a statically scheduled binary with known resource constraints
US11537687B2 (en) 2018-11-19 2022-12-27 Groq, Inc. Spatial locality transform of matrices
US11163577B2 (en) 2018-11-26 2021-11-02 International Business Machines Corporation Selectively supporting static branch prediction settings only in association with processor-designated types of instructions
US11086631B2 (en) 2018-11-30 2021-08-10 Western Digital Technologies, Inc. Illegal instruction exception handling
CN109783384A (en) * 2019-01-10 2019-05-21 未来电视有限公司 Log use-case test method, log use-case test device and electronic equipment
CN114930351A (en) 2019-11-26 2022-08-19 格罗克公司 Loading operands from a multidimensional array and outputting results using only a single side
CN113076277B (en) * 2021-03-26 2024-05-03 大唐微电子技术有限公司 Method, device, computer storage medium and terminal for realizing pipeline scheduling
US12067395B2 (en) 2021-08-12 2024-08-20 Tenstorrent Inc. Pre-staged instruction registers for variable length instruction set machine
US11599358B1 (en) 2021-08-12 2023-03-07 Tenstorrent Inc. Pre-staged instruction registers for variable length instruction set machine
CN115495155B (en) * 2022-11-18 2023-03-24 北京数渡信息科技有限公司 Hardware circulation processing device suitable for general processor
CN117193861B (en) * 2023-11-07 2024-03-15 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium

Citations (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155843A (en) * 1990-06-29 1992-10-13 Digital Equipment Corporation Error transition mode for multi-processor system
US5423011A (en) * 1992-06-11 1995-06-06 International Business Machines Corporation Apparatus for initializing branch prediction information
US5493687A (en) * 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
US5530825A (en) * 1994-04-15 1996-06-25 Motorola, Inc. Data processor with branch target address cache and method of operation
US5752014A (en) * 1996-04-29 1998-05-12 International Business Machines Corporation Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction
US5778423A (en) * 1990-06-29 1998-07-07 Digital Equipment Corporation Prefetch instruction for improving performance in reduced instruction set processor
US5805876A (en) * 1996-09-30 1998-09-08 International Business Machines Corporation Method and system for reducing average branch resolution time and effective misprediction penalty in a processor
US5808876A (en) * 1997-06-20 1998-09-15 International Business Machines Corporation Multi-function power distribution system
US5920711A (en) * 1995-06-02 1999-07-06 Synopsys, Inc. System for frame-based protocol, graphical capture, synthesis, analysis, and simulation
US5978909A (en) * 1997-11-26 1999-11-02 Intel Corporation System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer
US6076158A (en) * 1990-06-29 2000-06-13 Digital Equipment Corporation Branch prediction in high-performance processor
US6151672A (en) * 1998-02-23 2000-11-21 Hewlett-Packard Company Methods and apparatus for reducing interference in a branch history table of a microprocessor
US6189091B1 (en) * 1998-12-02 2001-02-13 Ip First, L.L.C. Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection
US6253287B1 (en) * 1998-09-09 2001-06-26 Advanced Micro Devices, Inc. Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions
US20010016903A1 (en) * 1998-12-03 2001-08-23 Marc Tremblay Software branch prediction filtering for a microprocessor
US20010021974A1 (en) * 2000-02-01 2001-09-13 Samsung Electronics Co., Ltd. Branch predictor suitable for multi-processing microprocessor
US6292879B1 (en) * 1995-10-25 2001-09-18 Anthony S. Fong Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer
US20010032309A1 (en) * 1999-03-18 2001-10-18 Henry G. Glenn Static branch prediction mechanism for conditional branch instructions
US20010040686A1 (en) * 1998-06-26 2001-11-15 Heidi M. Schoolcraft Streamlined tetrahedral interpolation
US20010044892A1 (en) * 1997-09-10 2001-11-22 Shinichi Yamaura Method and system for high performance implementation of microprocessors
US6339822B1 (en) * 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US20020066006A1 (en) * 2000-11-29 2002-05-30 Lsi Logic Corporation Simple branch prediction and misprediction recovery method
US20020069351A1 (en) * 2000-12-05 2002-06-06 Shyh-An Chi Memory data access structure and method suitable for use in a processor
US20020073301A1 (en) * 2000-12-07 2002-06-13 International Business Machines Corporation Hardware for use with compiler generated branch information
US20020078332A1 (en) * 2000-12-19 2002-06-20 Seznec Andre C. Conflict free parallel read access to a bank interleaved branch predictor in a processor
US20020083312A1 (en) * 2000-12-27 2002-06-27 Balaram Sinharoy Branch Prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US20020087852A1 (en) * 2000-12-28 2002-07-04 Jourdan Stephan J. Method and apparatus for predicting branches using a meta predictor
US20020087851A1 (en) * 2000-12-28 2002-07-04 Matsushita Electric Industrial Co., Ltd. Microprocessor and an instruction converter
US20020138236A1 (en) * 2001-03-21 2002-09-26 Akihiro Takamura Processor having execution result prediction function for instruction
US20020157000A1 (en) * 2001-03-01 2002-10-24 International Business Machines Corporation Software hint to improve the branch target prediction accuracy
US20020188833A1 (en) * 2001-05-04 2002-12-12 Ip First Llc Dual call/return stack branch prediction system
US20020194462A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line
US20020194461A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Speculative branch target address cache
US20020194464A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Speculative branch target address cache with selective override by seconday predictor based on branch instruction type
US20020194463A1 (en) * 2001-05-04 2002-12-19 Ip First Llc, Speculative hybrid branch direction predictor
US20020199092A1 (en) * 1999-11-05 2002-12-26 Ip-First Llc Split history tables for branch prediction
US20030023838A1 (en) * 2001-07-27 2003-01-30 Karim Faraydon O. Novel fetch branch architecture for reducing branch penalty without branch prediction
US6550056B1 (en) * 1999-07-19 2003-04-15 Mitsubishi Denki Kabushiki Kaisha Source level debugger for debugging source programs
US6609194B1 (en) * 1999-11-12 2003-08-19 Ip-First, Llc Apparatus for performing branch target address calculation based on branch type
US6622240B1 (en) * 1999-06-18 2003-09-16 Intrinsity, Inc. Method and apparatus for pre-branch instruction
US20030204705A1 (en) * 2002-04-30 2003-10-30 Oldfield William H. Prediction of branch instructions in a data processing apparatus
US20040015683A1 (en) * 2002-07-18 2004-01-22 International Business Machines Corporation Two dimensional branch history table prefetching mechanism
US20040049660A1 (en) * 2002-09-06 2004-03-11 Mips Technologies, Inc. Method and apparatus for clearing hazards using jump instructions
US20040068643A1 (en) * 1997-08-01 2004-04-08 Dowling Eric M. Method and apparatus for high performance branching in pipelined microsystems
US6774832B1 (en) * 2003-03-25 2004-08-10 Raytheon Company Multi-bit output DDS with real time delta sigma modulation look up from memory
US20040172524A1 (en) * 2001-06-29 2004-09-02 Jan Hoogerbrugge Method, apparatus and compiler for predicting indirect branch target addresses
US20040186985A1 (en) * 2003-03-21 2004-09-23 Analog Devices, Inc. Method and apparatus for branch prediction based on branch targets
US20040193843A1 (en) * 2003-03-31 2004-09-30 Eran Altshuler System and method for early branch prediction
US20040193855A1 (en) * 2003-03-31 2004-09-30 Nicolas Kacevas System and method for branch prediction access
US20040225871A1 (en) * 1999-10-01 2004-11-11 Naohiko Irie Branch control memory
US20040225870A1 (en) * 2003-05-07 2004-11-11 Srinivasan Srikanth T. Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor
US20040225872A1 (en) * 2002-06-04 2004-11-11 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US20040230782A1 (en) * 2003-05-12 2004-11-18 International Business Machines Corporation Method and system for processing loop branch instructions
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap
US20040255104A1 (en) * 2003-06-12 2004-12-16 Intel Corporation Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor
US20040268102A1 (en) * 2003-06-30 2004-12-30 Combs Jonathan D. Mechanism to remove stale branch predictions at a microprocessor
US20050027974A1 (en) * 2003-07-31 2005-02-03 Oded Lempel Method and system for conserving resources in an instruction pipeline
US20050050309A1 (en) * 2003-08-29 2005-03-03 Renesas Technology Corp. Data processor
US20050066305A1 (en) * 2003-09-22 2005-03-24 Lisanke Robert John Method and machine for efficient simulation of digital hardware within a software development environment
US20050076193A1 (en) * 2003-09-08 2005-04-07 Ip-First, Llc. Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US20050091479A1 (en) * 2003-10-24 2005-04-28 Sung-Woo Chung Branch predictor, system and method of branch prediction
US20050125613A1 (en) * 2003-12-03 2005-06-09 Sangwook Kim Reconfigurable trace cache
US20050125632A1 (en) * 2003-12-03 2005-06-09 Advanced Micro Devices, Inc. Transitioning from instruction cache to trace cache on label boundaries
US20050125634A1 (en) * 2002-10-04 2005-06-09 Fujitsu Limited Processor and instruction control method
US20050154867A1 (en) * 2004-01-14 2005-07-14 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to improve branch predictions
US20050172277A1 (en) * 2004-02-04 2005-08-04 Saurabh Chheda Energy-focused compiler-assisted branch prediction
US20050216703A1 (en) * 2004-03-26 2005-09-29 International Business Machines Corporation Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor
US20050216713A1 (en) * 2004-03-25 2005-09-29 International Business Machines Corporation Instruction text controlled selectively stated branches for prediction via a branch target buffer
US20050223202A1 (en) * 2004-03-31 2005-10-06 Intel Corporation Branch prediction in a pipelined processor
US6963554B1 (en) * 2000-12-27 2005-11-08 National Semiconductor Corporation Microwire dynamic sequencer pipeline stall
US20060015706A1 (en) * 2004-06-30 2006-01-19 Chunrong Lai TLB correlated branch predictor and method for use thereof
US20060036836A1 (en) * 1998-12-31 2006-02-16 Metaflow Technologies, Inc. Block-based branch target buffer
US20060041868A1 (en) * 2004-08-23 2006-02-23 Cheng-Yen Huang Method for verifying branch prediction mechanism and accessible recording medium for storing program thereof
US7162619B2 (en) * 2001-07-03 2007-01-09 Ip-First, Llc Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer

Family Cites Families (148)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4342082A (en) 1977-01-13 1982-07-27 International Business Machines Corp. Program instruction mechanism for shortened recursive handling of interruptions
US4216539A (en) 1978-05-05 1980-08-05 Zehntel, Inc. In-circuit digital tester
US4400773A (en) 1980-12-31 1983-08-23 International Business Machines Corp. Independent handling of I/O interrupt requests and associated status information transfers
US4594659A (en) * 1982-10-13 1986-06-10 Honeywell Information Systems Inc. Method and apparatus for prefetching instructions for a central execution pipeline unit
JPS63225822A (en) 1986-08-11 1988-09-20 Toshiba Corp Barrel shifter
US4905178A (en) 1986-09-19 1990-02-27 Performance Semiconductor Corporation Fast shifter method and structure
JPS6398729A (en) * 1986-10-15 1988-04-30 Fujitsu Ltd Barrel shifter
US4914622A (en) 1987-04-17 1990-04-03 Advanced Micro Devices, Inc. Array-organized bit map with a barrel shifter
US4962500A (en) 1987-08-28 1990-10-09 Nec Corporation Data processor including testing structure for a barrel shifter
KR970005453B1 (en) 1987-12-25 1997-04-16 가부시기가이샤 히다찌세이사꾸쇼 Data processing apparatus for high speed processing
US4926323A (en) * 1988-03-03 1990-05-15 Advanced Micro Devices, Inc. Streamlined instruction processor
JPH01263820A (en) 1988-04-15 1989-10-20 Hitachi Ltd Microprocessor
DE3886739D1 (en) 1988-06-02 1994-02-10 Itt Ind Gmbh Deutsche Device for digital signal processing.
GB2229832B (en) 1989-03-30 1993-04-07 Intel Corp Byte swap instruction for memory format conversion within a microprocessor
EP0415648B1 (en) * 1989-08-31 1998-05-20 Canon Kabushiki Kaisha Image processing apparatus
JPH03185530A (en) 1989-12-14 1991-08-13 Mitsubishi Electric Corp Data processor
EP0436341B1 (en) 1990-01-02 1997-05-07 Motorola, Inc. Sequential prefetch method for 1, 2 or 3 word instructions
JPH03248226A (en) 1990-02-26 1991-11-06 Nec Corp Microprocessor
JP2560889B2 (en) * 1990-05-22 1996-12-04 日本電気株式会社 Microprocessor
JP2556612B2 (en) * 1990-08-29 1996-11-20 日本電気アイシーマイコンシステム株式会社 Barrel shifter circuit
US5636363A (en) * 1991-06-14 1997-06-03 Integrated Device Technology, Inc. Hardware control structure and method for off-chip monitoring entries of an on-chip cache
DE69229084T2 (en) * 1991-07-08 1999-10-21 Canon K.K., Tokio/Tokyo Color imaging process, color image reader and color image processing apparatus
US5539911A (en) * 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
US5450586A (en) * 1991-08-14 1995-09-12 Hewlett-Packard Company System for analyzing and debugging embedded software through dynamic and interactive use of code markers
CA2073516A1 (en) 1991-11-27 1993-05-28 Peter Michael Kogge Dynamic multi-mode parallel processor array architecture computer system
US5485625A (en) 1992-06-29 1996-01-16 Ford Motor Company Method and apparatus for monitoring external events during a microprocessor's sleep mode
US5274770A (en) 1992-07-29 1993-12-28 Tritech Microelectronics International Pte Ltd. Flexible register-based I/O microcontroller with single cycle instruction execution
US5294928A (en) 1992-08-31 1994-03-15 Microchip Technology Incorporated A/D converter with zero power mode
US5333119A (en) 1992-09-30 1994-07-26 Regents Of The University Of Minnesota Digital signal processor with delayed-evaluation array multipliers and low-power memory addressing
US5542074A (en) 1992-10-22 1996-07-30 Maspar Computer Corporation Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount
US5696958A (en) 1993-01-11 1997-12-09 Silicon Graphics, Inc. Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline
GB2275119B (en) * 1993-02-03 1997-05-14 Motorola Inc A cached processor
US5577217A (en) * 1993-05-14 1996-11-19 Intel Corporation Method and apparatus for a branch target buffer with shared branch pattern tables for associated branch predictions
JPH06332693A (en) 1993-05-27 1994-12-02 Hitachi Ltd Issuing system of suspending instruction with time-out function
US5454117A (en) 1993-08-25 1995-09-26 Nexgen, Inc. Configurable branch prediction for a processor performing speculative execution
US5584031A (en) 1993-11-09 1996-12-10 Motorola Inc. System and method for executing a low power delay instruction
JP2801135B2 (en) * 1993-11-26 1998-09-21 富士通株式会社 Instruction reading method and instruction reading device for pipeline processor
US6116768A (en) 1993-11-30 2000-09-12 Texas Instruments Incorporated Three input arithmetic logic unit with barrel rotator
US5590350A (en) 1993-11-30 1996-12-31 Texas Instruments Incorporated Three input arithmetic logic unit with mask generator
US5509129A (en) 1993-11-30 1996-04-16 Guttag; Karl M. Long instruction word controlling plural independent processor operations
US5590351A (en) 1994-01-21 1996-12-31 Advanced Micro Devices, Inc. Superscalar execution unit for sequential instruction pointer updates and segment limit checks
TW253946B (en) * 1994-02-04 1995-08-11 Ibm Data processor with branch prediction and method of operation
JPH07253922A (en) 1994-03-14 1995-10-03 Texas Instr Japan Ltd Address generating circuit
US5517436A (en) 1994-06-07 1996-05-14 Andreas; David C. Digital signal processor for audio applications
US5809293A (en) 1994-07-29 1998-09-15 International Business Machines Corporation System and method for program execution tracing within an integrated processor
US5566357A (en) 1994-10-06 1996-10-15 Qualcomm Incorporated Power reduction in a cellular radiotelephone
US5692168A (en) * 1994-10-18 1997-11-25 Cyrix Corporation Prefetch buffer using flow control bit to identify changes of flow within the code stream
JPH08202469A (en) 1995-01-30 1996-08-09 Fujitsu Ltd Microcontroller unit equipped with universal asychronous transmitting and receiving circuit
US5600674A (en) 1995-03-02 1997-02-04 Motorola Inc. Method and apparatus of an enhanced digital signal processor
US5655122A (en) 1995-04-05 1997-08-05 Sequent Computer Systems, Inc. Optimizing compiler with static prediction of branch probability, branch frequency and function frequency
US5835753A (en) 1995-04-12 1998-11-10 Advanced Micro Devices, Inc. Microprocessor with dynamically extendable pipeline stages and a classifying circuit
US5659752A (en) * 1995-06-30 1997-08-19 International Business Machines Corporation System and method for improving branch prediction in compiled program code
US5842004A (en) 1995-08-04 1998-11-24 Sun Microsystems, Inc. Method and apparatus for decompression of compressed geometric three-dimensional graphics data
US5768602A (en) 1995-08-04 1998-06-16 Apple Computer, Inc. Sleep mode controller for power management
US5727211A (en) * 1995-11-09 1998-03-10 Chromatic Research, Inc. System and method for fast context switching between tasks
US5774709A (en) 1995-12-06 1998-06-30 Lsi Logic Corporation Enhanced branch delay slot handling with single exception program counter
US5778438A (en) 1995-12-06 1998-07-07 Intel Corporation Method and apparatus for maintaining cache coherency in a computer system with a highly pipelined bus and multiple conflicting snoop requests
US5996071A (en) * 1995-12-15 1999-11-30 Via-Cyrix, Inc. Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address
JP3663710B2 (en) * 1996-01-17 2005-06-22 ヤマハ株式会社 Program generation method and processor interrupt control method
US5896305A (en) * 1996-02-08 1999-04-20 Texas Instruments Incorporated Shifter circuit for an arithmetic logic unit in a microprocessor
JPH09261490A (en) * 1996-03-22 1997-10-03 Minolta Co Ltd Image forming device
US5784636A (en) 1996-05-28 1998-07-21 National Semiconductor Corporation Reconfigurable computer architecture for use in signal processing applications
US20010025337A1 (en) 1996-06-10 2001-09-27 Frank Worrell Microprocessor including a mode detector for setting compression mode
US5826079A (en) 1996-07-05 1998-10-20 Ncr Corporation Method for improving the execution efficiency of frequently communicating processes utilizing affinity process scheduling by identifying and assigning the frequently communicating processes to the same processor
US5964884A (en) * 1996-09-30 1999-10-12 Advanced Micro Devices, Inc. Self-timed pulse control circuit
US5848264A (en) 1996-10-25 1998-12-08 S3 Incorporated Debug and video queue for multi-processor chip
US6058142A (en) 1996-11-29 2000-05-02 Sony Corporation Image processing apparatus
US6061521A (en) 1996-12-02 2000-05-09 Compaq Computer Corp. Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle
US5909572A (en) 1996-12-02 1999-06-01 Compaq Computer Corp. System and method for conditionally moving an operand from a source register to a destination register
EP0855645A3 (en) * 1996-12-31 2000-05-24 Texas Instruments Incorporated System and method for speculative execution of instructions with data prefetch
KR100236533B1 (en) 1997-01-16 2000-01-15 윤종용 Digital signal processor
EP0855718A1 (en) 1997-01-28 1998-07-29 Hewlett-Packard Company Memory low power mode control
US6185732B1 (en) * 1997-04-08 2001-02-06 Advanced Micro Devices, Inc. Software debug port for a microprocessor
US6154857A (en) * 1997-04-08 2000-11-28 Advanced Micro Devices, Inc. Microprocessor-based device incorporating a cache for capturing software performance profiling data
US6584525B1 (en) 1998-11-19 2003-06-24 Edwin E. Klingman Adaptation of standard microprocessor architectures via an interface to a configurable subsystem
US6021500A (en) 1997-05-07 2000-02-01 Intel Corporation Processor with sleep and deep sleep modes
US5950120A (en) 1997-06-17 1999-09-07 Lsi Logic Corporation Apparatus and method for shutdown of wireless communications mobile station with multiple clocks
US5931950A (en) 1997-06-17 1999-08-03 Pc-Tel, Inc. Wake-up-on-ring power conservation for host signal processing communication system
US6035374A (en) 1997-06-25 2000-03-07 Sun Microsystems, Inc. Method of executing coded instructions in a multiprocessor having shared execution resources including active, nap, and sleep states in accordance with cache miss latency
US6088786A (en) 1997-06-27 2000-07-11 Sun Microsystems, Inc. Method and system for coupling a stack based processor to register based functional unit
US5878264A (en) 1997-07-17 1999-03-02 Sun Microsystems, Inc. Power sequence controller with wakeup logic for enabling a wakeup interrupt handler procedure
US6026478A (en) 1997-08-01 2000-02-15 Micron Technology, Inc. Split embedded DRAM processor
US6226738B1 (en) 1997-08-01 2001-05-01 Micron Technology, Inc. Split embedded DRAM processor
US6760833B1 (en) 1997-08-01 2004-07-06 Micron Technology, Inc. Split embedded DRAM processor
JPH11143571A (en) 1997-11-05 1999-05-28 Mitsubishi Electric Corp Data processor
US6044458A (en) 1997-12-12 2000-03-28 Motorola, Inc. System for monitoring program flow utilizing fixwords stored sequentially to opcodes
US6014743A (en) 1998-02-05 2000-01-11 Intergrated Device Technology, Inc. Apparatus and method for recording a floating point error pointer in zero cycles
US6374349B2 (en) 1998-03-19 2002-04-16 Mcfarling Scott Branch predictor with serially connected predictor stages for improving branch prediction accuracy
US6289417B1 (en) 1998-05-18 2001-09-11 Arm Limited Operand supply to an execution unit
US6308279B1 (en) 1998-05-22 2001-10-23 Intel Corporation Method and apparatus for power mode transition in a multi-thread processor
JPH11353225A (en) 1998-05-26 1999-12-24 Internatl Business Mach Corp <Ibm> Memory accessed by a processor using Gray-code addressing in sequential execution, and method for storing code and data in the memory
US20020053015A1 (en) 1998-07-14 2002-05-02 Sony Corporation And Sony Electronics Inc. Digital signal processor particularly suited for decoding digital audio
US6327651B1 (en) 1998-09-08 2001-12-04 International Business Machines Corporation Wide shifting in the vector permute unit
US6240521B1 (en) 1998-09-10 2001-05-29 International Business Machines Corp. Sleep mode transition between processors sharing an instruction set and an address space
US6347379B1 (en) 1998-09-25 2002-02-12 Intel Corporation Reducing power consumption of an electronic device
US6862563B1 (en) 1998-10-14 2005-03-01 Arc International Method and apparatus for managing the configuration and functionality of a semiconductor design
US6671743B1 (en) * 1998-11-13 2003-12-30 Creative Technology, Ltd. Method and system for exposing proprietary APIs in a privileged device driver to an application
DE69910826T2 (en) * 1998-11-20 2004-06-17 Altera Corp., San Jose COMPUTER SYSTEM WITH RECONFIGURABLE PROGRAMMABLE LOGIC DEVICE
US6763452B1 (en) * 1999-01-28 2004-07-13 Ati International Srl Modifying program execution based on profiling
US6477683B1 (en) * 1999-02-05 2002-11-05 Tensilica, Inc. Automated processor generation system for designing a configurable processor and method for the same
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6427206B1 (en) 1999-05-03 2002-07-30 Intel Corporation Optimized branch predictions for strongly predicted compiler branches
US6560754B1 (en) * 1999-05-13 2003-05-06 Arc International Plc Method and apparatus for jump control in a pipelined processor
US6438700B1 (en) 1999-05-18 2002-08-20 Koninklijke Philips Electronics N.V. System and method to reduce power consumption in advanced RISC machine (ARM) based systems
US6571333B1 (en) 1999-11-05 2003-05-27 Intel Corporation Initializing a memory controller by executing software in second memory to wakeup a system
US6909744B2 (en) 1999-12-09 2005-06-21 Redrock Semiconductor, Inc. Processor architecture for compression and decompression of video and images
US6412038B1 (en) 2000-02-14 2002-06-25 Intel Corporation Integral modular cache for a processor
JP2001282548A (en) 2000-03-29 2001-10-12 Matsushita Electric Ind Co Ltd Communication equipment and communication method
US6519696B1 (en) 2000-03-30 2003-02-11 I.P. First, Llc Paired register exchange using renaming register map
US6681295B1 (en) * 2000-08-31 2004-01-20 Hewlett-Packard Development Company, L.P. Fast lane prefetching
US6718460B1 (en) 2000-09-05 2004-04-06 Sun Microsystems, Inc. Mechanism for error handling in a computer system
US20030070013A1 (en) 2000-10-27 2003-04-10 Daniel Hansson Method and apparatus for reducing power consumption in a digital processor
US7039901B2 (en) 2001-01-24 2006-05-02 Texas Instruments Incorporated Software shared memory bus
US6925634B2 (en) * 2001-01-24 2005-08-02 Texas Instruments Incorporated Method for maintaining cache coherency in software in a shared memory system
EP1384160A2 (en) 2001-03-02 2004-01-28 Atsana Semiconductor Corp. Apparatus for variable word length computing in an array processor
US7010558B2 (en) 2001-04-19 2006-03-07 Arc International Data processor with enhanced instruction execution and method
US7165168B2 (en) 2003-01-14 2007-01-16 Ip-First, Llc Microprocessor with branch target address cache update queue
GB0112275D0 (en) 2001-05-21 2001-07-11 Micron Technology Inc Method and circuit for normalization of floating point significands in a simd array mpp
GB0112269D0 (en) 2001-05-21 2001-07-11 Micron Technology Inc Method and circuit for alignment of floating point significands in a simd array mpp
US7191445B2 (en) * 2001-08-31 2007-03-13 Texas Instruments Incorporated Method using embedded real-time analysis components with corresponding real-time operating system software objects
US6751331B2 (en) 2001-10-11 2004-06-15 United Global Sourcing Incorporated Communication headset
JP2003131902A (en) * 2001-10-24 2003-05-09 Toshiba Corp Software debugger, system-level debugger, debug method and debug program
US7051239B2 (en) * 2001-12-28 2006-05-23 Hewlett-Packard Development Company, L.P. Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip
US20030225998A1 (en) 2002-01-31 2003-12-04 Khan Mohammed Noshad Configurable data processor with multi-length instruction set architecture
US7168067B2 (en) * 2002-02-08 2007-01-23 Agere Systems Inc. Multiprocessor system with cache-based software breakpoints
US7529912B2 (en) 2002-02-12 2009-05-05 Via Technologies, Inc. Apparatus and method for instruction-level specification of floating point format
US7181596B2 (en) 2002-02-12 2007-02-20 Ip-First, Llc Apparatus and method for extending a microprocessor instruction set
US7328328B2 (en) 2002-02-19 2008-02-05 Ip-First, Llc Non-temporal memory reference control mechanism
US7315921B2 (en) 2002-02-19 2008-01-01 Ip-First, Llc Apparatus and method for selective memory attribute control
US7395412B2 (en) 2002-03-08 2008-07-01 Ip-First, Llc Apparatus and method for extending data modes in a microprocessor
US7546446B2 (en) 2002-03-08 2009-06-09 Ip-First, Llc Selective interrupt suppression
US7302551B2 (en) 2002-04-02 2007-11-27 Ip-First, Llc Suppression of store checking
US7380103B2 (en) 2002-04-02 2008-05-27 Ip-First, Llc Apparatus and method for selective control of results write back
US7155598B2 (en) 2002-04-02 2006-12-26 Ip-First, Llc Apparatus and method for conditional instruction execution
US7185180B2 (en) 2002-04-02 2007-02-27 Ip-First, Llc Apparatus and method for selective control of condition code write back
US7373483B2 (en) 2002-04-02 2008-05-13 Ip-First, Llc Mechanism for extending the number of registers in a microprocessor
US7380109B2 (en) 2002-04-15 2008-05-27 Ip-First, Llc Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor
KR100450753B1 (en) 2002-05-17 2004-10-01 한국전자통신연구원 Programmable variable length decoder including interface of CPU processor
US6718504B1 (en) 2002-06-05 2004-04-06 Arc International Method and apparatus for implementing a data processor adapted for turbo decoding
US6968444B1 (en) 2002-11-04 2005-11-22 Advanced Micro Devices, Inc. Microprocessor employing a fixed position dispatch unit
US7590829B2 (en) 2003-03-31 2009-09-15 Stretch, Inc. Extension adapter
US7668897B2 (en) 2003-06-16 2010-02-23 Arm Limited Result partitioning within SIMD data processing systems
US7373642B2 (en) 2003-07-29 2008-05-13 Stretch, Inc. Defining instruction extensions in a standard programming language
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
US7363544B2 (en) * 2003-10-30 2008-04-22 International Business Machines Corporation Program debug method and apparatus
US7401328B2 (en) 2003-12-18 2008-07-15 Lsi Corporation Software-implemented grouping techniques for use in a superscalar data processing system
US7613911B2 (en) 2004-03-12 2009-11-03 Arm Limited Prefetching exception vectors by early lookup exception vectors within a cache memory
US20050278517A1 (en) * 2004-05-19 2005-12-15 Kar-Lik Wong Systems and methods for performing branch prediction in a variable length instruction set microprocessor

Patent Citations (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6076158A (en) * 1990-06-29 2000-06-13 Digital Equipment Corporation Branch prediction in high-performance processor
US5155843A (en) * 1990-06-29 1992-10-13 Digital Equipment Corporation Error transition mode for multi-processor system
US5778423A (en) * 1990-06-29 1998-07-07 Digital Equipment Corporation Prefetch instruction for improving performance in reduced instruction set processor
US5493687A (en) * 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
US5423011A (en) * 1992-06-11 1995-06-06 International Business Machines Corporation Apparatus for initializing branch prediction information
US5530825A (en) * 1994-04-15 1996-06-25 Motorola, Inc. Data processor with branch target address cache and method of operation
US5920711A (en) * 1995-06-02 1999-07-06 Synopsys, Inc. System for frame-based protocol, graphical capture, synthesis, analysis, and simulation
US6292879B1 (en) * 1995-10-25 2001-09-18 Anthony S. Fong Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer
US5752014A (en) * 1996-04-29 1998-05-12 International Business Machines Corporation Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction
US5805876A (en) * 1996-09-30 1998-09-08 International Business Machines Corporation Method and system for reducing average branch resolution time and effective misprediction penalty in a processor
US5808876A (en) * 1997-06-20 1998-09-15 International Business Machines Corporation Multi-function power distribution system
US20040068643A1 (en) * 1997-08-01 2004-04-08 Dowling Eric M. Method and apparatus for high performance branching in pipelined microsystems
US20010044892A1 (en) * 1997-09-10 2001-11-22 Shinichi Yamaura Method and system for high performance implementation of microprocessors
US5978909A (en) * 1997-11-26 1999-11-02 Intel Corporation System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer
US6353882B1 (en) * 1998-02-23 2002-03-05 Hewlett-Packard Company Reducing branch prediction interference of opposite well behaved branches sharing history entry by static prediction correctness based updating
US6151672A (en) * 1998-02-23 2000-11-21 Hewlett-Packard Company Methods and apparatus for reducing interference in a branch history table of a microprocessor
US20010040686A1 (en) * 1998-06-26 2001-11-15 Heidi M. Schoolcraft Streamlined tetrahedral interpolation
US6253287B1 (en) * 1998-09-09 2001-06-26 Advanced Micro Devices, Inc. Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions
US6339822B1 (en) * 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6526502B1 (en) * 1998-12-02 2003-02-25 Ip-First Llc Apparatus and method for speculatively updating global branch history with branch prediction prior to resolution of branch outcome
US6189091B1 (en) * 1998-12-02 2001-02-13 Ip First, L.L.C. Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection
US20010016903A1 (en) * 1998-12-03 2001-08-23 Marc Tremblay Software branch prediction filtering for a microprocessor
US20060036836A1 (en) * 1998-12-31 2006-02-16 Metaflow Technologies, Inc. Block-based branch target buffer
US6571331B2 (en) * 1999-03-18 2003-05-27 Ip-First, Llc Static branch prediction mechanism for conditional branch instructions
US6499101B1 (en) * 1999-03-18 2002-12-24 I.P. First L.L.C. Static branch prediction mechanism for conditional branch instructions
US20010032309A1 (en) * 1999-03-18 2001-10-18 Henry G. Glenn Static branch prediction mechanism for conditional branch instructions
US6622240B1 (en) * 1999-06-18 2003-09-16 Intrinsity, Inc. Method and apparatus for pre-branch instruction
US6550056B1 (en) * 1999-07-19 2003-04-15 Mitsubishi Denki Kabushiki Kaisha Source level debugger for debugging source programs
US20040225871A1 (en) * 1999-10-01 2004-11-11 Naohiko Irie Branch control memory
US20020199092A1 (en) * 1999-11-05 2002-12-26 Ip-First Llc Split history tables for branch prediction
US6609194B1 (en) * 1999-11-12 2003-08-19 Ip-First, Llc Apparatus for performing branch target address calculation based on branch type
US20010021974A1 (en) * 2000-02-01 2001-09-13 Samsung Electronics Co., Ltd. Branch predictor suitable for multi-processing microprocessor
US20020066006A1 (en) * 2000-11-29 2002-05-30 Lsi Logic Corporation Simple branch prediction and misprediction recovery method
US20020069351A1 (en) * 2000-12-05 2002-06-06 Shyh-An Chi Memory data access structure and method suitable for use in a processor
US20020073301A1 (en) * 2000-12-07 2002-06-13 International Business Machines Corporation Hardware for use with compiler generated branch information
US20020078332A1 (en) * 2000-12-19 2002-06-20 Seznec Andre C. Conflict free parallel read access to a bank interleaved branch predictor in a processor
US20020083312A1 (en) * 2000-12-27 2002-06-27 Balaram Sinharoy Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US6963554B1 (en) * 2000-12-27 2005-11-08 National Semiconductor Corporation Microwire dynamic sequencer pipeline stall
US20020087852A1 (en) * 2000-12-28 2002-07-04 Jourdan Stephan J. Method and apparatus for predicting branches using a meta predictor
US20020087851A1 (en) * 2000-12-28 2002-07-04 Matsushita Electric Industrial Co., Ltd. Microprocessor and an instruction converter
US20020157000A1 (en) * 2001-03-01 2002-10-24 International Business Machines Corporation Software hint to improve the branch target prediction accuracy
US20020138236A1 (en) * 2001-03-21 2002-09-26 Akihiro Takamura Processor having execution result prediction function for instruction
US20020194462A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line
US20020188833A1 (en) * 2001-05-04 2002-12-12 Ip First Llc Dual call/return stack branch prediction system
US20050132175A1 (en) * 2001-05-04 2005-06-16 Ip-First, Llc. Speculative hybrid branch direction predictor
US20020194464A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Speculative branch target address cache with selective override by secondary predictor based on branch instruction type
US20020194461A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Speculative branch target address cache
US20020194463A1 (en) * 2001-05-04 2002-12-19 Ip First Llc, Speculative hybrid branch direction predictor
US6886093B2 (en) * 2001-05-04 2005-04-26 Ip-First, Llc Speculative hybrid branch direction predictor
US20040172524A1 (en) * 2001-06-29 2004-09-02 Jan Hoogerbrugge Method, apparatus and compiler for predicting indirect branch target addresses
US7162619B2 (en) * 2001-07-03 2007-01-09 Ip-First, Llc Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap
US20030023838A1 (en) * 2001-07-27 2003-01-30 Karim Faraydon O. Novel fetch branch architecture for reducing branch penalty without branch prediction
US20030204705A1 (en) * 2002-04-30 2003-10-30 Oldfield William H. Prediction of branch instructions in a data processing apparatus
US20040225872A1 (en) * 2002-06-04 2004-11-11 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US20040015683A1 (en) * 2002-07-18 2004-01-22 International Business Machines Corporation Two dimensional branch history table prefetching mechanism
US20040049660A1 (en) * 2002-09-06 2004-03-11 Mips Technologies, Inc. Method and apparatus for clearing hazards using jump instructions
US20050125634A1 (en) * 2002-10-04 2005-06-09 Fujitsu Limited Processor and instruction control method
US20040186985A1 (en) * 2003-03-21 2004-09-23 Analog Devices, Inc. Method and apparatus for branch prediction based on branch targets
US6774832B1 (en) * 2003-03-25 2004-08-10 Raytheon Company Multi-bit output DDS with real time delta sigma modulation look up from memory
US20040193843A1 (en) * 2003-03-31 2004-09-30 Eran Altshuler System and method for early branch prediction
US20040193855A1 (en) * 2003-03-31 2004-09-30 Nicolas Kacevas System and method for branch prediction access
US20040225870A1 (en) * 2003-05-07 2004-11-11 Srinivasan Srikanth T. Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor
US20040230782A1 (en) * 2003-05-12 2004-11-18 International Business Machines Corporation Method and system for processing loop branch instructions
US20040255104A1 (en) * 2003-06-12 2004-12-16 Intel Corporation Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor
US20040268102A1 (en) * 2003-06-30 2004-12-30 Combs Jonathan D. Mechanism to remove stale branch predictions at a microprocessor
US20050027974A1 (en) * 2003-07-31 2005-02-03 Oded Lempel Method and system for conserving resources in an instruction pipeline
US20050050309A1 (en) * 2003-08-29 2005-03-03 Renesas Technology Corp. Data processor
US20050076193A1 (en) * 2003-09-08 2005-04-07 Ip-First, Llc. Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US20050066305A1 (en) * 2003-09-22 2005-03-24 Lisanke Robert John Method and machine for efficient simulation of digital hardware within a software development environment
US20050091479A1 (en) * 2003-10-24 2005-04-28 Sung-Woo Chung Branch predictor, system and method of branch prediction
US20050125632A1 (en) * 2003-12-03 2005-06-09 Advanced Micro Devices, Inc. Transitioning from instruction cache to trace cache on label boundaries
US20050125613A1 (en) * 2003-12-03 2005-06-09 Sangwook Kim Reconfigurable trace cache
US20050154867A1 (en) * 2004-01-14 2005-07-14 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to improve branch predictions
US20050172277A1 (en) * 2004-02-04 2005-08-04 Saurabh Chheda Energy-focused compiler-assisted branch prediction
US20050216713A1 (en) * 2004-03-25 2005-09-29 International Business Machines Corporation Instruction text controlled selectively stated branches for prediction via a branch target buffer
US20050216703A1 (en) * 2004-03-26 2005-09-29 International Business Machines Corporation Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor
US20050223202A1 (en) * 2004-03-31 2005-10-06 Intel Corporation Branch prediction in a pipelined processor
US20060015706A1 (en) * 2004-06-30 2006-01-19 Chunrong Lai TLB correlated branch predictor and method for use thereof
US20060041868A1 (en) * 2004-08-23 2006-02-23 Cheng-Yen Huang Method for verifying branch prediction mechanism and accessible recording medium for storing program thereof

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8719837B2 (en) 2004-05-19 2014-05-06 Synopsys, Inc. Microprocessor architecture having extendible logic
US20050289323A1 (en) * 2004-05-19 2005-12-29 Kar-Lik Wong Barrel shifter for a microprocessor
US20050278505A1 (en) * 2004-05-19 2005-12-15 Lim Seow C Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
US9003422B2 (en) 2004-05-19 2015-04-07 Synopsys, Inc. Microprocessor architecture having extendible logic
US7971042B2 (en) 2005-09-28 2011-06-28 Synopsys, Inc. Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US8185725B2 (en) 2006-09-29 2012-05-22 Qualcomm Incorporated Selective powering of a BHT in a processor having variable length instructions
US20100058032A1 (en) * 2006-09-29 2010-03-04 Qualcomm Incorporated Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes
US20080082807A1 (en) * 2006-09-29 2008-04-03 Brian Michael Stempel Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes
WO2008039975A1 (en) * 2006-09-29 2008-04-03 Qualcomm Incorporated Effective use of a bht in processor having variable length instruction set execution modes
US9753732B2 (en) 2011-12-30 2017-09-05 Intel Corporation Embedded branch prediction unit
WO2013101152A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Embedded branch prediction unit
US9395994B2 (en) 2011-12-30 2016-07-19 Intel Corporation Embedded branch prediction unit
GB2545796A (en) * 2015-11-09 2017-06-28 Imagination Tech Ltd Fetch ahead branch target buffer
GB2545796B (en) * 2015-11-09 2019-01-30 Mips Tech Llc Fetch ahead branch target buffer
US10664280B2 (en) 2015-11-09 2020-05-26 MIPS Tech, LLC Fetch ahead branch target buffer
WO2018009277A1 (en) * 2016-07-07 2018-01-11 Intel Corporation Graphics command parsing mechanism
US10192281B2 (en) 2016-07-07 2019-01-29 Intel Corporation Graphics command parsing mechanism
CN107179895A (en) * 2017-05-17 2017-09-19 北京中科睿芯科技有限公司 A kind of method that application compound instruction accelerates instruction execution speed in data flow architecture
US20200065112A1 (en) * 2018-08-22 2020-02-27 Qualcomm Incorporated Asymmetric speculative/nonspeculative conditional branching
US11182166B2 (en) 2019-05-23 2021-11-23 Samsung Electronics Co., Ltd. Branch prediction throughput by skipping over cachelines without branches
CN110442382A (en) * 2019-07-31 2019-11-12 西安芯海微电子科技有限公司 Prefetch buffer control method, device, chip and computer readable storage medium
CN110727463A (en) * 2019-09-12 2020-01-24 无锡江南计算技术研究所 Zero-level instruction circular buffer prefetching method and device based on dynamic credit
CN112015490A (en) * 2020-11-02 2020-12-01 鹏城实验室 Method, apparatus and medium for programmable device implementing and testing reduced instruction set
US11663007B2 (en) * 2021-10-01 2023-05-30 Arm Limited Control of branch prediction for zero-overhead loop

Also Published As

Publication number Publication date
US9003422B2 (en) 2015-04-07
GB2428842A (en) 2007-02-07
GB0622477D0 (en) 2006-12-20
WO2005114441A2 (en) 2005-12-01
CN101002169A (en) 2007-07-18
WO2005114441A3 (en) 2007-01-18
US20140208087A1 (en) 2014-07-24
US20050273559A1 (en) 2005-12-08
US20050289323A1 (en) 2005-12-29
US20050289321A1 (en) 2005-12-29
TW200602974A (en) 2006-01-16
US8719837B2 (en) 2014-05-06
US20050278505A1 (en) 2005-12-15
US20050278513A1 (en) 2005-12-15

Similar Documents

Publication Publication Date Title
US20050278517A1 (en) Systems and methods for performing branch prediction in a variable length instruction set microprocessor
KR101059335B1 (en) Efficient use of a BHT in processors with variable length instruction set execution modes
EP1851620B1 (en) Suppressing update of a branch history register by loop-ending branches
US7278012B2 (en) Method and apparatus for efficiently accessing first and second branch history tables to predict branch instructions
US7237098B2 (en) Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
JP5384323B2 (en) Representing a loop branch in a branch history register with multiple bits
JP5255367B2 (en) Processor with branch destination address cache and method of processing data
US7516312B2 (en) Presbyopic branch target prefetch method and apparatus
US7444501B2 (en) Methods and apparatus for recognizing a subroutine call
JP2004533695A (en) Method, processor, and compiler for predicting branch target
JP5209633B2 (en) System and method with working global history register
JP2011100466A5 (en)
JP2008532142A5 (en)
US7143269B2 (en) Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor
JP2009536770A (en) Branch address cache based on block
KR101048258B1 (en) Association of cached branch information with the final granularity of branch instructions in a variable-length instruction set
WO2004072848A2 (en) Method and apparatus for hazard detection and management in a pipelined digital processor
US7234046B2 (en) Branch prediction using precedent instruction address of relative offset determined based on branch type and enabling skipping
US20050144427A1 (en) Processor including branch prediction mechanism for far jump and far call instructions
US8578134B1 (en) System and method for aligning change-of-flow instructions in an instruction buffer

Legal Events

Date Code Title Description
AS Assignment

Owner name: ARC INTERNATIONAL, UNITED KINGDOM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WONG, KAR-LIK;HAKEWILL, JAMES;TOPHAM, NIGEL;AND OTHERS;REEL/FRAME:016933/0516;SIGNING DATES FROM 20050714 TO 20050809

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION