Inherently Lower-Power High-Performance Superscalar Architectures
We will introduce
General solution methodology
Previous decentralization schemes
Proposed strategy
Simulation results of multicluster architecture
Conclusions
General Design Solution
Decentralization of microarchitecture
Replace tightly coupled CPU with a set of clusters,
each capable of superscalar processing
Can ideally reduce γ to zero, with good cluster
partitioning techniques
Figure 3
Multicluster Architecture Details
Register Renaming and Instruction Steering
Each cluster is provided with a local physical RF
Global Map Table maintains mapping between architectural registers and
physical registers
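A minimal sketch of how a global map table could tie architectural registers to cluster-local physical register files. The class name, the free-list handling, and the `steer` heuristic (send an instruction to the cluster that already holds most of its source operands) are assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalMapTable:
    """Hypothetical model: architectural register -> (cluster, physical register)."""
    num_clusters: int
    regs_per_cluster: int
    table: dict = field(default_factory=dict)
    free: list = field(init=False)

    def __post_init__(self):
        # One free list per cluster-local physical register file.
        self.free = [list(range(self.regs_per_cluster))
                     for _ in range(self.num_clusters)]

    def rename_dest(self, arch_reg, cluster):
        """Allocate a physical register in `cluster` for a destination register.

        Reclaiming the previous mapping is omitted to keep the sketch short.
        """
        phys = self.free[cluster].pop(0)
        self.table[arch_reg] = (cluster, phys)
        return cluster, phys

    def lookup_src(self, arch_reg):
        """Return the (cluster, physical register) currently holding a source."""
        return self.table[arch_reg]

    def steer(self, src_regs):
        """Assumed heuristic: choose the cluster holding the most sources.

        Ties, and instructions with no mapped sources, go to the
        lowest-numbered cluster.
        """
        votes = [0] * self.num_clusters
        for reg in src_regs:
            if reg in self.table:
                votes[self.table[reg][0]] += 1
        return votes.index(max(votes))
```

For example, after `gmt = GlobalMapTable(2, 4)` and `gmt.rename_dest("r1", 1)`, an instruction reading `r1` is steered to cluster 1, so its operand read stays local.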
Intercluster Communication
Remote Access Window (RAW) used for remote RF calls
Remote Access Buffer (RAB) used to keep the remote source operand
One-cycle penalty incurred for each remote RF access
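The effect of the remote-access penalty on issue timing can be sketched as follows. Only the one-cycle penalty comes from the slide; the function names and the simplified timing model (an instruction issues once its latest operand has arrived) are assumptions, and the RAW and RAB are not modeled explicitly.

```python
REMOTE_PENALTY = 1  # cycles; the one-cycle remote RF penalty stated above


def operand_ready_cycle(producer_cluster, consumer_cluster, value_ready_cycle):
    """Cycle at which an operand becomes usable in the consumer's cluster."""
    if producer_cluster == consumer_cluster:
        return value_ready_cycle  # local RF read, no extra delay
    return value_ready_cycle + REMOTE_PENALTY  # operand fetched from a remote RF


def issue_cycle(consumer_cluster, sources):
    """Earliest issue cycle given (producer_cluster, value_ready_cycle) pairs."""
    return max(
        (operand_ready_cycle(p, consumer_cluster, c) for p, c in sources),
        default=0,
    )
```

With one operand in each of two clusters, both ready at cycle 5, the instruction issues at cycle 6 wherever it is steered; if both operands were local it would issue at cycle 5. This is why steering toward the operands matters.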
Multicluster Architecture Details (Cont’d)
Memory Dataflow
Centralized memory disambiguation unit does not scale with
increasing issue width and bigger sizes of the load/store window
Proposed scheme: Every cluster is provided with a local load/store
window that is hardwired to a particular data cache bank
Developed a bank predictor because, at the decode stage, it is not yet
known which cluster a memory instruction should be routed to
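The slides do not describe the predictor's design. As an illustration only, the sketch below assumes a simple PC-indexed "last bank" table: each load/store is predicted to access the same data cache bank it accessed last time, and the prediction selects the cluster at decode. The class name, table size, line size, and the address-to-bank mapping are all hypothetical.

```python
class LastBankPredictor:
    """Hypothetical PC-indexed predictor: predict the bank seen last time."""

    def __init__(self, num_banks, table_size=1024, line_bytes=32):
        self.num_banks = num_banks
        self.table_size = table_size
        self.line_bytes = line_bytes
        self.table = [0] * table_size  # every entry starts at bank 0

    def _index(self, pc):
        # Drop the low two bits, assuming 4-byte instructions.
        return (pc >> 2) % self.table_size

    def bank_of(self, address):
        # Assumed mapping: consecutive cache lines interleave across banks.
        return (address // self.line_bytes) % self.num_banks

    def predict(self, pc):
        """Bank (and therefore cluster) guessed at decode, before the address is known."""
        return self.table[self._index(pc)]

    def update(self, pc, address):
        """Train with the resolved address; return True if the prediction was right."""
        actual = self.bank_of(address)
        idx = self._index(pc)
        correct = self.table[idx] == actual
        self.table[idx] = actual
        return correct
```

A misprediction means the instruction was steered to a cluster whose load/store window is not wired to the bank it needs, so a real design would also need a recovery path; that part is outside this sketch.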