Trace processors use a new microarchitecture organization that achieves higher performance than conventional superscalar processors. A trace processor can be logically divided into two main parts: the frontend (instruction fetch) and the backend (instruction execution). In the frontend, a trace cache enables fetching across multiple branches in a single cycle. The trace cache records short dynamic sequences of instructions, called traces, and can supply one trace per cycle when a path is repeated. The backend is distributed: it consists of simple processing elements that are replicated for high aggregate bandwidth. The frontend dispatches traces to the backend, one trace per processing element.
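The trace cache behavior described above can be illustrated with a minimal software sketch. It is not the hardware design from the thesis: the class name `ToyTraceCache`, the unbounded dictionary storage, and the choice to identify a trace by its starting address plus the outcomes of its internal branches are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed trace identity (not taken from the abstract): the starting PC
# plus the taken/not-taken outcomes of the branches inside the trace.
TraceId = Tuple[int, Tuple[bool, ...]]


class ToyTraceCache:
    """Illustrative model: maps a trace identity to a recorded instruction sequence."""

    def __init__(self) -> None:
        self._lines: Dict[TraceId, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, trace_id: TraceId) -> Optional[List[str]]:
        """Return the whole trace in one lookup if this path was seen before."""
        trace = self._lines.get(trace_id)
        if trace is None:
            self.misses += 1
        else:
            self.hits += 1
        return trace

    def fill(self, trace_id: TraceId, instructions: List[str]) -> None:
        """Record a newly constructed trace so a repeated path can hit later."""
        self._lines[trace_id] = list(instructions)


cache = ToyTraceCache()
tid: TraceId = (0x100, (True, False))  # hypothetical start PC and branch outcomes
first = cache.fetch(tid)               # first traversal of the path: miss
cache.fill(tid, ["add", "beq", "sub", "bne", "ld"])
second = cache.fetch(tid)              # repeated path: the full trace in one lookup
```

A single lookup returns instructions that span two branches, which is the property that lets the frontend fetch past multiple branches at once. A real trace cache has finite capacity and a replacement policy, both of which this sketch omits.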
This thesis proposes three mechanisms that enable very high-performance frontends for trace processors. The first mechanism, trace pre-construction, augments the trace cache by performing a task analogous to prefetching. It increases both the average performance of the trace cache and the robustness of the trace cache to varying workloads. Pre-construction can reduce the trace cache miss rates by up to 80% for the SPECint95 benchmarks.
The second mechanism, instruction pre-processing, takes advantage of the trace cache to dynamically optimize program binaries. It can perform transformations that both dynamically optimize common instruction sequences and take advantage of implementation-specific hardware. The dynamic optimizations, performed in the frontend, expose more parallelism to the trace processor backend. Three specific optimizations are considered: instruction scheduling, constant propagation and instruction collapsing. Together these optimizations increase performance by up to 20% for the SPECint95 benchmarks.
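As a rough illustration of one of the three optimizations, the sketch below applies constant propagation to a straight-line trace. The three-opcode instruction format (`li`, `addi`, `add`), the register names, and the function `propagate_constants` are hypothetical and chosen only to keep the example small; the abstract does not describe the instruction set or the hardware algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction forms (assumptions for illustration):
#   ("li",   rd, imm)       rd <- imm
#   ("addi", rd, rs, imm)   rd <- rs + imm
#   ("add",  rd, rs1, rs2)  rd <- rs1 + rs2
Instr = Tuple


def propagate_constants(trace: List[Instr]) -> List[Instr]:
    """Rewrite addi instructions whose source is a known constant into li."""
    known: Dict[str, int] = {}  # register -> constant known at this point
    out: List[Instr] = []
    for instr in trace:
        op, rd = instr[0], instr[1]
        if op == "li":
            known[rd] = instr[2]
            out.append(instr)
        elif op == "addi" and instr[2] in known:
            value = known[instr[2]] + instr[3]
            known[rd] = value
            out.append(("li", rd, value))  # no longer depends on the source register
        else:
            known.pop(rd, None)  # destination now holds an unknown value
            out.append(instr)
    return out


trace = [
    ("li", "r1", 4),
    ("addi", "r2", "r1", 8),
    ("add", "r3", "r2", "r4"),
    ("addi", "r5", "r3", 1),
]
optimized = propagate_constants(trace)
```

After the rewrite, the second instruction no longer waits on `r1`, so it can issue in parallel with the first. This is the sense in which such transformations expose more parallelism to the backend. Because a trace is a fixed dynamic path, the pass needs no control-flow analysis within the trace.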
The third mechanism, next-trace prediction, is a control flow predictor that matches the bandwidth of the trace cache without sacrificing prediction accuracy. It performs the functions of both branch prediction and branch target prediction, and it works in units of traces, so its bandwidth is matched to that of the trace cache. Next-trace prediction achieves prediction accuracy comparable to the best traditional branch predictors, while providing significantly higher branch throughput.
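The idea of predicting in units of traces can be sketched as a table indexed by the identities of the most recently executed traces. This is only a toy model under stated assumptions: the class `ToyNextTracePredictor`, the history length of two, and the unbounded dictionary are illustrative choices, not the predictor organization evaluated in the thesis.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Optional, Tuple


class ToyNextTracePredictor:
    """Illustrative model: predicts the next trace from the last few trace IDs."""

    def __init__(self, history_length: int = 2) -> None:
        self._history: Deque[Hashable] = deque(maxlen=history_length)
        self._table: Dict[Tuple[Hashable, ...], Hashable] = {}

    def predict(self) -> Optional[Hashable]:
        """One prediction names an entire trace, covering all of its branches."""
        return self._table.get(tuple(self._history))

    def update(self, actual_trace_id: Hashable) -> None:
        """Train on the trace that actually executed, then shift the history."""
        self._table[tuple(self._history)] = actual_trace_id
        self._history.append(actual_trace_id)


predictor = ToyNextTracePredictor(history_length=2)
correct = 0
for trace_id in ["A", "B", "C"] * 3:  # hypothetical repeating path of three traces
    if predictor.predict() == trace_id:
        correct += 1
    predictor.update(trace_id)
```

Each prediction selects a whole trace, so one table lookup implicitly supplies both the outcomes and the targets of every branch inside that trace. That is why the predictor's bandwidth lines up with the trace cache, which also delivers one trace per lookup.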