A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor

SS Mukherjee, C Weaver, J Emer… - Proceedings. 36th Annual IEEE/ACM International Symposium on …, 2003 - ieeexplore.ieee.org
Single-event upsets from particle strikes have become a key challenge in microprocessor design. Techniques to deal with these transient faults exist, but come at a cost. Designers clearly require accurate estimates of processor error rates to make appropriate cost/reliability tradeoffs. This paper describes a method for generating these estimates. A key aspect of this analysis is that some single-bit faults (such as those occurring in the branch predictor) do not produce an error in a program's output. We define a structure's architectural vulnerability factor (AVF) as the probability that a fault in that particular structure will result in an error. A structure's error rate is the product of its raw error rate, as determined by process and circuit technology, and the AVF. Unfortunately, computing AVFs of complex structures, such as the instruction queue, can be quite involved. We identify numerous cases, such as prefetches, dynamically dead code, and wrong-path instructions, in which a fault will not affect correct execution. We instrument a detailed IA64 processor simulator to map bit-level microarchitectural state to these cases, generating per-structure AVF estimates. This analysis shows AVFs of 28% and 9% for the instruction queue and execution units, respectively, averaged across dynamic sections of the entire CPU2000 benchmark suite.
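The abstract's two quantitative ideas can be illustrated with a short Python sketch. The first is AVF as the likelihood that a fault in a structure becomes an error. The second is the structure's error rate as raw error rate × AVF. This is a minimal illustration under stated assumptions, not the authors' simulator instrumentation. It assumes a hypothetical per-cycle trace (`StructureTrace`) in which simulated bits have already been labeled. The label is either `"ace"` (state needed for architecturally correct execution) or one of the harmless cases the abstract names. AVF is then estimated as the fraction of bit-cycles holding `"ace"` state. The trace format, label names, and numbers are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical labels for what a group of bits holds during one cycle.
# Only "ace" bits are assumed able to affect program output; the other
# labels mirror cases named in the abstract.
ACE = "ace"
NON_ACE_LABELS = {"prefetch", "dynamically_dead", "wrong_path"}


@dataclass
class StructureTrace:
    """Hypothetical per-cycle occupancy trace for one hardware structure."""
    name: str
    num_bits: int
    # One list per simulated cycle of (label, bit_count) pairs.
    # Bits not mentioned in a cycle are treated as idle (holding no valid state).
    cycles: list[list[tuple[str, int]]]


def avf(trace: StructureTrace) -> float:
    """Fraction of bit-cycles whose contents could affect correct execution."""
    if trace.num_bits <= 0 or not trace.cycles:
        raise ValueError("trace needs a positive bit count and at least one cycle")
    ace_bit_cycles = 0
    for cycle in trace.cycles:
        if sum(count for _, count in cycle) > trace.num_bits:
            raise ValueError(f"{trace.name}: a cycle labels more bits than exist")
        for label, count in cycle:
            if label != ACE and label not in NON_ACE_LABELS:
                raise ValueError(f"unknown label: {label}")
            if label == ACE:
                ace_bit_cycles += count
    return ace_bit_cycles / (trace.num_bits * len(trace.cycles))


def structure_error_rate(raw_error_rate: float, avf_value: float) -> float:
    """Structure error rate = raw (process/circuit) error rate x AVF."""
    return raw_error_rate * avf_value


# Toy 100-bit structure observed for two cycles (made-up numbers).
trace = StructureTrace(
    name="toy_queue",
    num_bits=100,
    cycles=[
        [("ace", 30), ("wrong_path", 20)],
        [("ace", 20), ("dynamically_dead", 10), ("prefetch", 4)],
    ],
)
print(avf(trace))                                # 0.25
print(structure_error_rate(1000.0, avf(trace)))  # 250.0
```

The sketch averages over bit-cycles rather than over entries or instructions. This weights each bit by how long it holds state, so idle entries and wrong-path, dead, or prefetched contents all pull a structure's AVF below 100%.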