Eliminating conflict misses for high performance architectures
G Rivera, CW Tseng - Proceedings of the 12th international conference …, 1998 - dl.acm.org
Proceedings of the 12th international conference on Supercomputing, 1998 • dl.acm.org
Abstract
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. Two data-layout transformations, inter- and intra-variable padding, can eliminate many conflict misses at compile time. We present GROUPPAD, an inter-variable padding heuristic to preserve group reuse in stencil computations frequently found in scientific programs. We show padding can also improve performance in parallel programs. Our optimizations have been implemented and tested on a collection of kernels and programs for different cache and data sizes. Preliminary results demonstrate GROUPPAD is able to consistently preserve group reuse among the programs evaluated, though execution time improvements are small for the actual problem and cache sizes tested. Padding improves the performance of parallel versions of programs by approximately the same magnitude as sequential versions of the same programs.
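To make the idea of inter-variable padding concrete, here is a minimal sketch of how a gap between two arrays can remove conflict misses in a direct-mapped cache. It is not the GROUPPAD heuristic from the paper. The cache geometry (1024 bytes, 32-byte lines), the 8-byte element size, the access pattern, and the helper names `count_misses` and `interleaved_trace` are all assumptions chosen for illustration.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=32):
    """Simulate a direct-mapped cache and return the number of misses.

    The cache geometry is an illustrative assumption, not taken from the paper.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # cache line that block maps to
        if tags[index] != block:       # wrong or empty block resident: a miss
            tags[index] = block
            misses += 1
    return misses


def interleaved_trace(base_a, base_b, n, elem_bytes=8):
    """Byte addresses touched by a loop that reads A[i] then B[i] for i in 0..n-1."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_bytes)
        trace.append(base_b + i * elem_bytes)
    return trace


N = 128                  # elements per array: 128 * 8 bytes = 1024 bytes = cache size
ARRAY_BYTES = N * 8
PAD_BYTES = 32           # hypothetical pad: one cache line between A and B

# Unpadded: B starts exactly one cache size after A, so A[i] and B[i]
# always map to the same cache line and evict each other.
unpadded = count_misses(interleaved_trace(0, ARRAY_BYTES, N))

# Padded: shifting B by one line makes A[i] and B[i] map to different lines.
padded = count_misses(interleaved_trace(0, ARRAY_BYTES + PAD_BYTES, N))

print("misses without padding:", unpadded)
print("misses with padding:   ", padded)
```

In this toy configuration the unpadded layout misses on every one of the 256 accesses, while the padded layout incurs only the 64 compulsory misses (one per distinct cache line touched). Real compilers must pick pad sizes that work across many arrays and references at once, which is the problem heuristics such as the one described in the abstract address.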
ACM Digital Library