Accelerator memory reuse in the dark silicon era

EG Cota, P Mantovani, M Petracca… - IEEE Computer Architecture Letters, 2012 - ieeexplore.ieee.org
Accelerators integrated on-die with General-Purpose CPUs (GP-CPUs) can yield significant performance and power improvements. Their extensive use, however, is ultimately limited by their area overhead: due to their high degree of specialization, the opportunity cost of investing die real estate in accelerators can become prohibitive, especially for general-purpose architectures. In this paper we present a novel technique that mitigates this opportunity cost by allowing GP-CPU cores to reuse accelerator memory as a non-uniform cache architecture (NUCA) substrate. On a system whose last-level (level-2) cache is 128 kB, our technique achieves on average a 25% performance improvement when reusing four 512 kB accelerator memory blocks to form a level-3 cache. Making these blocks reusable as NUCA slices incurs on average a 1.89% area overhead with respect to equally sized ad hoc cache slices.
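As a rough illustration of the idea described in the abstract, the sketch below models the reclaimed capacity and a simple line-interleaved mapping of addresses to NUCA slices. All names, the line size, and the interleaving policy are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: idle accelerator memory blocks exposed as extra
# last-level-cache (NUCA) slices. Sizes match the abstract's evaluation;
# the line size and the interleaving policy are assumptions.

SLICE_SIZE = 512 * 1024   # each reusable accelerator memory block: 512 kB
NUM_SLICES = 4            # four blocks form the level-3 cache in the paper
LINE_SIZE = 64            # assumed cache-line size (not stated in the abstract)

def l3_slice_for(addr: int) -> int:
    """Map a physical address to a NUCA slice by interleaving cache
    lines across slices (a common static NUCA placement policy)."""
    return (addr // LINE_SIZE) % NUM_SLICES

# Total level-3 capacity reclaimed from accelerator memory: 2 MB.
total_l3_bytes = SLICE_SIZE * NUM_SLICES
```

Consecutive cache lines land on consecutive slices, so `l3_slice_for(0)` is 0, `l3_slice_for(64)` is 1, and the pattern wraps after four lines.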