Mitigating Amdahl's law through EPI throttling

M Annavaram, E Grochowski, J Shen
32nd International Symposium on Computer Architecture (ISCA'05), 2005. ieeexplore.ieee.org
This paper is motivated by three recent trends in computer design. First, chip multi-processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. Second, multi-threaded software that can take advantage of CMPs will soon become prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently will have phases of sequential execution; Amdahl's law dictates that the speedup of such parallel programs will be limited by the sequential portion of the computation. Finally, increasing levels of on-chip integration coupled with a slowing rate of reduction in supply voltage make power consumption a first-order design constraint. Given this environment, our goal is to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMP's total power consumption within a fixed budget. In order to mitigate the effects of Amdahl's law, in this paper we make a compelling case for varying the amount of energy expended to process instructions according to the amount of available parallelism. Using the equation Power = Energy per instruction (EPI) × Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. We evaluate the performance benefits of an EPI throttle on an asymmetric multiprocessor (AMP) prototyped from a physical 4-way Xeon SMP server. Using a wide range of multi-threaded programs, we show a 38% wall-clock speedup on an AMP compared to a standard SMP that uses the same power. We also measure the supply current on a 4-way SMP server while running the multi-threaded programs and use the measured data as input to a software simulator that implements a more flexible EPI throttle. The results from the measurement-driven simulation show performance benefits comparable to the AMP prototype. We analyze the results from both techniques, explain why and when an EPI throttle works well, and conclude with a discussion of the challenges in building practical EPI throttles.
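
The relation Power = EPI × IPS and its interaction with Amdahl's law can be illustrated with a small Python sketch. It is not taken from the paper: the function names, the 100 W budget, the instruction rates, and the serial fraction are all hypothetical values chosen for illustration, and the speedup model simply assumes that spending more energy per instruction in the sequential phase makes that phase run `serial_boost` times faster while the parallel phase is unchanged.

```python
def epi_budget(power_watts: float, ips: float) -> float:
    """Energy per instruction (joules) affordable at a fixed power budget.

    Rearranges Power = EPI * IPS into EPI = Power / IPS.
    """
    if ips <= 0:
        raise ValueError("ips must be positive")
    return power_watts / ips


def amdahl_speedup(serial_fraction: float, n_cores: int,
                   serial_boost: float = 1.0) -> float:
    """Speedup over a single baseline core under Amdahl's law.

    serial_boost is an assumed factor by which the sequential phase
    runs faster when it is given a larger energy-per-instruction budget.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction / serial_boost + parallel_fraction / n_cores)


# Hypothetical numbers, for illustration only.
POWER_BUDGET_W = 100.0
sequential_epi = epi_budget(POWER_BUDGET_W, ips=2e9)  # low-parallelism phase
parallel_epi = epi_budget(POWER_BUDGET_W, ips=8e9)    # high-parallelism phase

baseline = amdahl_speedup(serial_fraction=0.25, n_cores=4)
throttled = amdahl_speedup(serial_fraction=0.25, n_cores=4, serial_boost=2.0)

print(f"EPI in sequential phase: {sequential_epi * 1e9:.1f} nJ")
print(f"EPI in parallel phase:   {parallel_epi * 1e9:.1f} nJ")
print(f"Speedup without boost:   {baseline:.2f}x")
print(f"Speedup with 2x boost:   {throttled:.2f}x")
```

With the power budget held constant, a phase that sustains one quarter of the instruction rate can afford four times the energy per instruction. Under the assumed model, spending that extra energy to speed up the sequential phase raises the overall speedup, which is the intuition behind an EPI throttle.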