When it comes to performance, modern computer design has become a well-structured art: it starts with instruction sets that maximize opportunities for concurrency, follows through with fast organizational techniques such as pipelining and superscalar execution, and ends with clever macro and circuit designs based on inherently fast CMOS fabrication technologies. When it comes to low power, however, exactly the opposite is true. Current techniques start with lowering supply voltages and making process changes to minimize capacitance, continue with some relatively simple techniques for minimizing power in particular logic macros, and only then apply relatively ad hoc techniques, such as ‘sleep modes’, at higher levels. This work attempts to reverse that order by bringing the power issue to the earliest phase of high-performance microprocessor development. We propose a methodology for power optimization of high-performance microprocessors at the microarchitecture level. In particular, our work explores solutions that do not compromise performance. First, major targets for power reduction are identified within the microarchitecture: the structures where power is heavily consumed today, or will be heavily consumed in next-generation processors. This involves developing energy models for structures whose power grows with increasing issue width, such as the Register File, Issue Window, and Memory Disambiguation Unit. Then, a multicluster microarchitecture is developed that reduces energy at the identified critical design points with minimal performance impact. Detailed simulation of the baseline and proposed multicluster architectures has been performed, with both optimized for the energy-delay metric. A comparison of the two microarchitectures, both optimized for energy efficiency, shows that the multicluster architecture is potentially up to twice as energy efficient for wide-issue processors, with an advantage that grows with the issue width. Conversely, at the same power dissipation level, the new multicluster architecture supports configurations with measurably higher performance than equivalent conventional designs.
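The abstract does not define the energy-delay metric it optimizes for. As a minimal sketch, the Python below assumes the commonly used form, energy per instruction multiplied by delay per instruction, and uses it to rank design points. The `Config` type, its field names, and every number are hypothetical placeholders chosen for illustration; none of them are models or results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical design point; all fields are illustrative placeholders."""
    name: str
    energy_per_instr_nj: float  # average energy per committed instruction (nJ)
    ipc: float                  # committed instructions per cycle
    cycle_time_ns: float        # clock period (ns)


def energy_delay_product(cfg: Config) -> float:
    """Energy per instruction times delay per instruction (nJ * ns).

    Assumes the common energy-delay product definition; lower is better.
    """
    delay_per_instr_ns = cfg.cycle_time_ns / cfg.ipc
    return cfg.energy_per_instr_nj * delay_per_instr_ns


def best_by_edp(configs):
    """Return the configuration with the lowest energy-delay product."""
    return min(configs, key=energy_delay_product)


# Made-up numbers for illustration only, not measurements.
config_a = Config("A", energy_per_instr_nj=4.0, ipc=2.0, cycle_time_ns=1.0)
config_b = Config("B", energy_per_instr_nj=2.4, ipc=1.5, cycle_time_ns=1.0)

if __name__ == "__main__":
    for cfg in (config_a, config_b):
        print(cfg.name, energy_delay_product(cfg))
    print("lowest EDP:", best_by_edp([config_a, config_b]).name)
```

With these placeholder values, configuration B has the lower IPC but still the lower energy-delay product, which is the kind of trade-off such a metric is meant to expose: a modest loss in throughput can be outweighed by a larger saving in energy per instruction.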