Power management of extreme-scale networks with on/off links in runtime systems

E Totoni, N Jain, LV Kale - ACM Transactions on Parallel Computing …, 2015 - dl.acm.org
ACM Transactions on Parallel Computing (TOPC), 2015
Networks are among the major power consumers in large-scale parallel systems. During execution of common parallel applications, a sizeable fraction of the links in high-radix interconnects are either never used or underutilized. We propose a runtime-system-based adaptive approach to turning off unused links, which has various advantages over the previously proposed hardware- and compiler-based approaches. We discuss why the runtime system is the best system component to accomplish this task, and test the effectiveness of our approach using real applications (including NAMD and MILC) and application benchmarks (including the NAS Parallel Benchmarks and Stencil). These codes are simulated on representative topologies such as a 6-D torus and a multilevel directly connected network (similar to IBM PERCS in Power 775 and Dragonfly in Cray Aries). For common applications with a near-neighbor communication pattern, our approach can save up to 20% of the machine's total power and energy without any performance penalty.