research-article

A survey on software methods to improve the energy efficiency of parallel computing

Published: 01 November 2017

Abstract

Energy consumption is one of the top challenges for achieving the next generation of supercomputing. Codesign of hardware and software is critical for improving the energy efficiency (EE) of future large-scale systems. Many architectural power-saving techniques have been developed, and most hardware components are approaching their physical limits. Accordingly, parallel computing software, including both applications and systems, should exploit power-saving hardware innovations and use energy efficiently. In addition, new power-aware parallel computing methods are essential to reduce energy usage further. This article surveys software-based methods that aim to improve EE for parallel computing. It reviews methods that exploit characteristics of parallel scientific applications, including load imbalance and mixed-precision floating-point (FP) calculations, to improve EE. It also summarizes widely used methods for improving power usage at different granularities, such as the whole system and individual applications. In particular, it describes the most important techniques for measuring and achieving energy-efficient use of various parallel computing facilities, including processors, memories, and networks. Overall, this article reviews the state of the art in energy-efficient methods for parallel computing, with the aim of motivating researchers to achieve optimal parallel computing under a power budget constraint.
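Load imbalance can be exploited for EE through slack reclamation within one bulk-synchronous phase. Ranks that finish before the most heavily loaded rank run at a lower CPU frequency, so they reach the barrier just in time instead of waiting at full speed. The following is a minimal sketch under stated assumptions, not a method taken from the article. The frequency list, the power constants, and the functions `pick_frequencies` and `phase_energy` are hypothetical. The sketch further assumes that:

- compute time scales as f_max/f (a fully CPU-bound phase);
- dynamic power grows as f cubed; and
- a rank waiting at the barrier busy-polls at the same power it draws while computing.

```python
from typing import List, Sequence


def pick_frequencies(compute_times: Sequence[float],
                     freqs_ghz: Sequence[float]) -> List[float]:
    """Lowest available frequency per rank that does not delay the barrier.

    Assumption (hypothetical): compute time scales as f_max / f, i.e. the
    phase is fully CPU-bound. Memory-bound phases slow down less.
    """
    f_max = max(freqs_ghz)
    t_crit = max(compute_times)  # the most loaded rank sets the barrier time
    chosen = []
    for t in compute_times:
        # (f_max / f) is exactly 1.0 at f_max, so the critical rank always fits
        feasible = [f for f in sorted(freqs_ghz) if t * (f_max / f) <= t_crit]
        chosen.append(feasible[0])
    return chosen


def phase_energy(compute_times: Sequence[float], freqs_ghz: Sequence[float],
                 f_max: float, p_static: float = 20.0, k: float = 5.0) -> float:
    """Energy (J) of one phase under a toy power model (illustrative constants).

    Each rank draws p_static + k * f**3 watts (f in GHz) for the whole phase,
    assuming it busy-polls at that power while waiting at the barrier.
    """
    busy = [t * (f_max / f) for t, f in zip(compute_times, freqs_ghz)]
    t_phase = max(busy)  # the phase ends when the slowest rank arrives
    return sum((p_static + k * f ** 3) * t_phase for f in freqs_ghz)


if __name__ == "__main__":
    times = [10.0, 6.0, 8.0]           # per-rank compute seconds at f_max
    levels = [1.2, 1.6, 2.0, 2.4]      # hypothetical DVFS levels in GHz
    chosen = pick_frequencies(times, levels)
    print(chosen)
    print(phase_energy(times, [2.4] * 3, 2.4), phase_energy(times, chosen, 2.4))
```

With these illustrative inputs, the sketch selects 2.4, 1.6, and 2.0 GHz. The toy model then estimates roughly 1.9 kJ for the phase instead of roughly 2.7 kJ at full speed, and the phase does not lengthen. Real savings depend on how memory-bound the phase is and on the share of static power, so measured energy should replace the modeled value in practice.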

References

[1]
Abe Y, Sasaki H, Peres M . 2012 Power and performance analysis of GPU-accelerated systems. In: Proceedings of the 2012 Workshop on Power-Aware Computing and Systems HotPower'12, Phoenix, AZ, 19-23 May 2014. New York, NY: ACM Press.
[2]
Alan I, Arslan E, Kosar T 2015 Energy-aware data transfer algorithms. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. New York, NY: ACM Press.
[3]
Alonso M, Coll S, Martinez J-M . 2006 Dynamic power saving in fat-tree interconnection networks using on/off link. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium IPDPS'06, Rhodes Island, Greece, 25-29 April 2006, pp. pp.299-–307. IEEE Press.
[4]
Amarasinghe S, Campbell D, Carlson W . 2009 Exascale Software Study: Software Challenges in Extreme Scale Systems. DARPA Report .
[5]
Anzt H, Rocker B, Heuveline V 2010 Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms. Computer Science - Research and Development Volume 25 Issue 3-4: pp.141-–148.
[6]
<collab collab-type="author">ASCAC Subcommittee</collab>2014 The top ten exascale research challenges. ASCAC Advanced Scientific Computing Advisory Committee Subcommittee Report . Washington: U.S. Department of Energy.
[7]
Avelar V, Azevedo D, French A 2014 PUE™: a comprehensive examination of the metric. ASHRAE Press.
[8]
Baboulina M, Donfacka S, Dongarra J . 2012 A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines. In: Proceedings of the International Conference on Computational Science, ICCS 2012, Omaha, Nebraska, 4-6 June 2012, pp. pp.17-–26. Elsevier Press.
[9]
Baek W, Chilimbi TM 2010 Green: a framework for supporting energy-conscious programming using controlled approximation. In: Proceedings of the 31st ACM SIGPLAN conference on programming language design and implementation PLDI'10, Toronto, Ontario, Canada, 5-10 June 2010, pp. pp.198-–209. New York, NY: ACM Press.
[10]
Bailey PE, Marathe A, Lowenthal DK . 2015 Finding the limits of power-constrained application performance. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. ACM Press.
[11]
Balaprakash P, Tiwari A, Wild SM 2013 Multi-objective optimization of HPC kernels for performance, power, and energy. In: Proceedings of 4th International Workshop on Performance Modeling, Benchmarking, and Simulation of HPC Systems PMBS12, Denver, CO, 18 November 2013, pp. pp.239-–260. Springer Press. ext-link-type="uri" xlink:href="http://www.mcs.anl.gov/papers/P4069-0413.pdf">http://www.mcs.anl.gov/papers/P4069-0413.pdf</ext-link>
[12]
Ballard G, Demmel J, Holtz O . 2011 Minimizing communication in numerical linear algebra. SIAM Society for Industrial and Applied Mathematics Journal on Matrix Analysis and Applications Volume 32 Issue 3: pp.866-–901.
[13]
Bates NJ, Patterson MK 2013<chapter-title>Achieving the 20MW target: mobilizing the HPC community to accelerate energy efficient computing</chapter-title>. In: D'Hollander EH . eds Transition of HPC Towards Exascale Computing. IOS Press, pp. pp.37-–45.
[14]
Benito M, Vallejo E, Beivide R 2015 On the use of commodity Ethernet technology in exascale HPC systems. In: Proceedings of 22nd IEEE International Conference on High Performance Computing HiPC'15, Bengaluru, India, 16-19 December 2015, pp. pp.254-–263. IEEE Press.
[15]
Benkner S, Franchetti F, Gerndt HM . 2014 Automatic Application Tuning for HPC Architectures. Dagstuhl Reports, Vol. 3-9 . Dagstuhl Press, pp. pp.214-–244. ext-link-type="uri" xlink:href="http://dx.doi.org/10.4230/DagRep.3.9.214">http://dx.doi.org/10.4230/DagRep.3.9.214</ext-link>
[16]
Bienia C, Kumar S, Singh JP . 2008 The PARSEC Benchmark Suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques PACT'08, Toronto, Canada, 25-29 October 2008, pp. pp.72-–81. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1454115.1454128">http://dx.doi.org/10.1145/1454115.1454128</ext-link>
[17]
Bingham BD, Greenstreet MR 2008 Computation with energy-time trade-offs: models, algorithms and lower-bounds. In: Proceedings of the 2008 International Symposium on Parallel and Distributed Processing with Applications ISPA'08, Sydney, NSW, 10-12 December 2008, pp. pp.143-–152. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ISPA.2008.127">http://dx.doi.org/10.1109/ISPA.2008.127</ext-link>
[18]
Brooks D, Tiwari V, Martonosi M 2000 Wattch: a framework for architectural level power analysis and optimizations. In: Proceedings of the 27th International Symposium on Computer Architecture ISCA'00, Vancouver, BC, 14-14 June 2000, pp. pp.83-–94. IEEE Press.
[19]
Cho S, Melhem RG 2010 On the interplay of parallelization, program performance, and energy consumption. IEEE Transactions on Parallel and Distributed Systems Volume 21 Issue 3: pp.342-–353.
[20]
Choi JW, Bedard D, Fowler R . 2013 A roofline model of energy. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium IPDPS'13, Boston, Massachusetts, USA, 20-24 May 2013, pp. pp.661-–672. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2013.77">http://dx.doi.org/10.1109/IPDPS.2013.77</ext-link>
[21]
Collange S, Defour D, Tisserand A 2009 Power consumption of GPUs from a software perspective. In: Proceedings of the 9th International Conference on Computational Science ICCS'09, Baton Rouge, LA, 25-27 May 2009, pp. pp.914-–923. Berlin, Heidelberg: Springer Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-642-01970-8_92">http://dx.doi.org/10.1007/978-3-642-01970-8_92</ext-link>
[22]
Conner S, Akioka S, Irwin MJ . 2007 Link shutdown opportunities during collective communications in 3-D torus nets. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium IPDPS'07, Long Beach, California, 26-30 March 2007, pp. pp.1-–8. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2007.370534">http://dx.doi.org/10.1109/IPDPS.2007.370534</ext-link>
[23]
Curtis-Maury M, Blagojevic F, Antonopoulos CD . 2007 Prediction-based power-performance adaptation of multithreaded scientific codes. IEEE Transactions on Parallel and Distributed Systems Volume 19 Issue 10: pp.1396-–1410.
[24]
Curtis-Maury M, Dzierwa J, Antonopoulos CD . 2006a Online strategies for high-performance power-aware thread execution on emerging multiprocessors. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium IPDPS'06, Rhodes Island, Greece, 25-29 April 2006, pp. pp.298-–307. IEEE Press.
[25]
Curtis-Maury M, Dzierwa J, Antonopoulos CD . 2006b Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In: Proceedings of the 20th annual international conference on Supercomputing ICS'06, Cairns, Queensland, Australia, 28 June-1 July 2006, pp. pp.157-–166. New York, NY: ACM Press.
[26]
David H, Gorbatov E, Hanebutte UR . 2010 RAPL: memory power estimation and capping. In: Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design ISLPED'10, Austin, TX, 18-20 August 2010, pp. pp.189-–194. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1840845.1840883">http://dx.doi.org/10.1145/1840845.1840883</ext-link>
[27]
Demmel J, Dongarra J, Fox A . 2009 Accelerating time-to-solution for computational science and engineering. SciDAC Review Volume 15 : pp.46-–57.
[28]
Demmel J, Gearhart A, Lipshitz B . 2013 Perfect strong scaling using no additional energy. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium IPDPS'13, Boston, Massachusetts, 20-24 May 2013, pp. pp.649-–660. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2013.32">http://dx.doi.org/10.1109/IPDPS.2013.32</ext-link>
[29]
Dickov B, Carpenter PM, Pericas M . 2015 Self-tuned software-managed energy reduction in InfiniBand links. In: Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems ICPADS'15, Melbourne, Australia, 14-17 December 2015. IEEE Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ICPADS.2015.87">http://dx.doi.org/10.1109/ICPADS.2015.87</ext-link>
[30]
Dickov B, Pericas M, Carpenter PM . 2014 Software-managed power reduction in InfiniBand links. In: Proceedings of the 2015 International Conference on Parallel Processing ICPP'15, Minneapolis, MN, 9-12 September 2014, pp. pp.311-–320. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ICPP.2014.40">http://dx.doi.org/10.1109/ICPP.2014.40</ext-link>
[31]
Dongarra J, Beckman P, Moore T . 2011 The international exascale software project roadmap. International Journal of High Performance Computing Applications Volume 25 Issue 1: pp.3-–60.
[32]
Dongarra J, Ltaief H, Luszczek P . 2012 Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architecture. In: Proceedings of the 2nd International Conference on Cloud and Green Computing CGC'12, Xiangtan, Hunan, 1-3 November 2012, pp. pp.274-–281. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/CGC.2012.113">http://dx.doi.org/10.1109/CGC.2012.113</ext-link>
[33]
Dreslinski RG, Wieckowski M, Blaauw D . 2010 Near-threshold computing: reclaiming Moore's law through energy efficient integrated circuits. Proceedings of the IEEEVolume 98 Issue 2: pp.253-–266. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/JPROC.2009.2034764">http://dx.doi.org/10.1109/JPROC.2009.2034764</ext-link>
[34]
Ellsworth DA, Malony AD, Rountree B . 2015 Dynamic power sharing for higher job throughput. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/2807591.2807643">http://dx.doi.org/10.1145/2807591.2807643</ext-link>
[35]
Elnozahy EN, Kistler M, Rajamony R. <chapter-title>Energy-efficient server clusters</chapter-title>. 2003. In: Power-Aware Computer Systems . . Volume Vol. 2325 . Berlin: Springer Berlin Heidelberg Press, pp. pp.179-–197.
[36]
Enos J, Steffen C, Fullop J . 2010 Quantifying the impact of GPUs on performance and energy efficiency in HPC clusters. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15-18 August 2010, pp. pp.317-–324. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/GREENCOMP.2010.5598297">http://dx.doi.org/10.1109/GREENCOMP.2010.5598297</ext-link>
[37]
Esmaeilzadeh H, Cao T, Yang X . 2011 Looking back on the language and hardware revolutions: measured power, performance, and scaling. In: Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS XVI, Newport Beach, California, 5-11 March 2011, pp. pp.319-–332. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1950365.1950402">http://dx.doi.org/10.1145/1950365.1950402</ext-link>
[38]
Esmaeilzadeh H, Cao T, Yang X . 2012 What is happening to power, performance, and software? IEEE Micro Volume 32 Issue 3: pp.110-–121.
[39]
Etinski M, Corbalan J, Labarta J . 2010 Optimizing Job Performance Under a Given Power Constraint in HPC Centers. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15-18 August 2010, pp. pp.257-–267. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/GREENCOMP.2010.5598303">http://dx.doi.org/10.1109/GREENCOMP.2010.5598303</ext-link>
[40]
Etinski M, Corbalan J, Labarta J . 2012 Parallel job scheduling for power constrained HPC systems. Parallel Computing Volume 38 Issue 12: pp. pp.615-–630.
[41]
Feng X, Ge R, Cameron KW 2005 Power and energy profiling of scientific applications on distributed systems. In: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium IPDPS'05, Denver, CO, 4-8 April 2005, p. 34. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2005.346">http://dx.doi.org/10.1109/IPDPS.2005.346</ext-link>
[42]
Fowers J, Brown G, Wernsing J . 2013 A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors. ACM Transactions on Architecture and Code Optimization TACO - Special Issue on High-Performance Embedded Architectures and Compilers Volume 9 Issue 4: pp.110-–121.
[43]
Freeh VW, Bletsch TK, Rawson FL 2007a Scaling and packing on a chip multiprocessor. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium IPDPS'07, Long Beach, California, 26-30 March 2007, pp. pp.1-–8. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2007.370539">http://dx.doi.org/10.1109/IPDPS.2007.370539</ext-link>
[44]
Freeh VW, Lowenthal DK, Pan F . 2007b Analyzing the energy-time trade-off in high-performance computing applications. IEEE Transactions on Parallel and Distributed Systems Volume 18 Issue 6: pp.825-–848.
[45]
Freeh VW, Pan F, Kappiah N . 2005a Using multiple energy gears in MPI programs on a power-scalable cluster. In: Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming PPoPP 2005, Chicago, IL, 15-17 June 2005, pp. pp.164-–173. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1065944.1065967">http://dx.doi.org/10.1145/1065944.1065967</ext-link>
[46]
Freeh VW, Pan F, Kappiah N . 2005b Exploring the energy-time tradeoff in MPI programs on a power-scalable cluster. In: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium IPDPS'05, Denver, CO, 4-8 April 2005. IEEE Press, 4a. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2005.346">http://dx.doi.org/10.1109/IPDPS.2005.346</ext-link>
[47]
Ge R, Cameron KW 2007 Power-aware speedup. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium IPDPS'07, Long Beach, California, 26-30 March 2007, pp. pp.1-–10. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2007.370246">http://dx.doi.org/10.1109/IPDPS.2007.370246</ext-link>
[48]
Ge R, Feng X, Cameron KW 2005 Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis SC'05, Seattle, WA, 12-18 November 2005, p. pp.34. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/SC.2005.57">http://dx.doi.org/10.1109/SC.2005.57</ext-link>
[49]
Ge R, Feng X, Feng W-C . 2007 CPU MISER: a performance-directed, run-time system for power-aware clusters. In: Proceedings of the 2007 International Conference on Parallel Processing ICPP'07, XiAn, China, 10-14 September 2007, p. 18. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ICPP.2007.29">http://dx.doi.org/10.1109/ICPP.2007.29</ext-link>
[50]
Ge R, Feng X, Song S . 2009 PowerPack: energy profiling and analysis of high-performance systems and applications. IEEE Transactions on Parallel and Distributed Systems Volume 21 Issue 5: pp.658-–671.
[51]
Ge R, Vogt R, Majumder J . 2013 Effects of dynamic voltage and frequency scaling on a K20 GPU. In: Proceedings of the 42nd International Conference on Parallel Processing ICPP'13, Lyon, France, 1-4 October 2013, pp. pp.826-–833. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ICPP.2013.98">http://dx.doi.org/10.1109/ICPP.2013.98</ext-link>
[52]
Georgiou Y, Cadeau T, Glesser D . 2014 Energy accounting and control with SLURM resource and job management system. In: Chatterjee M, Cao JN, Kothapalli K . eds Distributed Computing and Networking . Lecture Notes in Computer Science, Volume Vol. 8314 . Berlin: Springer Berlin Heidelberg Press, pp. pp.96-–118. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-642-45249-9_7">http://dx.doi.org/10.1007/978-3-642-45249-9_7</ext-link>.
[53]
Ghosh S, Chandrasekaran S, Chapman B 2012 Energy analysis of parallel scientific kernels on multiple GPUs. In: Proceedings of the 2012 Symposium on Application Accelerators in High Performance Computing SAAHPC, Argonne, IL, 10-11 July 2012, pp. pp.54-–63. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/SAAHPC.2012.17">http://dx.doi.org/10.1109/SAAHPC.2012.17</ext-link>
[54]
Grant RE, Afsahi A 2006 Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium IPDPS'06, Rhodes Island, Greece, 25-29 April 2006, pp. pp.300-–308. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2006.1639601">http://dx.doi.org/10.1109/IPDPS.2006.1639601</ext-link>
[55]
Greenhalgh P 2011 big. LITTLE processing with ARM Cortex-A15 & Cortex-A7. ARM Whitepaper .
[56]
Grigori L, Demmel JW, Xiang H 2011 CALU: a communication optimal LU factorization algorithm. SIAM Journal on Matrix Analysis and Applications Volume 32 Issue 4: pp.1317-–1350.
[57]
Groves T, Grant R 2015 Power aware, dynamic provisioning of HPC networks. Sandia National Laboratories report, .
[58]
Gschwandtner P, Chalios C, Nikolopoulos DS . 2015 On the potential of significance-driven execution for energy-aware HPC. Computer Science - Research and Development Volume 30 Issue 2: pp.197-–206. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s00450-014-0265-9">http://dx.doi.org/10.1007/s00450-014-0265-9</ext-link>
[59]
Gschwandtner P, Durillo JJ, Fahringer T 2014 Multi-objective auto-tuning with Insieme: optimization and trade-off analysis for time, energy and resource usage. In: Proceedings of the 20th European Conference on Parallel Processing Euro-Par 2014, Porto, Portugal, 25-29 August 2014, pp. pp.87-–98. Springer International Publishing. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-319-09873-9_8">http://dx.doi.org/10.1007/978-3-319-09873-9_8</ext-link>
[60]
Hackenberg D, Ilsche T, Schone R . 2013 Power measurement techniques on standard compute nodes: a quantitative comparison. In: Proceedings of 2013 IEEE International Symposium on Performance Analysis of Systems and Software ISPASS, Austin, TX, 21-23 April 2013, pp. pp.194-–204. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ISPASS.2013.6557170">http://dx.doi.org/10.1109/ISPASS.2013.6557170</ext-link>
[61]
Hart A, Richardson H, Doleschal J . 2014 User-level power monitoring and application performance on cray XC30 supercomputers. In: Proceedings of the 2014 CUG Cray User Group meeting, Lugano, Switzerland, 8 May 2014.
[62]
Hennecke M, Frings W, Homberg W . 2012 Measuring power consumption on IBM blue gene/P. Computer Science - Research and Development Volume 27 Issue 4: pp.329-–336.
[63]
Hoefler T 2010 Software and hardware techniques for power-efficient HPC networking. Computing in Science & Engineering Volume 12 Issue 6: pp.30-–37. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/MCSE.2010.96">http://dx.doi.org/10.1109/MCSE.2010.96</ext-link>
[64]
Hoffmann H, Misailovic S, Sidiroglou S . 2009 Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. Computer Science and Artificial Intelligence Laboratory Technical Report . . Cambridge: Computer Science Department, Massachusetts Institute of Technology MIT.
[65]
Hoffmann H, Sidiroglou S, Carbin M . 2011 Dynamic knobs for responsive power-aware computing. In: Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS XVI, Newport Beach, California, 5-11 March 2011, pp. pp.199-–212. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1950365.1950390">http://dx.doi.org/10.1145/1950365.1950390</ext-link>
[66]
Hong S, Kim H 2010 An integrated GPU power and performance model. In: Proceedings of the 37th Annual International Symposium on Computer Architecture ISCA'10, Saint-Malo, France, 19-23 June 2010, pp. pp.280-–289. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1816038.1815998">http://dx.doi.org/10.1145/1816038.1815998</ext-link>
[67]
Hsu C-H, Feng W-C 2005 A power-aware run-time system for high-performance computing. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis SC'05, Seattle, WA, 12-18 November 2005, p. 1. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/SC.2005.57">http://dx.doi.org/10.1109/SC.2005.57</ext-link>
[68]
Hsu C-H, Kremer U 2003a <chapter-title>Dynamic voltage and frequency scaling for scientific applications</chapter-title>. In: In: Dietz HG ed. Languages and Compilers for Parallel Computing ., Volume Vol. 2624, Berlin: Springer-Verlag Press, pp. pp.86-–99.
[69]
Hsu C-H, Kremer U 2003b The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction. In: Proceedings of the 24th ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI'03, San Diego, CA, 8-11 June 2003, pp. pp.38-–48. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/780822.781137">http://dx.doi.org/10.1145/780822.781137</ext-link>
[70]
Huang S, Xiao S, Feng W 2009 On the energy efficiency of graphics processing units for scientific computing. In: Proceedings of the 23rd International Parallel and Distributed Processing Symposium IPDPS'09, Rome, Italy, 23-29 May 2009, pp. pp.1-–8. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IPDPS.2009.5160980">http://dx.doi.org/10.1109/IPDPS.2009.5160980</ext-link>
[71]
<collab collab-type="author">IEEE 802.3az</collab>2010 Active/Idle Toggling with Low Power Idle.
[72]
Inadomi Y, Patki T, Inoue K . 2015 Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/2807591.2807639">http://dx.doi.org/10.1145/2807591.2807639</ext-link>
[73]
<collab collab-type="author">InfiniBand Trade Association</collab>2002 InfiniBand Architecture Specification . . InfiniBand Trade Association.
[74]
<collab collab-type="author">Intel Corp</collab>2011 System Programming Guide, volume 3B-2 of Intel 64 and IA-32 Architectures Software Developer's Manual . Intel Corp.
[75]
<collab collab-type="author">Intel Corp</collab>2013a Intel® Xeon® Processor Specification. Intel Corp.
[76]
<collab collab-type="author">Intel Corp</collab>2013b IPMI-Intelligent Platform Management Interface Specification Second Generation . . Intel Corp.
[77]
Jana S, Chapman B 2015 Impact of frequency scaling on one sided remote memory accesses. In: Proceedings of the 9th international conference on partitioned global address space programming models PGAS'15, Washington, DC, 16-18 September 2015, pp. pp.25-–37. IEEE Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/PGAS.2015.11">http://dx.doi.org/10.1109/PGAS.2015.11</ext-link>
[78]
Jana S, Schuchart J, Chapman B 2014a Analysis of energy and performance of RDMA-based data access patterns. In: Proceedings of the 8th international conference on partitioned global address space programming models PGAS'14 . Eugene, OR, 6-10 October 2014. New York, NY: ACM Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/2676870.2676882">http://dx.doi.org/10.1145/2676870.2676882</ext-link>
[79]
Jana S, Hernandez O, Poole S . 2014b Power consumption due to data movement in distributed programming models. In: Proceedings of the 20th international conference euro-par 2014 parallel processing, Porto, Portugal, 25-29 August 2014, pp. pp.366-–378. Springer Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-319-09873-9_31">http://dx.doi.org/10.1007/978-3-319-09873-9_31</ext-link>
[80]
Jana S, Hernandez O, Poole S . 2014c Analyzing the energy and power consumption of remote memory accesses in the OpenSHMEM model. In: Proceedings of the 1st OpenSHMEM workshop: experiences, implementations and tools, Annapolis, MD, 4-6 March 2014, pp. pp.59-–73. Springer Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/978-3-319-05215-1_5">http://dx.doi.org/10.1007/978-3-319-05215-1_5</ext-link>
[81]
Jiao Y, Lin H, Balaji P . 2010 Power and performance characterization of computational kernels on the GPU. In: Proceedings of the 2010 IEEE/ACM International Conference on Green Computing and Communications & International Conference on Cyber, Physical and Social Computing GREENCOM-CPSCOM'10, Hangzhou, China, 18-20 December 2010, pp. pp.221-–228. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/GreenCom-CPSCom.2010.143">http://dx.doi.org/10.1109/GreenCom-CPSCom.2010.143</ext-link>
[82]
Jordan H, Thoman P, Durillo JJ . 2012 A multi-objective auto-tuning framework for parallel codes. In: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis SC'12, Salt Lake City, Utah, 10-16 November 2012, pp. pp.1-–12. New York, NY: ACM Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/SC.2012.7">http://dx.doi.org/10.1109/SC.2012.7</ext-link>
[83]
Kale L, Krishnan S 1993 CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of the 8th annual conference on object-oriented programming systems, languages, and applications OOPSLA'93, Washington, DC, 26 September-01 October 1993, pp. pp.91-–108. ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/165854.165874">http://dx.doi.org/10.1145/165854.165874</ext-link>
[84]
Kandalla K, Mancini EP, Sur S . 2010 Designing power-aware collective communication algorithms for InfiniBand clusters. In: Proceedings of the 39th International Conference on Parallel Processing ICPP'10, San Diego, CA, 13-16 September 2010, pp. pp.218-–227. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/ICPP.2010.78">http://dx.doi.org/10.1109/ICPP.2010.78</ext-link>
[85]
Kappiah N, Freeh VW, Lowenthal DK 2005 Just in time dynamic voltage scaling: exploiting inter-node slack to save energy in MPI programs. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis SC'05, Seattle, WA, 12-18 November 2005, p. 33. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/SC.2005.57">http://dx.doi.org/10.1109/SC.2005.57</ext-link>
[86]
Karpuzcu UR, Kim NS, Torrellas J 2013 Coping with parametric variation at near-threshold voltages. IEEE Micro Volume 33 Issue 4: pp.6-–14.
[87]
Kaxiras S, Martonosi M 2008 Computer Architecture Techniques for Power-Efficiency . 1st ed.San Rafael: Morgan and Claypool Publishers.
[88]
Keramidas G, Spiliopoulos V, Kaxiras S 2010 Interval-based models for run-time DVFS orchestration in superscalar processors. In: Proceedings of the 7th ACM international Conference on Computing Frontiers CF'10, Bertinoro, Italy, 17-19 May 2010, pp. pp.287-–296. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1787275.1787338">http://dx.doi.org/10.1145/1787275.1787338</ext-link>
[89]
Kestor G, Gioiosa R, Kerbyson D . 2013 Quantifying the energy cost of data movement in scientific applications. In: Proceedings of 2013 IEEE international symposium on workload characterization IISWC, Portland, OR, 22-24 September 2013, pp. pp.56-–65. IEEE Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IISWC.2013.6704670">http://dx.doi.org/10.1109/IISWC.2013.6704670</ext-link>
[90]
Kogge P, Bergman K, Borkar S . 2008 ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. DARPA Report .
[91]
Korthikanti VA, Agha G 2010 Avoiding energy wastage in parallel applications. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15-18 August 2010, pp. pp.149-–163. IEEE Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/GREENCOMP.2010.5598314">http://dx.doi.org/10.1109/GREENCOMP.2010.5598314</ext-link>
[92]
Lam MO, Hollingsworth JK, de Supinski BR . 2013 Automatically adapting programs for mixed-precision floating-point computation. In: Proceedings of the 27th annual international conference on supercomputing ICS'13, Eugene, Oregon, 10-14 June 2013, pp. pp.369-–378. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/2464996.2465018">http://dx.doi.org/10.1145/2464996.2465018</ext-link>
[93]
Laros JH, Pedretti KT, Kelly SM . 2013 Energy-Efficient High Performance Computing - Measurement and Tuning . SpringerBriefs in Computer Science. London: Springer Publications.
[94]
Laros JH, Pedretti KT, Kelly SM . 2012 Energy based performance tuning for large scale high performance computing systems. In: Proceedings of the 2012 Symposium on High Performance Computing HPC'12, Orlando, FL, 26-29 March 2012. Society for Computer Simulation International Press.
[95]
Laros JH, Pokorny P, DeBonis D 2013 PowerInsight - a commodity power measurement capability. In: Proceedings of 2013 International Green Computing Conference IGCC, Arlington, VA, 27-29 June 2013, pp. pp.1-–6. IEEE Press. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/IGCC.2013.6604485">http://dx.doi.org/10.1109/IGCC.2013.6604485</ext-link>.
[96]
Lawson B, Smirni E 2005 Power-aware resource allocation in high-end systems via online simulation. In: Proceedings of the 19th Annual International Conference on Supercomputing ICS'05, Boston, MA, 20-22 June 2005, pp. pp.229-–238. New York, NY: ACM Press. ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1088149.1088179">http://dx.doi.org/10.1145/1088149.1088179</ext-link>
[97]
Leon EA, Karlin I, Grant RE 2015 Optimizing explicit hydrodynamics for power, energy, and performance. In: Proceedings of the 2015 IEEE International Conference on Cluster Computing CLUSTER, Chicago, IL, 8-11 September 2015, pp. 11–21. IEEE Press. http://dx.doi.org/10.1109/CLUSTER.2015.12
[98]
Li D, Byna S, Chakradhar S 2011a Energy-aware workload consolidation on GPU. In: Proceedings of the 40th International Conference on Parallel Processing Workshops ICPPW'11, Taipei, Taiwan, China, 13-16 September 2011, pp. 389–398. IEEE Press. http://dx.doi.org/10.1109/ICPPW.2011.25
[99]
Li B, Chang H-C, Song SL, Su C-Y et al. 2014 The power-performance tradeoffs of the Intel Xeon Phi on HPC applications. In: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops IPDPSW'14, Phoenix, Arizona, 19-23 May 2014, pp. 1448–1456. IEEE Press. http://dx.doi.org/10.1109/IPDPSW.2014.162
[100]
Li S, Lim K, Faraboschi P et al. 2011b System-level integrated server architectures for scale-out datacenters. In: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture MICRO-44, Porto Alegre, Brazil, 4-7 December 2011, pp. 260–271. New York, NY: ACM Press. http://dx.doi.org/10.1145/2155620.2155651
[101]
Li J, Martinez JF 2006 Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In: Proceedings of the 12th International Symposium on High-Performance Computer Architecture HPCA'06, Austin, TX, 11-15 February 2006, pp. 77–87. IEEE Press. http://dx.doi.org/10.1109/HPCA.2006.1598114
[102]
Li D, Nikolopoulos DS, Cameron KW et al. 2010a Power-aware MPI task aggregation prediction for high-end computing systems. In: Proceedings of the 2010 IEEE International Parallel and Distributed Processing Symposium IPDPS'10, Atlanta, GA, 19-23 April 2010, pp. 1–12. IEEE Press. http://dx.doi.org/10.1109/IPDPS.2010.5470463
[103]
Li D, de Supinski BR, Schulz M et al. 2010b Hybrid MPI/OpenMP power-aware computing. In: Proceedings of the 2010 IEEE International Parallel and Distributed Processing Symposium IPDPS'10, Atlanta, GA, 19-23 April 2010, pp. 1–12. IEEE Press. http://dx.doi.org/10.1109/IPDPS.2010.5470463
[104]
Lim MY, Freeh VW, Lowenthal DK 2006 Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In: Proceedings of the 2006 International Conference for High Performance Computing, Networking, Storage and Analysis SC'06, Tampa, FL, 11-17 November 2006, p. 14. New York, NY: ACM Press. http://dx.doi.org/10.1109/SC.2006.11
[105]
Linderman MD, Ho M, Dill DL et al. 2010 Towards program optimization through automated analysis of numerical precision. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization CGO'10, Toronto, Canada, 24-28 April 2010, pp. 230–237. New York, NY: ACM Press. http://dx.doi.org/10.1145/1772954.1772987
[106]
Liu J, Poff D, Abali B 2009 Evaluating high performance communication: a power perspective. In: Proceedings of the 23rd International Conference on Supercomputing ICS'09, Yorktown Heights, NY, 8-12 June 2009, pp. 326–337. New York, NY: ACM Press. http://dx.doi.org/10.1145/1542275.1542322
[107]
Luk C-K, Hong S, Kim H 2009 Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture MICRO-42, New York, NY, 12-16 December 2009. New York, NY: ACM Press. http://dx.doi.org/10.1145/1669112.1669121
[108]
Ma K, Li X, Chen W et al. 2012 GreenGPU: a holistic approach to energy efficiency in GPU-CPU heterogeneous architectures. In: Proceedings of the 41st International Conference on Parallel Processing ICPP'12, Pittsburgh, PA, 10-13 September 2012, pp. 48–57. IEEE Press. http://dx.doi.org/10.1109/ICPP.2012.31
[109]
Mämmelä O, Majanen M, Basmadjian R et al. 2012 Energy-aware job scheduler for high-performance computing. Computer Science - Research and Development Volume 27 Issue 4: pp. 265–275.
[110]
Marathe A, Bailey PE, Lowenthal DK et al. 2015 A run-time system for power-constrained HPC applications. In: Kunkel JM, Ludwig T (eds) High Performance Computing. Vol. 9137. Springer International Publishing, pp. 394–408.
[111]
Martin SJ, Rush D, Kappel M 2015 Cray advanced platform monitoring and control CAPMC. In: Proceedings of the 2015 CUG Cray User Group meeting, Chicago, IL, 26-30 April 2015.
[112]
Mei X, Yung LS, Zhao K et al. 2013 A measurement study of GPU DVFS on energy conservation. In: Proceedings of the 2013 Workshop on Power-Aware Computing and Systems HotPower'13, Farmington, Pennsylvania, 3-6 November 2013. New York, NY: ACM Press. http://dx.doi.org/10.1145/2525526.2525847
[113]
Miceli R, Civario G, Sikora A et al. 2012 AutoTune: a plugin-driven approach to the automatic tuning of parallel applications. In: Proceedings of the 11th International Conference on Applied Parallel and Scientific Computing PARA'12, Helsinki, Finland, 10-13 June 2012, pp. 328–342. Berlin, Heidelberg: Springer-Verlag. http://dx.doi.org/10.1007/978-3-642-36803-5_24
[114]
Miller DAB, Ozaktas HM 1997 Limit to the bit-rate capacity of electrical interconnects from the aspect ratio of the system architecture. Journal of Parallel and Distributed Computing Volume 41 Issue 1: pp. 42–52.
[115]
Minartz T, Ludwig T, Knobloch M et al. 2011 Managing hardware power saving modes for high performance computing. In: Proceedings of the 2011 International Green Computing Conference and Workshops IGCC'11, Orlando, FL, 25-28 July 2011, pp. 1–8. IEEE Press. http://dx.doi.org/10.1109/IGCC.2011.6008581
[116]
Mittal S, Vetter JS 2015 A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys Volume 47 Issue 2: pp. 1–23.
[117]
Miwa S, Aita S, Nakamura H 2014 Performance estimation of high performance computing systems with energy efficient Ethernet technology. Computer Science - Research and Development Volume 29 Issue 3: pp. 161–169.
[118]
Miwa S, Nakamura H 2015 Profile-based power shifting in interconnection networks with on/off links. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. ACM Press. http://dx.doi.org/10.1145/2807591.2807639
[119]
MPI Forum 2015 MPI: a message-passing interface standard.
[120]
Mukhanov L, Nikolopoulos DS, de Supinski BR 2015 ALEA: fine-grain energy profiling with basic block sampling. In: Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques PACT-2015, San Francisco, CA, 18-21 October 2015, pp. 87–98. IEEE Press. http://dx.doi.org/10.1109/PACT.2015.16
[121]
Nedevschi S, Popa L, Iannaccone G et al. 2008 Reducing network energy consumption via sleeping and rate-adaptation. In: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation NSDI'08, San Francisco, CA, 16-18 April 2008, pp. 323–336. Berkeley, CA: USENIX Association Press.
[122]
NVIDIA Corp 2013 Tesla K20 GPU Accelerator Board Specification. NVIDIA Corp.
[123]
NVIDIA Corp 2015 NVML: NVIDIA Management Library Reference Manual. NVIDIA Corp.
[124]
OpenMP Architecture Review Board (ARB) 2015 OpenMP Application Program Interface, Version 4.5.
[125]
Patki T, Lowenthal DK, Rountree B et al. 2013 Exploring hardware overprovisioning in power-constrained, high performance computing. In: Proceedings of the 27th International Conference on Supercomputing ICS'13, Eugene, Oregon, 10-14 June 2013, pp. 173–182. New York, NY: ACM Press. http://dx.doi.org/10.1145/2464996.2465009
[126]
Patki T, Sasidharan A, Maiterth M et al. 2015 Practical resource management in power-constrained, high performance computing. In: Proceedings of the 24th IEEE International Symposium on High Performance Distributed Computing HPDC'15, Portland, Oregon, 15-19 June 2015, pp. 121–132. New York, NY: ACM Press. http://dx.doi.org/10.1145/2749246.2749262
[127]
Patterson D, Gannon D, Wrinn M 2013a The Berkeley Par Lab: Progress in the Parallel Computing Landscape. Microsoft Corporation Press.
[128]
Patterson MK, Poole SW, Hsu C-H et al. 2013b TUE, a new energy-efficiency metric applied at ORNL's Jaguar. In: Kunkel JM, Ludwig T, Meuer HW (eds) Supercomputing. Vol. 7905. Berlin: Springer Berlin Heidelberg Press, pp. 372–382.
[129]
Pedretti K, Olivier SL, Ferreira KB et al. 2015 Early experiences with node-level power capping on the Cray XC40 platform. In: Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, Austin, TX, 15-20 November 2015. ACM Press. http://dx.doi.org/10.1145/2834800.2834801
[130]
Price DC, Clark MA, Barsdell BR et al. 2015 Optimizing performance-per-watt on GPUs in high performance computing. Computer Science - Research and Development: pp. 1–9.
[131]
Rahman SMF, Guo J, Bhat JA et al. 2012 Studying the impact of application-level optimizations on the power consumption of multi-core architectures. In: Proceedings of the 9th Conference on Computing Frontiers CF'12, Cagliari, Italy, 15-17 May 2012, pp. 123–132. New York, NY: ACM Press. http://dx.doi.org/10.1145/2212908.2212927
[132]
Rahman SF, Guo J, Yi Q 2011 Automated empirical tuning of scientific codes for performance and power consumption. In: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers HiPEAC'11, Heraklion, Greece, 24-26 January 2011, pp. 107–116. New York, NY: ACM Press. http://dx.doi.org/10.1145/1944862.1944880
[133]
Rajovic N, Carpenter P, Gelado I et al. 2013 Supercomputing with commodity CPUs: are mobile SoCs ready for HPC? In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis SC'13, Denver, CO, 17-21 November 2013, pp. 1–12. IEEE Press. http://dx.doi.org/10.1145/2503210.2503281
[134]
Rotem E, Naveh A, Ananthakrishnan A et al. 2012 Power-management architecture of the Intel micro-architecture codenamed Sandy Bridge. IEEE Micro Volume 32 Issue 2: pp. 20–27.
[135]
Rountree B, Ahn DH, de Supinski BR et al. 2012 Beyond DVFS: a first look at performance under a hardware-enforced power bound. In: Proceedings of the 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum IPDPSW'12, Shanghai, China, 21-25 May 2012, pp. 947–953. IEEE Press. http://dx.doi.org/10.1109/IPDPSW.2012.116
[136]
Rountree B, Lowenthal DK, Funk S et al. 2007 Bounding energy consumption in large-scale MPI programs. In: Proceedings of the 2007 International Conference for High Performance Computing, Networking, Storage and Analysis SC'07, Reno, NV, 10-16 November 2007, pp. 1–9. IEEE Press. http://dx.doi.org/10.1145/1362622.1362688
[137]
Rountree B, Lowenthal DK, Schulz M et al. 2011 Practical performance prediction under dynamic voltage frequency scaling. In: Proceedings of the 2nd International Green Computing Conference IGCC'11, Orlando, Florida, 25-28 July 2011, pp. 1–8. IEEE Press. http://dx.doi.org/10.1109/IGCC.2011.6008553
[138]
Rountree B, Lowenthal DK, de Supinski BR et al. 2009 Adagio: making DVS practical for complex HPC applications. In: Proceedings of the 23rd International Conference on Supercomputing ICS'09, Yorktown Heights, NY, 8-12 June 2009, pp. 460–469. New York, NY: ACM Press. http://dx.doi.org/10.1145/1542275.1542340
[139]
Rubio-González C, Nguyen C, Nguyen HD et al. 2013 Precimonious: tuning assistant for floating-point precision. In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis SC'13, Denver, CO, 17-21 November 2013. IEEE Press. http://dx.doi.org/10.1145/2503210.2503296
[140]
Sampson A, Dietl W, Fortuna E, Gnanapragasam D et al. 2011 EnerJ: approximate data types for safe and general low-power computation. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI'11, San Jose, CA, 4-8 June 2011, pp. 164–174. New York, NY: ACM Press. http://dx.doi.org/10.1145/1993316.1993518
[141]
Saputra H, Kandemir M, Vijaykrishnan N et al. 2002 Energy-conscious compilation based on voltage scaling. In: Proceedings of the Joint Conference on Languages, Compilers and Tools for Embedded Systems: Software and Compilers for Embedded Systems LCTES/SCOPES'02, Berlin, Germany, 19-21 June 2002, pp. 2–11. New York, NY: ACM Press. http://dx.doi.org/10.1145/513829.513832
[142]
Saravanan KP, Carpenter PM, Ramirez A 2013 Power/performance evaluation of energy efficient Ethernet EEE for high performance computing. In: Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software ISPASS, Austin, TX, 21-23 April 2013, pp. 205–214. IEEE Press. http://dx.doi.org/10.1109/ISPASS.2013.6557171
[143]
Saravanan KP, Carpenter PM, Ramirez A 2014 A performance perspective on energy efficient HPC links. In: Proceedings of the 28th International Conference on Supercomputing ICS'14, Muenchen, Germany, 10-13 June 2014, pp. 313–322. New York, NY: ACM Press. http://dx.doi.org/10.1145/2597652.2597671
[144]
Sarood O, Langer A, Gupta A et al. 2014 Maximizing throughput of overprovisioned HPC data centers under a strict power budget. In: Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis SC'14, New Orleans, LA, 16-21 November 2014, pp. 807–818. New York, NY: ACM Press. http://dx.doi.org/10.1109/SC.2014.71
[145]
Sarood O, Langer A, Kale L et al. 2013 Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems. In: Proceedings of the 2013 IEEE International Conference on Cluster Computing CLUSTER, Indianapolis, IN, 23-27 September 2013, pp. 1–8. IEEE Press. http://dx.doi.org/10.1109/CLUSTER.2013.6702684
[146]
Sarood O, Miller P, Totoni E et al. 2012 'Cool' load balancing for high performance computing data centers. IEEE Transactions on Computers Volume 61 Issue 12: pp. 1752–1764. http://dx.doi.org/10.1109/TC.2012.143
[147]
Schkufza E, Sharma R, Aiken A 2014 Stochastic optimization of floating-point programs with tunable precision. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI'14, Edinburgh, UK, 9-11 June 2014, pp. 53–64. New York, NY: ACM Press. http://dx.doi.org/10.1145/2594291.2594302
[148]
Scogland T, Azose J, Rohr D et al. 2015 Node variability in large-scale power measurements: perspectives from the Green500, Top500 and EEHPCWG. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. New York, NY: ACM Press. http://dx.doi.org/10.1145/2807591.2807658
[149]
Scogland T, Steffen C, Wilde T et al. 2014 A power-measurement methodology for large-scale, high-performance computing. In: Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering ICPE'14, Dublin, Ireland, 22-26 March 2014, pp. 149–159. New York, NY: ACM Press. http://dx.doi.org/10.1145/2568088.2576795
[150]
Shalf J, Dosanjh S, Morrison J 2010 Exascale computing technology challenges. In: Proceedings of the 9th International Conference on High Performance Computing for Computational Science VECPAR'10, Berkeley, CA, 22-25 June 2010, pp. 1–25. Heidelberg: Springer-Verlag Berlin Press. http://dx.doi.org/10.1007/978-3-642-19328-6_1
[151]
Solomonik E, Demmel J 2011 Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In: Proceedings of the 17th International Conference on Parallel Processing Euro-Par 2011, Bordeaux, France, 29 August-2 September 2011, pp. 90–109. Springer International Publishing. http://dx.doi.org/10.1007/978-3-642-23397-5_10
[152]
Song S, Su C-Y, Ge R et al. 2011 Iso-energy-efficiency: an approach to power-constrained parallel computation. In: Proceedings of the 2011 IEEE International Parallel and Distributed Processing Symposium IPDPS'11, Anchorage, AK, 16-20 May 2011, pp. 128–139. IEEE Press. http://dx.doi.org/10.1109/IPDPS.2011.22
[153]
Suleman MA, Qureshi MK, Patt YN 2008 Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. In: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS XIII, Seattle, WA, 1-5 March 2008, pp. 277–286. New York, NY: ACM Press. http://dx.doi.org/10.1145/1346281.1346317
[154]
Tapus C, Chung I-H, Hollingsworth JK 2002 Active Harmony: towards automated performance tuning. In: Proceedings of the 2002 International Conference for High Performance Computing, Networking, Storage and Analysis SC'02, Baltimore, MD, 16-22 November 2002, pp. 1–11. New York, NY: ACM Press. http://dx.doi.org/10.1109/SC.2002.10062
[155]
Tavarageri S, Sadayappan P 2013 A compiler analysis to determine useful cache size for energy efficiency. In: Proceedings of the 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum IPDPSW'13, Cambridge, MA, 20-24 May 2013, pp. 923–930. IEEE Press. http://dx.doi.org/10.1109/IPDPSW.2013.268
[156]
The Green500 2015 Available at: http://www.green500.org/lists/green201511 (accessed 20 August 2016).
[157]
Tiwari A, Laurenzano MA, Carrington L et al. 2012 Auto-tuning for energy usage in scientific applications. In: Alexander M, D'Ambra P, Belloum A et al. (eds) Euro-Par 2011: Parallel Processing Workshops. Lecture Notes in Computer Science Vol. 7156. Springer International Publishing, pp. 178–187. http://dx.doi.org/10.1007/978-3-642-29740-3_21
[158]
Top500 2015 Available at: http://www.top500.org/lists/2015/11/
[159]
Totoni E, Jain N, Kale L 2014a Power management of extreme-scale networks with on/off links in runtime systems. ACM Transactions on Parallel Computing - Special Issue on PPoPP 2012 Volume 1 Issue 2: pp. 386–393. http://dx.doi.org/10.1145/2687001
[160]
Totoni E, Torrellas J, Kale L 2014b Using an adaptive HPC runtime system to reconfigure the cache hierarchy. In: Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis SC'14, New Orleans, LA, 16-21 November 2014, pp. 1047–1058. ACM Press. http://dx.doi.org/10.1109/SC.2014.90
[161]
Vaidyanathan K, Avancha S, SherlekarOn S 2013 Exascale Computing & beyond: meeting the challenges. In: D'Hollander EH, Dongarra JJ, Foster I (eds) Transition of HPC Towards Exascale Computing. IOS Press, pp. 24–34.
[162]
Venkatesh A, Kandalla K, Panda DK 2013 Evaluation of energy characteristics of MPI communication primitives with RAPL. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum IPDPSW, Boston, Massachusetts, 20-24 May 2013, pp. 938–945. IEEE Press. http://dx.doi.org/10.1109/IPDPSW.2013.243
[163]
Venkatesh A, Vishnu A, Hamidouche K et al. 2015 A case for application-oblivious energy-efficient MPI runtime. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis SC'15, Austin, TX, 15-20 November 2015. New York, NY: ACM Press. http://dx.doi.org/10.1145/2807591.2807658
[164]
Vishnu A, Song S, Marquez A et al. 2013 Designing energy efficient communication runtime systems: a view from PGAS models. The Journal of Supercomputing Volume 63 Issue 3: pp. 691–709. http://dx.doi.org/10.1007/s11227-011-0699-9
[165]
Wang G, Ren X 2010 Power-efficient work distribution method for CPU-GPU heterogeneous system. In: Proceedings of the 2010 International Symposium on Parallel and Distributed Processing with Applications ISPA'10, Taipei, Taiwan, China, 6-9 September 2010, pp. 386–393. IEEE Press. http://dx.doi.org/10.1109/ISPA.2010.22
[166]
Ware M, Rajamani K, Floyd M et al. 2010 Architecting for power management: the IBM® POWER7™ approach. In: Proceedings of the 16th International Symposium on High-Performance Computer Architecture HPCA-16, Bangalore, India, 9-14 January 2010, pp. 1–11. IEEE Press. http://dx.doi.org/10.1109/HPCA.2010.5416627
[167]
Weiser M, Welch B, Demers A et al. 1994 Scheduling for reduced CPU energy. In: Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation OSDI'94. ACM Press.
[168]
Williams J, Massie C, George AD et al. 2010 Characterization of fixed and reconfigurable multi-core devices for application acceleration. ACM Transactions on Reconfigurable Technology and Systems TRETS Volume 3 Issue 4: pp. 1–28.
[169]
Wolfe M 1996 High Performance Compilers for Parallel Computing. Boston: Addison-Wesley Publishing Company.
[170]
Woo DH, Lee H-HS 2008 Extending Amdahl's law for energy-efficient computing in the many-core era. Computer Volume 41 Issue 12: pp. 24–31.
[171]
Yoshii K, Iskra K, Gupta R et al. 2012 Evaluating power monitoring capabilities on IBM Blue Gene/P and Blue Gene/Q. In: Proceedings of the 2012 IEEE International Conference on Cluster Computing CLUSTER, Beijing, China, 24-28 September 2012, pp. 36–44. IEEE Press. http://dx.doi.org/10.1109/CLUSTER.2012.62
[172]
Zhou Z, Lan Z, Tang W et al. 2014 Reducing energy costs for IBM Blue Gene/P via power-aware job scheduling. In: Job Scheduling Strategies for Parallel Processing. Vol. 8429. Berlin: Springer Berlin Heidelberg Press, pp. 96–115.
[173]
Zyuban V, Friedrich J, Gonzalez CJ et al. 2011 Power optimization methodology for the IBM POWER7 microprocessor. IBM Journal of Research and Development Volume 55 Issue 3: pp. 7:1–7:9. http://dx.doi.org/10.1147/JRD.2011.2110410



Publisher

Sage Publications, Inc.

United States

Publication History

Published: 01 November 2017

Author Tags

  1. Parallel computing
  2. auto-tuning
  3. energy efficiency
  4. high performance computing
  5. power saving

Qualifiers

  • Research-article


Cited By

  • (2024)Energy efficient power cap configurations through Pareto front analysis and machine learning categorizationCluster Computing10.1007/s10586-023-04151-227:3(3433-3449)Online publication date: 1-Jun-2024
  • (2023)Carbon-Aware Global Routing in Path-Aware NetworksProceedings of the 14th ACM International Conference on Future Energy Systems10.1145/3575813.3595192(144-158)Online publication date: 20-Jun-2023
  • (2022)Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directionsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-022-0625-816:5Online publication date: 1-Oct-2022
  • (2022)Adaptive parallel applications: from shared memory architectures to fog computing (2002–2022)Cluster Computing10.1007/s10586-022-03692-225:6(4439-4461)Online publication date: 1-Dec-2022
  • (2021)Improvement of Energy-Efficiency in High Performance Computing (HPC)International Journal of ICT Research in Africa and the Middle East10.4018/IJICTRAME.29083510:2(30-51)Online publication date: 1-Jul-2021
  • (2021)Hardware and Software Solutions for Energy-Efficient Computing in Scientific ProgrammingScientific Programming10.1155/2021/55142842021Online publication date: 1-Jan-2021
  • (2021)FEPAC: A Framework for Evaluating Parallel Algorithms on Cluster ArchitecturesProceedings of the 2021 Australasian Computer Science Week Multiconference10.1145/3437378.3444363(1-10)Online publication date: 1-Feb-2021
  • (2021)UPR: deadlock-free dynamic network reconfiguration by exploiting channel dependency graph compatibilityThe Journal of Supercomputing10.1007/s11227-021-03791-877:11(12826-12856)Online publication date: 1-Nov-2021
  • (2020)Energy Efficient Design Techniques in Next-Generation Wireless Communication NetworksWireless Communications & Mobile Computing10.1155/2020/72353622020Online publication date: 1-Jan-2020
  • (2020)The Landscape of Exascale ResearchACM Computing Surveys10.1145/337239053:2(1-43)Online publication date: 20-Mar-2020
