DOI: 10.1145/3126908.3126945
research-article

Towards fine-grained dynamic tuning of HPC applications on modern multi-core architectures

Published: 12 November 2017

Abstract

There is a consensus that exascale systems should operate within a power envelope of 20 MW. Consequently, energy conservation is still considered the most crucial constraint if such systems are to be realized.
So far, most research on this topic has focused on strategies such as power capping and dynamic power management. Although these approaches can reduce power consumption, we believe that they might not be sufficient to reach the exascale energy-efficiency goals. Hence, we aim to adopt techniques from embedded systems, where energy efficiency has always been the fundamental objective.
A successful energy-saving technique used in embedded systems is to integrate fine-grained autotuning with dynamic voltage and frequency scaling. In this paper, we apply a similar technique to a real-world HPC application. Our experimental results on an HPC cluster indicate that such an approach saves up to 20% of energy compared to the baseline configuration, with negligible performance loss.
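
To make the idea of combining fine-grained autotuning with frequency scaling concrete, the following is a minimal sketch and not the method used in the paper. It assumes each code region has already been measured at a few core frequencies, and it picks, per region, the lowest-energy frequency whose runtime stays within a tolerated slowdown. The region names, the `Sample` type, the `max_slowdown` parameter, and all numbers are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One measurement of a code region at a fixed core frequency."""
    freq_ghz: float
    time_s: float
    energy_j: float


def pick_frequency(samples, max_slowdown=0.05):
    """Return the lowest-energy sample whose runtime stays within
    max_slowdown of the fastest sample measured for the same region."""
    best_time = min(s.time_s for s in samples)
    allowed = [s for s in samples if s.time_s <= best_time * (1.0 + max_slowdown)]
    return min(allowed, key=lambda s: s.energy_j)


def tune(regions, max_slowdown=0.05):
    """Map each region name to the frequency chosen for it."""
    return {name: pick_frequency(samples, max_slowdown).freq_ghz
            for name, samples in regions.items()}


# Hypothetical measurements, invented for illustration only.
REGIONS = {
    "stencil_update": [   # memory-bound: runtime barely depends on frequency
        Sample(2.5, 1.00, 120.0),
        Sample(2.0, 1.02, 100.0),
        Sample(1.5, 1.04, 90.0),
    ],
    "dense_kernel": [     # compute-bound: runtime grows as frequency drops
        Sample(2.5, 1.00, 110.0),
        Sample(2.0, 1.25, 115.0),
        Sample(1.5, 1.66, 125.0),
    ],
}

if __name__ == "__main__":
    print(tune(REGIONS))
```

With these made-up numbers, the memory-bound region is assigned the lowest frequency while the compute-bound region keeps the highest one, which is the kind of per-region decision that a single application-wide frequency setting cannot express.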


Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017
801 pages
ISBN:9781450351140
DOI:10.1145/3126908
  • General Chair: Bernd Mohr
  • Program Chair: Padma Raghavan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. autotuning
  2. dynamic tuning
  3. dynamic voltage and frequency scaling
  4. energy-efficiency
  5. high performance computing

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

SC '17 Paper Acceptance Rate 61 of 327 submissions, 19%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2023) SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607055. Online publication date: 12-Nov-2023
  • (2022) PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1272-1282. DOI: 10.1109/IPDPS53621.2022.00126. Online publication date: May-2022
  • (2022) DEPO: A dynamic energy-performance optimizer tool for automatic power capping for energy efficient high-performance computing. Software: Practice and Experience 52:12, pp. 2598-2634. DOI: 10.1002/spe.3139. Online publication date: 14-Aug-2022
  • (2021) Parallel application power and performance prediction modeling using simulation. Proceedings of the Winter Simulation Conference, pp. 1-12. DOI: 10.5555/3522802.3522980. Online publication date: 13-Dec-2021
  • (2021) Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3458817.3476197. Online publication date: 14-Nov-2021
  • (2021) Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF). ACM Transactions on Architecture and Code Optimization 18:1, pp. 1-26. DOI: 10.1145/3427093. Online publication date: 20-Jan-2021
  • (2021) Parallel Application Power and Performance Prediction Modeling Using Simulation. 2021 Winter Simulation Conference (WSC), pp. 1-12. DOI: 10.1109/WSC52266.2021.9715340. Online publication date: 12-Dec-2021
  • (2020) Exploiting Dynamism in HPC Applications to Optimize Energy-Efficiency. Workshop Proceedings of the 49th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3409390.3409399. Online publication date: 17-Aug-2020
  • (2020) CodeSeer. Proceedings of the 34th ACM International Conference on Supercomputing, pp. 1-11. DOI: 10.1145/3392717.3392741. Online publication date: 29-Jun-2020
  • (2020) Fine-grained Powercap Allocation for Power-constrained Systems based on Multi-objective Machine Learning. IEEE Transactions on Parallel and Distributed Systems, pp. 1-1. DOI: 10.1109/TPDS.2020.3045983. Online publication date: 2020