Energy-efficient Application Resource Scheduling using Machine Learning Classifiers

Published: 13 August 2018

Abstract

Resource scheduling in high performance computing (HPC) usually aims to minimize application runtime rather than optimize for energy efficiency. Most existing research on reducing power and energy consumption imposes the constraint that little or no performance loss is allowed, which improves but still does not maximize energy efficiency. By optimizing for energy efficiency instead of application turnaround time, we can reduce the cost of running scientific applications. We propose using machine learning classification, driven by low-level hardware performance counters, to predict the most energy-efficient resource settings to use during application runtime; unlike static resource scheduling, this approach dynamically adapts to changing application behavior. We evaluate our approach on a large shared-memory system using four complex bioinformatics HPC applications, decreasing energy consumption over the naive race scheduler by 20% on average, and by as much as 38%. An average increase in runtime of 31% is dominated by a 39% reduction in power consumption, from which we extrapolate the potential for a 24% increase in throughput on future over-provisioned, power-constrained clusters. This work demonstrates that low-overhead classification is suitable for dynamically optimizing energy efficiency during application runtime.
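To make the classification step concrete, below is a minimal, hypothetical sketch of the kind of pipeline the abstract describes: a tree-ensemble classifier (scikit-learn's ExtraTreesClassifier is assumed here) is trained offline on hardware-counter features and then queried periodically at runtime to select a resource configuration. The feature set, label encoding, and model choice are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch (not the authors' implementation): train a classifier
# offline on hardware performance-counter samples, then query it at runtime
# to select an energy-efficient resource configuration.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Offline training data: each row is one sampling interval, described by
# counter-derived features such as instructions per cycle, memory bandwidth
# (GB/s), and last-level-cache miss rate. Values below are made up.
X_train = np.array([
    [1.8, 12.0, 0.02],   # compute-bound interval
    [0.4, 85.0, 0.31],   # memory-bound interval
    [1.1, 40.0, 0.12],   # mixed interval
])
# Labels index the most energy-efficient configuration found offline for that
# behavior (hypothetical encoding: 0 = all cores / high frequency,
# 1 = fewer cores / low frequency, 2 = an intermediate setting).
y_train = np.array([0, 1, 2])

clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def choose_setting(counter_sample):
    """Online step: map the latest counter sample to a configuration label."""
    return int(clf.predict(np.asarray(counter_sample).reshape(1, -1))[0])

# Example query: classify a fresh counter sample taken during execution.
print(choose_setting([0.5, 80.0, 0.28]))
```

At runtime, a monitoring loop would collect counters over each interval (e.g., with a tool such as Intel PCM), call choose_setting, and apply the predicted configuration before the next interval; the prediction itself is cheap relative to the sampling period, which is what makes low-overhead online classification feasible. One way to read the throughput extrapolation from the reported numbers: under a fixed cluster power budget, a 39% reduction in per-job power admits roughly 1/(1 − 0.39) ≈ 1.64× as many concurrent jobs, and dividing by the 1.31× longer per-job runtime leaves about 1.25× aggregate throughput, consistent with the stated 24%.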

Information

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN:9781450365109
DOI:10.1145/3225058
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 August 2018

Author Tags

  1. Energy Efficiency
  2. Machine Learning
  3. Power Management
  4. Runtime Control

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate: 91 of 313 submissions, 29%
Overall Acceptance Rate: 91 of 313 submissions, 29%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 288
  • Downloads (Last 6 weeks): 32
Reflects downloads up to 30 Nov 2024

Cited By
  • (2024) Planter: Rapid Prototyping of In-Network Machine Learning Inference. ACM SIGCOMM Computer Communication Review 54(1), 2-21. DOI: 10.1145/3687230.3687232. Online publication date: 6-Aug-2024.
  • (2023) Energy-Aware Scheduling for High-Performance Computing Systems: A Survey. Energies 16(2), 890. DOI: 10.3390/en16020890. Online publication date: 12-Jan-2023.
  • (2023) Fusion Orchestration Guidelines (FOG) for Collaborative Computing and Network Data Fusion. NAECON 2023 - IEEE National Aerospace and Electronics Conference, 286-293. DOI: 10.1109/NAECON58068.2023.10365788. Online publication date: 28-Aug-2023.
  • (2022) GOAL: Supporting General and Dynamic Adaptation in Computing Systems. Proceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, 16-32. DOI: 10.1145/3563835.3567655. Online publication date: 29-Nov-2022.
  • (2022) Penelope: Peer-to-peer Power Management. Proceedings of the 51st International Conference on Parallel Processing, 1-11. DOI: 10.1145/3545008.3545047. Online publication date: 29-Aug-2022.
  • (2022) A Survey of Machine Learning for Computer Architecture and Systems. ACM Computing Surveys 55(3), 1-39. DOI: 10.1145/3494523. Online publication date: 3-Feb-2022.
  • (2022) Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters. IEEE Transactions on Parallel and Distributed Systems 33(12), 4083-4099. DOI: 10.1109/TPDS.2022.3181096. Online publication date: 1-Dec-2022.
  • (2022) Online Power Management for Multi-Cores: A Reinforcement Learning Based Approach. IEEE Transactions on Parallel and Distributed Systems 33(4), 751-764. DOI: 10.1109/TPDS.2021.3092270. Online publication date: 1-Apr-2022.
  • (2022) Amphis: Managing Reconfigurable Processor Architectures With Generative Adversarial Learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41(11), 3993-4003. DOI: 10.1109/TCAD.2022.3197980. Online publication date: Nov-2022.
  • (2022) Comparative Study on Energy-Efficiency for Wireless Body Area Network using Machine Learning Approach. 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC), 372-377. DOI: 10.1109/PDGC56933.2022.10053368. Online publication date: 25-Nov-2022.
