research-article
Open access

Performance-Energy Trade-off in Modern CMPs

Published: 30 December 2020

Abstract

Chip multiprocessors (CMPs) are ubiquitous in computing systems ranging from high-end servers to mobile devices. In these systems, energy consumption is a critical design constraint, as it constitutes the most significant operating cost for computing clouds. Similarly, longer battery life remains an essential user concern in mobile devices. To optimize power consumption, modern processors are designed with Dynamic Voltage and Frequency Scaling (DVFS) support at the individual core as well as the uncore level, which allows fine-grained control of performance and energy. For an n-core processor with m core and uncore frequency choices, the total DVFS configuration space is m^(n+1) (with the uncore accounting for the +1). Moreover, in CMPs the performance-energy trade-off of core/uncore frequency scaling for a single application cannot be determined independently, because cores share critical resources such as the last-level cache (LLC) and memory. Thus, unlike in a uniprocessor environment, the energy consumption of an application running on a CMP depends not only on its own characteristics but also on those of its co-runners (applications running on other cores). The key objective of our work is to select core and uncore frequencies that minimize power consumption while keeping application performance degradation within pre-defined limits (which can be termed QoS requirements). The key contribution of our work is a learning-based model that captures the interference among applications running on multiple cores due to the shared cache, bus bandwidth, and memory bandwidth, and predicts near-optimal core and uncore frequencies.
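To make the size of the configuration space and the stated objective concrete, the following is a minimal sketch, not the authors' method. It counts the joint settings for n cores plus one uncore domain and then brute-forces the lowest-power setting that keeps every application within a slowdown limit. The functions toy_power and toy_slowdown, the cubic power assumption, and the mem_intensity parameter are all hypothetical stand-ins for illustration; in particular, the toy model ignores co-runner interference, which is exactly what the learning-based model in the abstract is meant to capture.

```python
from itertools import product


def config_space_size(n_cores: int, m_freqs: int) -> int:
    """Joint DVFS settings: m choices for each of n cores plus one uncore."""
    return m_freqs ** (n_cores + 1)


def toy_power(core_freqs, uncore_freq):
    # ASSUMPTION (illustrative only): power grows cubically with frequency.
    return sum(f ** 3 for f in core_freqs) + uncore_freq ** 3


def toy_slowdown(core_freq, uncore_freq, mem_intensity, f_max):
    # ASSUMPTION (illustrative only): the compute share of run time scales
    # with core frequency and the memory share with uncore frequency.
    # A value of 1.0 means no slowdown relative to running everything at f_max.
    compute = (1.0 - mem_intensity) * f_max / core_freq
    memory = mem_intensity * f_max / uncore_freq
    return compute + memory


def best_config(freqs, mem_intensities, max_slowdown):
    """Exhaustively search all m^(n+1) settings for the lowest toy power
    such that every application stays within max_slowdown."""
    f_max = max(freqs)
    n = len(mem_intensities)
    best = None
    for cfg in product(freqs, repeat=n + 1):
        cores, uncore = cfg[:n], cfg[n]
        within_qos = all(
            toy_slowdown(c, uncore, mi, f_max) <= max_slowdown
            for c, mi in zip(cores, mem_intensities)
        )
        if within_qos:
            power = toy_power(cores, uncore)
            if best is None or power < best[0]:
                best = (power, cores, uncore)
    return best  # None if no setting meets the limit


if __name__ == "__main__":
    # 4 cores with 10 frequency steps each already give 10^5 joint settings.
    print(config_space_size(4, 10))
    # One compute-bound and one memory-bound application, no slowdown allowed.
    print(best_config([1.0, 2.0], [0.0, 1.0], 1.0))
```

Exhaustive search of this kind is only practical for very small n and m. Because the space grows exponentially in the number of cores, a predictive model that proposes near-optimal frequencies directly avoids enumerating it.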



Published In

ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 1
March 2021
402 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3446348
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2020
Accepted: 01 September 2020
Revised: 01 August 2020
Received: 01 December 2019
Published in TACO Volume 18, Issue 1


Author Tags

  1. Resource contention
  2. machine learning
  3. performance-energy trade-off

Qualifiers

  • Research-article
  • Research
  • Refereed
