Open access

Performance-Energy Trade-off in Modern CMPs

Authors: M. Balakrishnan, Anshul Kumar

ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 1

Article No.: 3, Pages 1 - 26

https://doi.org/10.1145/3427092

Published: 30 December 2020


Abstract

Chip multiprocessors (CMPs) are ubiquitous in computing systems ranging from high-end servers to mobile devices. In these systems, energy consumption is a critical design constraint: it constitutes the most significant operating cost for computing clouds, and longer battery life remains an essential user concern in mobile devices. To reduce power consumption, modern processors support Dynamic Voltage and Frequency Scaling (DVFS) at the individual-core level as well as the uncore level, which allows fine-grained control of performance and energy. For an n-core processor with m frequency choices for each core and for the uncore, the DVFS configuration space contains mⁿ⁺¹ settings (the uncore accounts for the +1). Moreover, in CMPs the performance-energy trade-off of core/uncore frequency scaling cannot be determined for a single application in isolation, because cores share critical resources such as the last-level cache (LLC) and memory. Thus, unlike in a uniprocessor environment, the energy consumption of an application running on a CMP depends not only on its own characteristics but also on those of its co-runners (applications running on other cores). The key objective of our work is to select core and uncore frequencies that minimize power consumption while keeping application performance degradation within pre-defined limits (which can be termed QoS requirements). The key contribution of our work is a learning-based model that captures the interference between applications running on multiple cores due to the shared cache, bus bandwidth, and memory bandwidth, and predicts near-optimal frequencies for the core and uncore.
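To make the size of the configuration space and the selection objective concrete, the following is a minimal sketch and not the learning-based model the abstract describes. It counts the settings for n cores plus one uncore, then exhaustively searches a tiny space for the lowest-power setting that meets a slowdown bound. The frequency levels in `F_CHOICES`, the cubic power model `toy_power`, and the slowdown model `toy_slowdown` are all hypothetical stand-ins chosen for illustration; the abstract specifies none of them.

```python
from itertools import product

# Hypothetical frequency levels in GHz, shared by every core and the uncore.
F_CHOICES = (1.0, 1.5, 2.0)


def config_space_size(n_cores, m_freqs):
    """Count DVFS settings: m choices for each of n cores plus the uncore."""
    return m_freqs ** (n_cores + 1)


def toy_power(cfg):
    """Assumed power model: each domain costs the cube of its frequency."""
    return sum(f ** 3 for f in cfg)


def toy_slowdown(cfg):
    """Assumed slowdown model: ratio of the top level to the slowest domain."""
    return max(F_CHOICES) / min(cfg)


def pick_config(n_cores, max_slowdown, power=toy_power, slowdown=toy_slowdown):
    """Return the lowest-power setting within the QoS bound, or None.

    Each setting is a tuple of n core frequencies followed by the uncore
    frequency. The search is exhaustive, so it only scales to small spaces.
    """
    feasible = [
        cfg
        for cfg in product(F_CHOICES, repeat=n_cores + 1)
        if slowdown(cfg) <= max_slowdown
    ]
    return min(feasible, key=power) if feasible else None


if __name__ == "__main__":
    print(config_space_size(4, 10))  # 100000
    print(pick_config(2, 1.4))       # (1.5, 1.5, 1.5)
```

Even 4 cores with 10 frequency levels already yield 100,000 settings, which is why exhaustive search like this does not scale and a predictive model becomes attractive. The toy models also treat each domain independently, whereas the abstract stresses that co-runner interference makes the real trade-off interdependent.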


Index Terms

Computer systems organization


Published In

ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 1 (March 2021), 402 pages

ISSN: 1544-3566

EISSN: 1544-3973

DOI: 10.1145/3446348

Editor: David Kaeli, Northeastern University, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2020

Accepted: 01 September 2020

Revised: 01 August 2020

Received: 01 December 2019

Published in TACO Volume 18, Issue 1


Qualifiers

Research-article
Research
Refereed
