Abstract
In multi-core processors, energy efficiency and performance are both essential concerns. Energy-saving techniques usually cost performance, and vice versa. The energy-delay product (EDP) is therefore widely used as a metric that captures the trade-off between energy saving and performance. This paper presents a technique that performs work-stealing scheduling in the operating system kernel without requiring any modification to user-space programs. The proposed scheduler, PEPS (predictive energy-efficient parallel scheduler), uses predictive models to determine at runtime the optimal number of active cores and the optimal processor clock frequency for the running program, with the goal of minimizing EDP. Because EDP is a long-term metric, PEPS uses instructions per watt (IPW) within each time interval to select the best configuration: using its performance and power prediction models, it chooses the most energy-efficient configuration for the next interval. Since workloads behave differently at runtime, and programs with different degrees of parallelism respond differently to the same configuration, the proposed method uses hardware performance counters to characterize the workload. Compared with the Linux scheduler, the proposed algorithm achieves up to 25% energy saving at the cost of 7% performance loss. Moreover, it reduces temperature by 24% while improving EDP by 19%.
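The two quantities in the abstract can be made concrete: EDP is energy multiplied by execution time, and IPW is predicted throughput (instructions per second) divided by predicted power. The sketch below illustrates the per-interval selection loop described above: for each candidate (active cores, frequency) pair, predict throughput and power, then keep the pair with the highest IPW. It is a minimal illustration under stated assumptions, not the paper's method. The feature names (`ipc`, `parallel_fraction`, `mem_fraction`), the Amdahl-style performance model, the cubic power model, its coefficients, and the configuration grid are all hypothetical placeholders for the predictive models and performance counters that PEPS actually uses.

```python
"""Hypothetical sketch of per-interval, IPW-driven configuration selection.

All models, coefficients, feature names, and the configuration grid are
illustrative assumptions, not values or interfaces from the PEPS paper.
"""
from dataclasses import dataclass
from itertools import product

# Hypothetical configuration space.
CORE_COUNTS = (1, 2, 3, 4)
FREQS_GHZ = (1.2, 1.6, 2.0, 2.4, 2.8)
F_REF_GHZ = 2.0  # assumed frequency at which the counter sample was taken

# Hypothetical power-model coefficients.
P_UNCORE_W = 5.0           # package power independent of active cores
P_STATIC_PER_CORE_W = 0.5  # leakage power per active core
K_DYN_W_PER_GHZ3 = 1.0     # dynamic power ~ C*V^2*f, with V roughly proportional to f


@dataclass(frozen=True)
class Config:
    cores: int       # number of active cores
    freq_ghz: float  # processor clock frequency


def predict_ips(features: dict, cfg: Config) -> float:
    """Toy performance model: predicted instructions per second."""
    p = features["parallel_fraction"]  # Amdahl parallel fraction, 0..1
    m = features["mem_fraction"]       # share of time stalled on memory, 0..1
    speedup = 1.0 / ((1.0 - p) + p / cfg.cores)
    # The compute-bound share scales with frequency; the memory-bound share does not.
    effective_ghz = (1.0 - m) * cfg.freq_ghz + m * F_REF_GHZ
    return features["ipc"] * effective_ghz * 1e9 * speedup


def predict_power(cfg: Config) -> float:
    """Toy power model in watts: uncore + per-core (static + dynamic) power."""
    per_core = P_STATIC_PER_CORE_W + K_DYN_W_PER_GHZ3 * cfg.freq_ghz ** 3
    return P_UNCORE_W + cfg.cores * per_core


def select_config(features: dict) -> Config:
    """Return the configuration with the highest predicted instructions per watt."""
    candidates = (Config(n, f) for n, f in product(CORE_COUNTS, FREQS_GHZ))
    return max(candidates, key=lambda c: predict_ips(features, c) / predict_power(c))


def edp(energy: float, delay: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy * delay


if __name__ == "__main__":
    # Hypothetical counter-derived features for the last interval.
    sample = {"ipc": 1.5, "parallel_fraction": 0.9, "mem_fraction": 0.3}
    print("next-interval config:", select_config(sample))
    # Relative EDP if energy fell by 25% while runtime grew by 7% (about 0.80).
    print("relative EDP:", edp(0.75, 1.07) / edp(1.0, 1.0))
```

In this toy model a purely serial workload gets one core, since extra cores add power without adding throughput, while a fully parallel one gets all four. Note that maximizing IPW in each interval minimizes energy per instruction; EDP additionally penalizes delay, which is why the abstract treats IPW as a per-interval stand-in for the long-term EDP goal. The last line only illustrates the EDP formula: if a 25% energy saving and a 7% slowdown occurred in the same run, EDP would fall to roughly 0.80 of the baseline. The abstract reports these as separate figures, so this is not a reconstruction of its 19% result.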
About this article
Cite this article
Maghsoud, Z., Noori, H. & Pour Mozaffari, S. PEPS: predictive energy-efficient parallel scheduler for multi-core processors. J Supercomput 77, 6566–6585 (2021). https://doi.org/10.1007/s11227-020-03562-x
DOI: https://doi.org/10.1007/s11227-020-03562-x