Abstract
It is worthwhile to spend a small amount of additional energy when it yields a much larger performance improvement, and it likewise makes sense to relax the performance requirement slightly when doing so saves an enormous amount of energy. Trading a small amount of energy for a considerable gain in performance, or vice versa, is possible only if the relationship between the performance and the energy consumption of parallel programs is known precisely. This work studies that relationship by recording the performance speedup and the energy consumption of parallel programs as the number of cores on which they run is varied. We demonstrate that the performance improvement and the increase in energy consumption have a linear negative correlation. In addition, this relationship can guide performance–energy adaptation under two assumptions. Our experiments show that the average correlation coefficients between performance and energy are higher than 97%. Furthermore, we find that trading a performance loss of less than 6% for an energy saving of more than 37% is feasible, and vice versa.
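To make the measurement and adaptation idea concrete, the following is a minimal sketch, not the authors' implementation, of how a speedup–energy correlation and the trade-off selection could be computed. The core counts, execution times, energy figures, and the helper name `adapt` are hypothetical placeholders introduced only for illustration.

```python
import numpy as np

# Hypothetical measurements of one parallel program at several core counts
# (placeholder numbers, not results from the paper).
cores = np.array([1, 2, 4, 8, 16, 32])
exec_time = np.array([100.0, 52.0, 27.0, 15.0, 9.0, 8.6])      # seconds
energy = np.array([200.0, 210.0, 230.0, 265.0, 330.0, 530.0])  # joules

# Speedup relative to the single-core run.
speedup = exec_time[0] / exec_time

# Pearson correlation coefficient between speedup and energy consumption;
# the sign and magnitude for real benchmarks depend on the measured workload.
corr = np.corrcoef(speedup, energy)[0, 1]
print(f"correlation(speedup, energy) = {corr:.3f}")

def adapt(max_perf_loss=0.06):
    """Pick the core count that minimizes energy while keeping the
    performance loss, relative to the fastest configuration, bounded."""
    perf_loss = 1.0 - speedup / speedup.max()
    feasible = perf_loss <= max_perf_loss
    idx = int(np.argmin(np.where(feasible, energy, np.inf)))
    saving = 1.0 - energy[idx] / energy[int(speedup.argmax())]
    return cores[idx], perf_loss[idx], saving

n, loss, saving = adapt()
print(f"run on {n} cores: {loss:.1%} performance loss, {saving:.1%} energy saved")
```

With these placeholder numbers the sketch would back off from the fastest configuration at a small performance cost in exchange for a large energy saving; the actual trade-off points reported in the paper come from measured PARSEC workloads.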
Notes
For some parallel programs, the measured speedup exceeds the ideal speedup at a few points. This happens because the per-thread working set shrinks as more threads are spawned, so the per-thread cache miss rate drops and performance improves. The measured execution time is also affected by the precision of the Sniper simulator.
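As a hedged illustration of this effect (the working-set and cache sizes below are assumptions, not parameters from the paper), once the per-thread share of the working set fits into a core's private cache, the per-thread miss rate can drop sharply:

```python
# Hypothetical numbers chosen only to illustrate the footnote's reasoning:
# once the per-thread working set fits into a core's private cache, the
# per-thread miss rate falls, which can push measured speedup above the ideal.
total_working_set_mb = 8.0   # assumed total working set of the program
private_cache_mb = 2.0       # assumed per-core private cache capacity

for threads in (1, 2, 4, 8, 16):
    per_thread_mb = total_working_set_mb / threads
    fits = per_thread_mb <= private_cache_mb
    print(f"{threads:2d} threads: {per_thread_mb:.2f} MB per thread, fits in cache: {fits}")
```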
Acknowledgments
This work was supported by the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the China National Natural Science Foundation under Grants Nos. 61272408 and 61133006, and the Doctoral Fund of the Ministry of Education of China under Grant No. 20130142110048.
Cite this article
Zhu, L., Jin, H., Liao, X. et al. Performance–energy adaptation of parallel programs in pervasive computing. J Supercomput 70, 1260–1278 (2014). https://doi.org/10.1007/s11227-014-1226-6