Abstract
It is worthwhile to spend a small amount of additional energy when it yields a much larger performance improvement, and it likewise makes sense to relax the performance requirement slightly when doing so saves an enormous amount of energy. Trading a small amount of energy for a considerable gain in performance, or vice versa, is possible only if the relationship between the performance and the energy consumption of parallel programs is known precisely. This work studies that relationship by recording the performance speedup and the energy consumption of parallel programs as the number of cores on which they run is varied. We demonstrate that the performance improvement and the increase in energy consumption have a linear negative correlation. In addition, this relationship can guide performance–energy adaptation under two assumptions. Our experiments show that the average correlation coefficients between performance and energy are higher than 97%. Furthermore, we find that trading a performance loss of less than 6% for an energy saving of more than 37% is feasible, and vice versa.
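To make the measurement and adaptation idea concrete, the following is a minimal sketch, not the authors' implementation, of how a speedup–energy correlation and the trade-off selection could be computed. The core counts, execution times, energy figures, and the helper name `adapt` are hypothetical placeholders introduced only for illustration.

```python
import numpy as np

# Hypothetical measurements of one parallel program at several core counts
# (placeholder numbers, not results from the paper).
cores = np.array([1, 2, 4, 8, 16, 32])
exec_time = np.array([100.0, 52.0, 27.0, 15.0, 9.0, 8.6])      # seconds
energy = np.array([200.0, 210.0, 230.0, 265.0, 330.0, 530.0])  # joules

# Speedup relative to the single-core run.
speedup = exec_time[0] / exec_time

# Pearson correlation coefficient between speedup and energy consumption;
# the sign and magnitude for real benchmarks depend on the measured workload.
corr = np.corrcoef(speedup, energy)[0, 1]
print(f"correlation(speedup, energy) = {corr:.3f}")

def adapt(max_perf_loss=0.06):
    """Pick the core count that minimizes energy while keeping the
    performance loss, relative to the fastest configuration, bounded."""
    perf_loss = 1.0 - speedup / speedup.max()
    feasible = perf_loss <= max_perf_loss
    idx = int(np.argmin(np.where(feasible, energy, np.inf)))
    saving = 1.0 - energy[idx] / energy[int(speedup.argmax())]
    return cores[idx], perf_loss[idx], saving

n, loss, saving = adapt()
print(f"run on {n} cores: {loss:.1%} performance loss, {saving:.1%} energy saved")
```

With these placeholder numbers the sketch would back off from the fastest configuration at a small performance cost in exchange for a large energy saving; the actual trade-off points reported in the paper come from measured PARSEC workloads.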
Notes
For some parallel programs, the measured speedup exceeds the ideal speedup at a few points. This happens because the per-thread working set shrinks as more threads are spawned, so the per-thread cache miss rate drops and performance improves. The measured execution time is also affected by the precision of the Sniper simulator.
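As a hedged illustration of this effect (the working-set and cache sizes below are assumptions, not parameters from the paper), once the per-thread share of the working set fits into a core's private cache, the per-thread miss rate can drop sharply:

```python
# Hypothetical numbers chosen only to illustrate the footnote's reasoning:
# once the per-thread working set fits into a core's private cache, the
# per-thread miss rate falls, which can push measured speedup above the ideal.
total_working_set_mb = 8.0   # assumed total working set of the program
private_cache_mb = 2.0       # assumed per-core private cache capacity

for threads in (1, 2, 4, 8, 16):
    per_thread_mb = total_working_set_mb / threads
    fits = per_thread_mb <= private_cache_mb
    print(f"{threads:2d} threads: {per_thread_mb:.2f} MB per thread, fits in cache: {fits}")
```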
Acknowledgments
This work was supported by the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the China National Natural Science Foundation under Grants Nos. 61272408 and 61133006, and the Doctoral Fund of the Ministry of Education of China under Grant No. 20130142110048.
Cite this article
Zhu, L., Jin, H., Liao, X. et al. Performance–energy adaptation of parallel programs in pervasive computing. J Supercomput 70, 1260–1278 (2014). https://doi.org/10.1007/s11227-014-1226-6