Article, DOI: 10.1007/978-3-031-30442-2_20

Proactive Task Offloading for Load Balancing in Iterative Applications

Published: 28 April 2023

Abstract

Load imbalance is often a challenge for applications in parallel systems. Static cost models and pre-partitioning algorithms distribute the load at the beginning. Nevertheless, dynamic changes during execution or inaccurate cost indicators may lead to imbalance at runtime. Reactive work-stealing strategies can help monitor the execution and perform task migration to balance the load. However, the benefits depend on the migration overhead and on assumptions about future execution.
Our proactive approach further improves existing solutions by applying machine learning to online load prediction. Following that, we propose a fully distributed algorithm that uses the prediction results to guide task offloading. The experiments are performed with an artificial test case and a realistic application named Sam(oa)² on three systems with different communication overhead. Our results confirm improvements for important use cases compared to previous solutions. Furthermore, this approach can support co-scheduling tasks across multiple applications.
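
To make the idea of prediction-guided offloading concrete, here is a minimal sketch. It is not the algorithm from the paper: the abstract describes a machine-learning predictor and a fully distributed algorithm, whereas this illustration assumes a simple moving-average predictor and a centralized greedy plan. The names `predict_next_load` and `plan_offloading`, the per-task cost list, and the assumption that a task costs the same on the target rank as on the source rank are all hypothetical.

```python
from statistics import mean


def predict_next_load(history, window=3):
    """Stand-in predictor (assumption): mean load of the last `window` iterations."""
    return mean(history[-window:])


def plan_offloading(predicted, task_cost):
    """Greedily move whole tasks from overloaded to underloaded ranks.

    predicted: predicted load per rank for the next iteration
    task_cost: assumed cost of one task on each rank (same unit as the load)
    Returns (plan, expected_load), where plan holds (source, target, num_tasks).
    """
    avg = mean(predicted)
    load = list(predicted)
    ranks = range(len(load))
    # Most overloaded ranks donate first; least loaded ranks receive first.
    donors = sorted((r for r in ranks if load[r] > avg), key=lambda r: -load[r])
    receivers = sorted((r for r in ranks if load[r] < avg), key=lambda r: load[r])

    plan = []
    for src in donors:
        for dst in receivers:
            surplus = load[src] - avg
            deficit = avg - load[dst]
            # Only whole tasks can migrate, so round down.
            n = int(min(surplus, deficit) // task_cost[src])
            if n > 0:
                moved = n * task_cost[src]
                load[src] -= moved
                load[dst] += moved
                plan.append((src, dst, n))
    return plan, load


# Hypothetical load histories (one list per rank, one value per iteration).
histories = [[12, 10, 10, 10], [2, 2, 2], [6, 6, 6]]
predicted = [predict_next_load(h) for h in histories]
plan, expected = plan_offloading(predicted, task_cost=[1, 1, 1])
print(plan, expected)
```

The point of the sketch is the ordering of decisions: the plan is computed from predicted loads before the iteration runs, rather than by stealing work after an imbalance has already been observed. A real implementation would also have to weigh the migration overhead mentioned above against the predicted gain.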

Cited By

  • (2024) An Optimal Novel Approach for Dynamic Energy-Efficient Task Offloading in Mobile Edge-Cloud Computing Networks. SN Computer Science 5:5. DOI: 10.1007/s42979-024-02992-1. Online publication date: 15-Jun-2024

Published In

Parallel Processing and Applied Mathematics: 14th International Conference, PPAM 2022, Gdansk, Poland, September 11–14, 2022, Revised Selected Papers, Part I
Sep 2022
486 pages
ISBN: 978-3-031-30441-5
DOI: 10.1007/978-3-031-30442-2
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. HPC
  2. Task-based Parallel Models
  3. MPI+OpenMP
  4. Machine Learning
  5. Online Prediction
  6. Dynamic Load Balancing

Qualifiers

  • Article
