Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2503210.2503284acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

A distributed dynamic load balancer for iterative applications

Published: 17 November 2013 Publication History

Abstract

For many applications, computation load varies over time. Such applications require dynamic load balancing to improve performance. Centralized load balancing schemes, which perform the load balancing decisions at a central location, are not scalable. In contrast, fully distributed strategies are scalable but typically do not produce a balanced work distribution as they tend to consider only local information.
This paper describes a fully distributed algorithm for load balancing that uses partial information about the global state of the system to perform load balancing. This algorithm, referred to as GrapevineLB, consists of two stages: global information propagation using a lightweight algorithm inspired by epidemic [21] algorithms, and work unit transfer using a randomized algorithm. We provide analysis of the algorithm along with detailed simulation and performance comparison with other load balancing strategies. We demonstrate the effectiveness of GrapevineLB for adaptive mesh refinement and molecular dynamics on up to 131,072 cores of BlueGene/Q.

References

[1]
I. Ahmad and A. Ghafoor. A semi distributed task allocation strategy for large hypercube supercomputers. In Conference on Supercomputing, 1990.
[2]
K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal multicast. ACM Transactions on Computer Systems (TOCS), 1999.
[3]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In PPoPP, 1995.
[4]
J. E. Boillat. Load balancing and poisson equation in a graph. Concurrency: Practice and Experience, 2(4):289--313, 1990.
[5]
U. Catalyurek, E. Boman, K. Devine, D. Bozdag, R. Heaphy, and L. Riesen. Hypergraph-based dynamic load balancing for adaptive scientific computations. In Proc. of 21st International Parallel and Distributed Processing Symposium (IPDPS'07), pages 1--11. IEEE, 2007. Best Algorithms Paper Award.
[6]
C. Chevalier, F. Pellegrini, I. Futurs, and U. B. I. Improvement of the efficiency of genetic algorithms for scalable parallel graph partitioning in a multi-level framework. In In Proceedings of Euro-Par 2006, LNCS, pages 243--252, 2006.
[7]
Y.-C. Chow and W. H. Kohler. Models for dynamic load balancing in homogeneous multiple processor systems. In IEEE Transactions on Computers, 1982.
[8]
A. Corradi, L. Leonardi, and F. Zambonelli. Diffusive load balancing policies for dynamic applications. In IEEE Concurrency, pages 7(1):22--31, 1999.
[9]
G. Cybenko. Dynamic load balancing for distributed memory multiprocessors. Journal of parallel and distributed computing, 7(2):279--301, 1989.
[10]
A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In ACM Symposium on Principles of distributed computing, 1987.
[11]
J. Dinan, D. B. Larkins, P. Sadayappan, S. Krishnamoorthy, and J. Nieplocha. Scalable work stealing. In Conference on High Performance Computing Networking, Storage and Analysis, 2009.
[12]
George Karypis and Vipin Kumar. A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm. In Proc. of the 8th SIAM conference on Parallel Processing for Scientific Computing, 1997.
[13]
George Karypis and Vipin Kumar. Multilevel k-way Partitioning Scheme for Irregular Graphs. Journal of Parallel and Distributed Computing, 48:96--129, 1998.
[14]
A. Ha'c and X. Jin. Dynamic load balancing in distributed system using a decentralized algorithm. In Intl. Conf. on Distributed Computing Systems, 1987.
[15]
B. Hendrickson and K. Devine. Dynamic load balancing in computational mechanics. Computer Methods in Applied Mechanics and Engineering, 184(2):485--500, 2000.
[16]
B. Hendrickson and R. Leland. The Chaco user's guide. Technical Report SAND 93--2339, Sandia National Laboratories, Albuquerque, NM, Oct. 1993.
[17]
Y. Hu and R. Blake. An optimal dynamic load balancing algorithm. Technical report, Daresbury Laboratory, 1995.
[18]
P. Jetley, F. Gioachin, C. Mendes, L. V. Kale, and T. R. Quinn. Massively parallel cosmological simulations with ChaNGa. In IPDPS, 2008.
[19]
L. V. Kalé. Comparing the performance of two dynamic load distribution methods. In Proceedings of the 1988 International Conference on Parallel Processing, pages 8--11, St. Charles, IL, August 1988.
[20]
L. V. Kalé. The virtualization model of parallel programming: Runtime optimizations and the state of art. In LACSI 2002, Albuquerque, October 2002.
[21]
W. Kermack and A. McKendrick. Contributions to the mathematical theory of epidemics. ii. the problem of endemicity. Proceedings of the Royal society of London. Series A, 138(834):55--83, 1932.
[22]
A. Langer, J. Lifflander, P. Miller, K.-C. Pan, L. V. Kale, and P. Ricker. Scalable Algorithms for Distributed-Memory Adaptive Mesh Refinement. In SBAC-PAD 2012, New York, USA, October 2012.
[23]
J. Lifflander, S. Krishnamoorthy, and L. V. Kale. Work stealing and persistence-based load balancers for iterative overdecomposed applications. In HPDC, 2012.
[24]
F. C. H. Lin and R. M. Keller. The gradient model load balancing method. Software Engineering, IEEE Transactions on, (1):32--38, 1987.
[25]
Y.-J. Lin and V. Kumar. And-parallel execution of logic programs on a shared-memory multiprocessor. J. Log. Program., 10(1/2/3&4):155--178, 1991.
[26]
F. Mattern. Algorithms for distributed termination detection. Distributed computing, 2(3):161--175, 1987.
[27]
C. Mei and L. V. K. et al. Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime. In Proceedings of the 2011 ACM/IEEE conference on Supercomputing.
[28]
H. Menon, N. Jain, G. Zheng, and L. V. Kalé. Automated load balancing invocation based on application characteristics. In IEEE Cluster, 2012.
[29]
L. M. Ni and K. Hwang. Optimal load balancing in a multiple processor system with many job classes. In IEEE Trans. on Software Eng., volume SE-11, 1985.
[30]
D. Peleg and E. Upfal. The token distribution problem. SIAM Journal on Computing, 18(2):229--243, 1989.
[31]
S. Sharma, R. Ponnusamy, B. Moon, Y. Hwang, R. Das, and J. Saltz. Run-time and compile-time support for adaptive irregular problems. In Proceedings of Supercomputing 1994, Nov. 1994.
[32]
Y. Sun, G. Zheng, P. Jetley, and L. V. Kale. An Adaptive Framework for Large-scale State Space Search. In IPDPS, 2011.
[33]
M. H. Willebeek-LeMair and A. P. Reeves. Strategies for dynamic load balancing on highly parallel computers. In IEEE Transactions on Parallel and Distributed Systems, September 1993.
[34]
C. Xu, F. C. M. Lau, and R. Diekmann. Decentralized remapping of data parallel applications in distributed memory multiprocessors. Concurrency - Practice and Experience, 9(12):1351--1376, 1997.
[35]
G. Zheng, A. Bhatele, E. Meneses, and L. V. Kale. Periodic Hierarchical Load Balancing for Large Supercomputers. IJHPCA, March 2011.

Cited By

View all
  • (2024)Cloud-Native Computing: A Survey From the Perspective of ServicesProceedings of the IEEE10.1109/JPROC.2024.3353855112:1(12-46)Online publication date: Jan-2024
  • (2024)Distributed Asynchronous Contact Mechanics with DARMA/vtAsynchronous Many-Task Systems and Applications10.1007/978-3-031-61763-8_4(34-45)Online publication date: 14-Feb-2024
  • (2023)Proactive Task Offloading for Load Balancing in Iterative ApplicationsParallel Processing and Applied Mathematics10.1007/978-3-031-30442-2_20(263-275)Online publication date: 28-Apr-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair:
  • William Gropp,
  • Program Chair:
  • Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed load balancer
  2. epidemic algorithm
  3. load balancing

Qualifiers

  • Research-article

Funding Sources

Conference

SC13
Sponsor:

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)3
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Cloud-Native Computing: A Survey From the Perspective of ServicesProceedings of the IEEE10.1109/JPROC.2024.3353855112:1(12-46)Online publication date: Jan-2024
  • (2024)Distributed Asynchronous Contact Mechanics with DARMA/vtAsynchronous Many-Task Systems and Applications10.1007/978-3-031-61763-8_4(34-45)Online publication date: 14-Feb-2024
  • (2023)Proactive Task Offloading for Load Balancing in Iterative ApplicationsParallel Processing and Applied Mathematics10.1007/978-3-031-30442-2_20(263-275)Online publication date: 28-Apr-2023
  • (2023)From reactive to proactive load balancing for task‐based parallel applications in distributed memory machinesConcurrency and Computation: Practice and Experience10.1002/cpe.782835:24Online publication date: 26-Jun-2023
  • (2022)Microservice: dynamic load balancing strategy based on consistent hashingInternational Conference on Electronic Information Engineering and Computer Communication (EIECC 2021)10.1117/12.2634851(105)Online publication date: 4-May-2022
  • (2022)Asynchronous Workload Balancing through Persistent Work-Stealing and Offloading for a Distributed Actor Model Library2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM)10.1109/PAW-ATM56565.2022.00009(39-51)Online publication date: Nov-2022
  • (2021)Dynamic Load Sharing in Memory Constrained Devices: A Survey2021 IEEE 7th World Forum on Internet of Things (WF-IoT)10.1109/WF-IoT51360.2021.9594948(586-591)Online publication date: 14-Jun-2021
  • (2021)Distributed Work Stealing at Scale via Matchmaking2021 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/Cluster48925.2021.00040(250-260)Online publication date: Sep-2021
  • (2021)Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance2021 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/Cluster48925.2021.00039(238-249)Online publication date: Sep-2021
  • (2020)Looking at Data Science through the Lens of Scheduling and Load BalancingScheduling Problems - New Applications and Trends10.5772/intechopen.92578Online publication date: 8-Jul-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media