An optimal broadcast algorithm adapted to SMP clusters
JL Träff, A Ripke - European Parallel Virtual Machine/Message Passing …, 2005 - Springer
JL Träff, A Ripke
European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, 2005 • Springer
Abstract
We describe and evaluate the adaptation of a new, optimal broadcast algorithm for “flat”, fully connected networks to clusters of SMP nodes. The optimal broadcast algorithm improves over other commonly used broadcast algorithms (pipelined binary trees, recursive halving) by up to a factor of two for the non-hierarchical (non-SMP) case. The algorithm is well suited for clusters of SMP nodes, since intra-node broadcast of relatively small blocks can take place concurrently with inter-node communication over the network. This new algorithm has been incorporated into a state-of-the-art MPI library. On a 32-node dual-processor AMD cluster with Myrinet interconnect, improvements of a factor of 1.5 over, for instance, a pipelined binary tree algorithm have been achieved, both for the case with one and with two MPI processes per node.
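The "factor of two" in the abstract can be illustrated with a rough round-counting sketch. It is not taken from the paper. It assumes a one-ported, fully connected network in which a message of n equal blocks is broadcast to p processes, and it uses the known lower bound of n - 1 + ceil(log2 p) communication rounds for an optimal broadcast. The pipelined binary tree count is a simplified estimate (each internal node needs two rounds per block, one per child). The function names are hypothetical.

```python
from math import ceil, log2


def optimal_rounds(p: int, n: int) -> int:
    """Round lower bound for broadcasting n blocks to p processes
    in a one-ported, fully connected network (assumed model)."""
    return n - 1 + ceil(log2(p))


def pipelined_binary_tree_rounds(p: int, n: int) -> int:
    """Simplified estimate for a pipelined binary tree: every internal
    node forwards each block to two children, one send per round, so
    both the pipeline and the tree depth cost about two rounds each."""
    depth = ceil(log2(p + 1)) - 1  # depth of a complete binary tree with p nodes
    return 2 * (n - 1) + 2 * depth


if __name__ == "__main__":
    p = 32  # number of processes, matching the 32-node cluster size
    for n in (1, 10, 100, 1000):
        opt = optimal_rounds(p, n)
        tree = pipelined_binary_tree_rounds(p, n)
        print(f"n={n:5d}  optimal={opt:5d}  tree={tree:5d}  ratio={tree / opt:.2f}")
```

Under these assumptions the ratio of round counts approaches two as the number of blocks grows. The measured improvement of 1.5 reported in the abstract is lower than this idealized figure, which is unsurprising since the sketch ignores latency, block-size tuning, and intra-node effects.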