All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

J Proficz - ACM Transactions on Architecture and Code …, 2021 - dl.acm.org
ACM Transactions on Architecture and Code Optimization (TACO), 2021 - dl.acm.org
Two novel all-gather algorithms resilient to imbalanced process arrival patterns (PAPs) are presented. The first, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and uses an auxiliary background thread for early data exchange from faster processes, which accelerates the all-gather operation. The second, Background Sorted Linear synchronized tree with Broadcast (BSLB), builds on the existing PAP-aware gather algorithm, Background Sorted Linear Synchronized tree (BSLS), and follows it with a regular broadcast that distributes the gathered data to all participating processes. The background of imbalanced PAPs is described, along with PAP monitoring and evaluation. An experimental evaluation of the algorithms, based on a proposed mini-benchmark, is presented. The mini-benchmark was run over 2,000 times on a typical HPC cluster with homogeneous compute nodes. The results are analyzed with respect to different PAPs, data sizes, and process counts, showing that the proposed optimization works well across various configurations, is scalable, and can significantly reduce all-gather elapsed times, in our case by up to a factor of 1.9, or 47%, compared with the best state-of-the-art solution.
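For orientation, the regular ring all-gather that the abstract names as the basis of BDR can be sketched as a single-process simulation. This is only an illustration of the baseline ring pattern, assuming all processes arrive at the same time; it does not reproduce BDR's background thread, BSLB, or any code from the paper, and the function name `ring_allgather` is hypothetical.

```python
def ring_allgather(blocks):
    """Simulate the regular ring all-gather for len(blocks) ranks.

    blocks[r] is the data contributed by rank r (hypothetical input layout).
    Returns a list where entry r is the full buffer held by rank r at the end.
    """
    p = len(blocks)

    # Each simulated rank starts out knowing only its own block.
    gathered = [[None] * p for _ in range(p)]
    for r in range(p):
        gathered[r][r] = blocks[r]

    # p - 1 steps: in step s, rank r forwards block (r - s) mod p,
    # i.e. the block it received in the previous step, to rank r + 1.
    for step in range(p - 1):
        # Collect all sends first so every rank sends "simultaneously".
        sends = []
        for r in range(p):
            idx = (r - step) % p
            sends.append(((r + 1) % p, idx, gathered[r][idx]))
        for dest, idx, data in sends:
            gathered[dest][idx] = data

    return gathered


# Usage: after p - 1 steps every rank holds every block, in rank order.
print(ring_allgather(["a", "b", "c"])[0])  # prints ['a', 'b', 'c']
```

In this baseline every rank takes part in each of the p - 1 steps, so a late-arriving rank delays the data its neighbours are waiting for; that sensitivity to arrival time is the problem the abstract's PAP-resilient variants address.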