Abstract
With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host, shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small-data operations, where synchronization costs are relatively large, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory- and cache-sharing effects require deliberate process layout and careful radix selection for tree-based methods.
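To make the two algorithm families named in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. It simulates ranks as list entries rather than using MPI or real shared memory. The function names `fan_in_reduce` and `reduce_scatter_allreduce`, the sum operator, and the even chunking scheme are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def fan_in_reduce(values: List[float], radix: int = 2) -> Tuple[float, int]:
    """Sum one value per simulated rank up a radix-k tree.

    Returns (result held by rank 0, number of tree levels traversed).
    Hypothetical helper: a higher radix gives fewer levels (fewer
    synchronization steps) but more children per parent at each level.
    """
    partial = list(values)
    n = len(partial)
    stride = 1
    levels = 0
    while stride < n:
        # Each parent (a multiple of stride * radix) absorbs up to
        # radix - 1 children spaced `stride` apart.
        for parent in range(0, n, stride * radix):
            for c in range(1, radix):
                child = parent + c * stride
                if child < n:
                    partial[parent] += partial[child]
        stride *= radix
        levels += 1
    return partial[0], levels


def reduce_scatter_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Allreduce (sum) built from a reduce-scatter followed by an allgather.

    buffers[r] is the input vector of simulated rank r. Assumed scheme:
    rank r reduces only the slice it owns, reading every rank's buffer
    directly, as it could in shared memory.
    """
    p = len(buffers)
    n = len(buffers[0])
    # Rank r owns elements [bounds[r], bounds[r + 1]).
    bounds = [r * n // p for r in range(p + 1)]

    # Reduce-scatter phase: the reduction work is split across ranks.
    owned = []
    for r in range(p):
        lo, hi = bounds[r], bounds[r + 1]
        owned.append([sum(buf[i] for buf in buffers) for i in range(lo, hi)])

    # Allgather phase: every rank collects all reduced slices.
    full = [x for chunk in owned for x in chunk]
    return [list(full) for _ in range(p)]
```

In this sketch, the tree variant needs only a logarithmic number of synchronization levels, which is what matters for small data. The reduce-scatter variant divides the arithmetic evenly across ranks, which is what matters for large messages.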
Research sponsored by the Mathematical, Information, and Computational Sciences Division, Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graham, R.L., Shipman, G. (2008). MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives. In: Lastovetsky, A., Kechadi, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2008. Lecture Notes in Computer Science, vol 5205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87475-1_21
DOI: https://doi.org/10.1007/978-3-540-87475-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87474-4
Online ISBN: 978-3-540-87475-1
eBook Packages: Computer Science (R0)