DOI: 10.1145/3409501.3409510
research-article

Optimized Reduce Communication Performance with the Tree Topology

Published: 25 August 2020

Abstract

Communication plays an important role in MPI applications, and reduce operations are a heavily used part of MPI. In this paper, we propose a k-nomial tree topology and a hierarchy tree topology to optimize the Reduce operation in MPI. The k-nomial tree effectively decreases the number of communication steps and is suitable for large numbers of processes. Compared with the binomial tree algorithm, the Reduce operation performed with the k-nomial tree can improve communication performance by 46% for small and medium-sized messages. Hierarchy trees dynamically group processes at run time so that as much communication as possible takes place within nodes, where bandwidth is high. The test results show that, compared with the binomial tree algorithm, the performance of the hierarchy tree algorithm is stable, and for the Reduce operation it achieves a 30% performance improvement.
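The two ideas in the abstract can be illustrated with a small simulation. The following is a minimal sketch in plain Python (no MPI), not the authors' implementation: it assumes a k-nomial tree rooted at rank 0 with the usual mask-based parent rule, a sum as the reduction operator, and a simplified two-level model of the hierarchy tree. All function names (`knomial_parent`, `knomial_children`, `knomial_rounds`, `simulate_reduce`, `hierarchical_reduce`) and the `node_of` mapping are hypothetical and introduced only for illustration.

```python
def knomial_parent(rank, size, k):
    """Parent of `rank` in a k-nomial tree rooted at rank 0 (None for the root)."""
    mask = 1
    while mask < size:
        if rank % (k * mask) != 0:
            # First level at which rank is not a multiple of k*mask: send here.
            return rank - rank % (k * mask)
        mask *= k
    return None


def knomial_children(rank, size, k):
    """Children of `rank`, listed from the lowest tree level upwards."""
    children = []
    mask = 1
    while mask < size:
        if rank % (k * mask) != 0:
            break  # rank sends to its parent at this level, so it has no more children
        for j in range(1, k):
            child = rank + j * mask
            if child < size:
                children.append(child)
        mask *= k
    return children


def knomial_rounds(size, k):
    """Number of communication steps: smallest r with k**r >= size."""
    rounds, reach = 0, 1
    while reach < size:
        reach *= k
        rounds += 1
    return rounds


def simulate_reduce(values, k):
    """Sum-reduce `values` (one per rank) to rank 0 along the k-nomial tree."""
    size = len(values)
    partial = list(values)
    mask = 1
    while mask < size:
        for rank in range(size):
            # Ranks that are multiples of mask but not of k*mask send in this step.
            if rank % mask == 0 and rank % (k * mask) != 0:
                partial[rank - rank % (k * mask)] += partial[rank]
        mask *= k
    return partial[0]


def hierarchical_reduce(values, node_of):
    """Simplified two-level model: reduce inside each node, then across nodes.

    `node_of[rank]` is an assumed rank-to-node mapping. Returns the total and
    the number of inter-node messages (one per node other than the root's).
    """
    node_sums = {}
    for rank, value in enumerate(values):
        node_sums[node_of[rank]] = node_sums.get(node_of[rank], 0) + value
    return sum(node_sums.values()), len(node_sums) - 1


if __name__ == "__main__":
    values = list(range(16))
    print(knomial_rounds(16, 2), knomial_rounds(16, 4))
    print(simulate_reduce(values, 4))
    print(hierarchical_reduce(values, [r // 4 for r in range(16)]))
```

With 16 processes, a binomial tree (k = 2) needs four communication steps while a 4-nomial tree needs two, which is the step reduction the abstract refers to; the trade-off is that each parent receives from up to k - 1 children per step. In the two-level model, only one message per non-root node crosses the network, and the rest of the traffic stays inside a node.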


Cited By

  • (2020) Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns. The Journal of Supercomputing 77:6 (6145-6177). DOI: 10.1007/s11227-020-03499-1. Online publication date: 20-Nov-2020



    Published In

    HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence
    July 2020
    276 pages
    ISBN:9781450375603
    DOI:10.1145/3409501
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Xi'an Jiaotong-Liverpool University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MPI_Reduce
    2. hierarchy
    3. k-nomial
    4. optimization
    5. tree topology

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HPCCT & BDAI 2020


