Building a high-performance collective communication library

Authors:

David G. Payne,

Robert van de Geijn,

Jerrell Watts

Supercomputing '94: Proceedings of the 1994 ACM/IEEE conference on Supercomputing

Pages 107 - 116

Published: 14 November 1994

Abstract

In this paper, we report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with worm-hole routing, but the techniques are more general. The approach differs from traditional library implementations in that it addresses the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields good performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes, and our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included. To obtain this library for Intel systems, contact intercom@cs.utexas.edu.
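
As a rough illustration of why a hybrid can help across vector lengths, the sketch below compares two textbook broadcast strategies under a simple latency-bandwidth cost model and picks the cheaper one. The cost model (startup cost `alpha`, per-item cost `beta`), the two candidate algorithms, and every function name are assumptions made for this example. The abstract does not say which algorithms the library combines, and the model ignores network conflicts on a real mesh.

```python
def ceil_log2(p: int) -> int:
    """Number of tree steps needed to reach p nodes; works for any p >= 1."""
    return (p - 1).bit_length()


def mst_bcast_cost(p: int, n: float, alpha: float, beta: float) -> float:
    """Assumed cost of a spanning-tree broadcast: every step sends the full vector."""
    return ceil_log2(p) * (alpha + n * beta)


def scatter_collect_bcast_cost(p: int, n: float, alpha: float, beta: float) -> float:
    """Assumed cost of scattering the vector, then collecting the pieces around a ring."""
    scatter = ceil_log2(p) * alpha + (p - 1) / p * n * beta
    collect = (p - 1) * alpha + (p - 1) / p * n * beta
    return scatter + collect


def choose_bcast(p: int, n: float, alpha: float, beta: float) -> str:
    """Return the name of the cheaper strategy under the assumed model."""
    mst = mst_bcast_cost(p, n, alpha, beta)
    sc = scatter_collect_bcast_cost(p, n, alpha, beta)
    return "mst" if mst <= sc else "scatter_collect"


if __name__ == "__main__":
    # Hypothetical parameters: 12 nodes (for example a 3 x 4 grid),
    # startup cost 100 time units, 1 time unit per vector item.
    for n in (8, 1_000, 100_000):
        print(n, choose_bcast(12, n, alpha=100.0, beta=1.0))
```

Under these made-up parameters the tree broadcast is cheaper for short vectors, where startup cost dominates, and the scatter/collect strategy is cheaper for long vectors, where the volume of data sent dominates. A hybrid library would switch between such strategies (or combine them) based on vector length and node count.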

Published In

Supercomputing '94: Proceedings of the 1994 ACM/IEEE conference on Supercomputing

November 1994, 840 pages

ISBN: 0818666056

Conference Chair: Gary M. Johnson, George Mason University

Sponsors: SIGARCH (ACM Special Interest Group on Computer Architecture), IEEE-CS (Computer Society)

Publisher: IEEE Computer Society Press, Washington, DC, United States

Conference: SC '94: International Conference for High Performance Computing, Networking, Storage and Analysis, November 14 - 18, 1994, Washington, D.C.
