research-article

CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

Authors:

Jehoshua Bruck,

Pablo Elustando,

Marc SnirAuthors Info & Claims

IEEE Transactions on Parallel and Distributed Systems, Volume 6, Issue 2

Pages 154 - 164

https://doi.org/10.1109/71.342126

Published: 01 February 1995 Publication History

Abstract

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.Index Terms Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.

References

[1]

A. Aggarwal and S. Kipnis, “Message-time tradeoff for combining data in message-passing systems,” IBM Res. Rep. RC-18349, Sept. 1992.

[2]

Y. Amir, D. Dolev, S. Kramer, and D. Malki, “Transis: A communication sub-system for high availability,” in Proc. 22nd Ann Int Symp Fault-Tolerant Computing , July 1992, pp. 76–84.

[3]

V. Bala and S. Kipnis, “Efficient collective communication over dynamically determined sets of processors in massively parallel computers,” IBM Res. Rep. RC-17771, Mar. 1992.

[4]

——, “Process groups: A mechanism for the coordination of and communication among processes in the Venus collective communication library,” in Proc. 7th Int Parallel Processing Symp , Apr. 1993.

Digital Library

[5]

A. Bar-Noy and S. Kipnis, “Designing broadcasting algorithms in the postal model for message-passing systems,” in Proc. 4th Ann ACM Symp Parallel Algorithms, Architectures , June 1992, pp. 11–22.

Digital Library

[6]

——, “Broadcasting multiple messages in simultaneous send/receive systems,” in Proc. 5th IEEE Symp Parallel, Distributed Processing , Dec. 1993.

Digital Library

[7]

A. Bar-Noy, S. Kipnis, and B. Schieber, “An optimal algorithm for computing census functions in message-passing systems,” Parallel Processing Lett. , vol. 3, no. 1, Mar. 1993.

[8]

A. Beguelin, et al. , “A user's guide to PVM Parallel Virtual Machine,” ORNL Tech. Rep. ORNL/TM-11826, May 1992.

[9]

K. Birman et al. , “The ISIS system manual,” Department of Computer Science, Cornell University, Ithaca, NY, Sept. 1990.

[10]

J. Bruck et al. , “Survey of routing issues for the Vulcan parallel computer,” IBM Res. Rep. RJ-8839, June 1992.

[11]

J. Bruck et al. , “A proposal for common group structures in a collective communication library,” IBM Res. Rep. RJ-9421, Mar. 1993.

[12]

J. Bruck, R. Cypher, and C. T. Ho, “Efficient fault-tolerant mesh and hypercubes architectures,” in Proc. 1992 Int. Symp. Fault-Tolerant Computing , pp. 162–169.

[13]

J. Bruck, R. Cypher, and C. T. Ho, “Multiple message broadcasting with generalized Fibonacci trees,” in Proc. 4th IEEE Symp. Parallel, Distributed Processing , Dec. 1992, pp. 424–431.

Digital Library

[14]

J. Bruck, C. T. Ho, and S. Kipnis, “Concatenating data optimally in message-passing systems,” IBM Res. Rep. RJ-9191, Jan. 1993.

[15]

J. Bruck, R. Cypher, C. T. Ho, and S. Kipnis, “Efficient algorithms for the index operation in message-passing systems,” IBM Res. Rep. RJ-9300, Apr. 1993.

[16]

N. Carriero and D. Gelernter, “Linda in context,” Commun. ACM , vol. 32, no. 4, pp. 444–458, Apr. 1989.

Digital Library

[17]

D. R. Cheriton and W. Zwaenpoel, “Distributed process groups in the V kernel,” ACM Trans. Comput. Syst ., vol. 2, no. 3, pp. 77–107, May 1985.

Digital Library

[18]

D. Culler et al. , “LogP: Towards a realistic model of parallel computation,” in Proc. 4th SIGPLAN Symp. Principles, Practices of Parallel Programming , ACM, May 1993.

Digital Library

[19]

R. Cypher and E. Leu, “Message-passing semantics and portable parallel programs,” IBM Res. Rep. RJ-9654, Jan. 1994.

[20]

W. J. Dally et al. , “The J-machine: A fine-grain concurrent computer,” in Proc. Inform. Processing 89 , 1989, pp. 1147–1153.

[21]

Document for a Standard Message-Passing Interface , Message Passing Interface Forum, Tech. Rep. CS-93-214, Univ. Tennessee, Nov. 1993.

Digital Library

[22]

J. Dongarra, R. Hempel, A. Hey, and D. Walker, “A proposal for a user-level, message-passing interface in a distributed memory environment,” ORNL Tech. Rep. ORNL/TM-12231, Oct. 1992.

[23]

S. Dutt and J. P. Hayes, “Designing fault-tolerant systems using automorphisms,” J. Parallel. Distrib. Computing , vol. 12, pp. 249–268, 1991.

Digital Library

[24]

B. Elspas and J. Turner, “Graphs with circulant adjacency matrices,” J. Combinatorial Theory , no. 9, pp. 297–307, 1970.

[25]

B. G. Fitch and M. E. Giampapa, “The Vulcan operation environment: A brief overview and status report,” in Proc. 5th Workshop Use Parallel Processors Meteorology , ECMWF, Nov. 1992.

[26]

G. Fox and W. Furmanski, “Optimal communication algorithms for regular decompositions on the hypercube,” in Proc. 3rd Conf. Hypercube Concurrent Comput. Applicat. , ACM, pp. 648–713, 1988.

Digital Library

[27]

G. Fox et al. , Solving Problems on Concurrent Processors, Volume I: General Techniques and Regular Problems . Englewood Cliffs, NJ: Prentice-Hall, 1988.

Digital Library

[28]

D. Frye et al. , “An external user interface for scalable parallel systems: FORTRAN interface,” Tech. Rep., IBM Highly Parallel Supercomputing Systems Laboratory, Nov. 1992.

[29]

G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley, “A user's guide to PICL: A portable instrumented communication library,” ORNL Tech. Rep. ORNL/TM-11616, Oct. 1990.

[30]

M. E. Giampapa, B. G. Fitch, G. R. Irwin, and D. G. Shea, “Vulcan operating environment: Programmer's reference,” Intern. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY, Mar. 1991.

[31]

R. Hempel, “The ANL/GMD macros (PARMACS) in FORTRAN for portable parallel programming using the message passing programming model, user's guide and reference manual,” Tech. Memorandum, Gesellschaft für Mathematik und Datenverabeitung mbH, West Germany.

[32]

C. T. Ho, “Optimal broadcasting on SIMD hypercubes without indirect addressing capability,” J. Parallel, Distrib. Computing , vol. 13, no. 2, pp. 246–255, Oct. 1991.

Digital Library

[33]

S. L. Johnsson and C. T. Ho, “Spanning graphs for optimum broadcasting and personalized communication in hypercubes,” IEEE Trans. Comput. , vol. 38, no. 9, pp. 1249–1268, Sept. 1989.

Digital Library

[34]

J. F. Palmer, “The NCUBE family of parallel supercomputers,” in Proc. Int. Conf. Comput. Design , IEEE, 1986.

[35]

A. Skjellum and A. P. Leung, “Zipcode: A portable multicomputer communication library atop the Reactive Kernel,” in Proc. 5th Distrib. Memory Computing Conf. , IEEE, Apr. 1990, pp. 328–337.

[36]

Connection Machine CM-5 Tech. Summary , Thinking Machines Corporation, 1991.

[37]

Express 3.0 Introductory Guide , Parasoft Corporation, 1990.

[38]

Paragon XP/S Overview , Intel Corporation, 1991.

[39]

“Vulcan system summary,” Int. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY.

Cited By

Hasanov KLastovetsky A(2017)Hierarchical redesign of classic MPI reduction algorithmsThe Journal of Supercomputing10.1007/s11227-016-1779-773:2(713-725)Online publication date: 1-Feb-2017
https://dl.acm.org/doi/10.1007/s11227-016-1779-7
Axtmann MBingmann TSanders PSchulz CBlelloch GAgrawal K(2015)Practical Massively Parallel SortingProceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures10.1145/2755573.2755595(13-23)Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2755573.2755595
Hasanov KLastovetsky A(2015)Hierarchical Optimization of MPI Reduce AlgorithmsProceedings of the 13th International Conference on Parallel Computing Technologies - Volume 925110.1007/978-3-319-21909-7_3(21-34)Online publication date: 31-Aug-2015
https://dl.acm.org/doi/10.1007/978-3-319-21909-7_3
Show More Cited By

Index Terms

CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

Recommendations

CCL v3.0: Multiprogrammed Semi-Asynchronous Checkpoints
PADS '03: Proceedings of the seventeenth workshop on Parallel and distributed simulation

CCL (Checkpointing and Communication Library) is arecently developed software in support of optimistic parallelsimulation on myrinet based clusters. Beyond classicallow latency message delivery functionalities, this libraryimplements CPU offloaded, semi-...
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
Proceedings of the 8th International Symposium on Parallel Processing
The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600: Performances

This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems Volume 6, Issue 2

February 1995

116 pages

ISSN:1045-9219

Editor:
Duncan H. Lawrie
Univ. of Illinois at Urbana-Champaign, Urbana

Issue’s Table of Contents

Copyright © Copyright © 1995 IEEE. All Rights Reserved.

Publisher

IEEE Press

Publication History

Published: 01 February 1995

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

28
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 19 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Hasanov KLastovetsky A(2017)Hierarchical redesign of classic MPI reduction algorithmsThe Journal of Supercomputing10.1007/s11227-016-1779-773:2(713-725)Online publication date: 1-Feb-2017
https://dl.acm.org/doi/10.1007/s11227-016-1779-7
Axtmann MBingmann TSanders PSchulz CBlelloch GAgrawal K(2015)Practical Massively Parallel SortingProceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures10.1145/2755573.2755595(13-23)Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2755573.2755595
Hasanov KLastovetsky A(2015)Hierarchical Optimization of MPI Reduce AlgorithmsProceedings of the 13th International Conference on Parallel Computing Technologies - Volume 925110.1007/978-3-319-21909-7_3(21-34)Online publication date: 31-Aug-2015
https://dl.acm.org/doi/10.1007/978-3-319-21909-7_3
Tembey PGavrilovska ASchwan K(2013)inTuneProceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems10.1145/2524211.2524213(1-16)Online publication date: 3-Nov-2013
https://dl.acm.org/doi/10.1145/2524211.2524213
Tembey PGavrilovska ASchwan K(2010)A case for coordinated resource management in heterogeneous multicore platformsProceedings of the 2010 international conference on Computer Architecture10.1007/978-3-642-24322-6_27(341-356)Online publication date: 19-Jun-2010
https://dl.acm.org/doi/10.1007/978-3-642-24322-6_27
Na'mneh RPan WYoo S(2006)Efficient Adaptive Algorithms for Transposing Small and Large Matrices on Symmetric MultiprocessorsInformatica10.5555/1413878.141388317:4(535-550)Online publication date: 1-Dec-2006
https://dl.acm.org/doi/10.5555/1413878.1413883
Wu MKendall RWright KZhang ZKramer W(2005)Performance Modeling and Tuning Strategies of Mixed Mode Collective CommunicationsProceedings of the 2005 ACM/IEEE conference on Supercomputing10.1109/SC.2005.56Online publication date: 12-Nov-2005
https://dl.acm.org/doi/10.1109/SC.2005.56
Tipparaju VNieplocha JKramer W(2005)Optimizing All-to-All Collective Communication by Exploiting Concurrency in Modern NetworksProceedings of the 2005 ACM/IEEE conference on Supercomputing10.1109/SC.2005.51Online publication date: 12-Nov-2005
https://dl.acm.org/doi/10.1109/SC.2005.51
Wu JYeh SLiu P(2004)Efficient Multiple Multicast on Heterogeneous Network of WorkstationsThe Journal of Supercomputing10.1023/B:SUPE.0000022573.11074.0029:1(59-88)Online publication date: 1-Jul-2004
https://dl.acm.org/doi/10.1023/B%3ASUPE.0000022573.11074.00
Jayasimha DSchwiebert LManivannan DMay J(2003)A foundation for designing deadlock-free routing algorithms in wormhole networksJournal of the ACM10.1145/636865.63686950:2(250-275)Online publication date: 1-Mar-2003
https://dl.acm.org/doi/10.1145/636865.636869
Show More Cited By

View Options

View options

Figures

Tables

Media

View Issue’s Table of Contents