Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

Published: 01 February 1995 Publication History

Abstract

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.Index Terms Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.

References

[1]
A. Aggarwal and S. Kipnis, “Message-time tradeoff for combining data in message-passing systems,” IBM Res. Rep. RC-18349, Sept. 1992.
[2]
Y. Amir, D. Dolev, S. Kramer, and D. Malki, “Transis: A communication sub-system for high availability,” in Proc. 22nd Ann Int Symp Fault-Tolerant Computing , July 1992, pp. 76–84.
[3]
V. Bala and S. Kipnis, “Efficient collective communication over dynamically determined sets of processors in massively parallel computers,” IBM Res. Rep. RC-17771, Mar. 1992.
[4]
——, “Process groups: A mechanism for the coordination of and communication among processes in the Venus collective communication library,” in Proc. 7th Int Parallel Processing Symp , Apr. 1993.
[5]
A. Bar-Noy and S. Kipnis, “Designing broadcasting algorithms in the postal model for message-passing systems,” in Proc. 4th Ann ACM Symp Parallel Algorithms, Architectures , June 1992, pp. 11–22.
[6]
——, “Broadcasting multiple messages in simultaneous send/receive systems,” in Proc. 5th IEEE Symp Parallel, Distributed Processing , Dec. 1993.
[7]
A. Bar-Noy, S. Kipnis, and B. Schieber, “An optimal algorithm for computing census functions in message-passing systems,” Parallel Processing Lett. , vol. 3, no. 1, Mar. 1993.
[8]
A. Beguelin, et al. , “A user's guide to PVM Parallel Virtual Machine,” ORNL Tech. Rep. ORNL/TM-11826, May 1992.
[9]
K. Birman et al. , “The ISIS system manual,” Department of Computer Science, Cornell University, Ithaca, NY, Sept. 1990.
[10]
J. Bruck et al. , “Survey of routing issues for the Vulcan parallel computer,” IBM Res. Rep. RJ-8839, June 1992.
[11]
J. Bruck et al. , “A proposal for common group structures in a collective communication library,” IBM Res. Rep. RJ-9421, Mar. 1993.
[12]
J. Bruck, R. Cypher, and C. T. Ho, “Efficient fault-tolerant mesh and hypercubes architectures,” in Proc. 1992 Int. Symp. Fault-Tolerant Computing , pp. 162–169.
[13]
J. Bruck, R. Cypher, and C. T. Ho, “Multiple message broadcasting with generalized Fibonacci trees,” in Proc. 4th IEEE Symp. Parallel, Distributed Processing , Dec. 1992, pp. 424–431.
[14]
J. Bruck, C. T. Ho, and S. Kipnis, “Concatenating data optimally in message-passing systems,” IBM Res. Rep. RJ-9191, Jan. 1993.
[15]
J. Bruck, R. Cypher, C. T. Ho, and S. Kipnis, “Efficient algorithms for the index operation in message-passing systems,” IBM Res. Rep. RJ-9300, Apr. 1993.
[16]
N. Carriero and D. Gelernter, “Linda in context,” Commun. ACM , vol. 32, no. 4, pp. 444–458, Apr. 1989.
[17]
D. R. Cheriton and W. Zwaenpoel, “Distributed process groups in the V kernel,” ACM Trans. Comput. Syst ., vol. 2, no. 3, pp. 77–107, May 1985.
[18]
D. Culler et al. , “LogP: Towards a realistic model of parallel computation,” in Proc. 4th SIGPLAN Symp. Principles, Practices of Parallel Programming , ACM, May 1993.
[19]
R. Cypher and E. Leu, “Message-passing semantics and portable parallel programs,” IBM Res. Rep. RJ-9654, Jan. 1994.
[20]
W. J. Dally et al. , “The J-machine: A fine-grain concurrent computer,” in Proc. Inform. Processing 89 , 1989, pp. 1147–1153.
[21]
Document for a Standard Message-Passing Interface , Message Passing Interface Forum, Tech. Rep. CS-93-214, Univ. Tennessee, Nov. 1993.
[22]
J. Dongarra, R. Hempel, A. Hey, and D. Walker, “A proposal for a user-level, message-passing interface in a distributed memory environment,” ORNL Tech. Rep. ORNL/TM-12231, Oct. 1992.
[23]
S. Dutt and J. P. Hayes, “Designing fault-tolerant systems using automorphisms,” J. Parallel. Distrib. Computing , vol. 12, pp. 249–268, 1991.
[24]
B. Elspas and J. Turner, “Graphs with circulant adjacency matrices,” J. Combinatorial Theory , no. 9, pp. 297–307, 1970.
[25]
B. G. Fitch and M. E. Giampapa, “The Vulcan operation environment: A brief overview and status report,” in Proc. 5th Workshop Use Parallel Processors Meteorology , ECMWF, Nov. 1992.
[26]
G. Fox and W. Furmanski, “Optimal communication algorithms for regular decompositions on the hypercube,” in Proc. 3rd Conf. Hypercube Concurrent Comput. Applicat. , ACM, pp. 648–713, 1988.
[27]
G. Fox et al. , Solving Problems on Concurrent Processors, Volume I: General Techniques and Regular Problems . Englewood Cliffs, NJ: Prentice-Hall, 1988.
[28]
D. Frye et al. , “An external user interface for scalable parallel systems: FORTRAN interface,” Tech. Rep., IBM Highly Parallel Supercomputing Systems Laboratory, Nov. 1992.
[29]
G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley, “A user's guide to PICL: A portable instrumented communication library,” ORNL Tech. Rep. ORNL/TM-11616, Oct. 1990.
[30]
M. E. Giampapa, B. G. Fitch, G. R. Irwin, and D. G. Shea, “Vulcan operating environment: Programmer's reference,” Intern. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY, Mar. 1991.
[31]
R. Hempel, “The ANL/GMD macros (PARMACS) in FORTRAN for portable parallel programming using the message passing programming model, user's guide and reference manual,” Tech. Memorandum, Gesellschaft für Mathematik und Datenverabeitung mbH, West Germany.
[32]
C. T. Ho, “Optimal broadcasting on SIMD hypercubes without indirect addressing capability,” J. Parallel, Distrib. Computing , vol. 13, no. 2, pp. 246–255, Oct. 1991.
[33]
S. L. Johnsson and C. T. Ho, “Spanning graphs for optimum broadcasting and personalized communication in hypercubes,” IEEE Trans. Comput. , vol. 38, no. 9, pp. 1249–1268, Sept. 1989.
[34]
J. F. Palmer, “The NCUBE family of parallel supercomputers,” in Proc. Int. Conf. Comput. Design , IEEE, 1986.
[35]
A. Skjellum and A. P. Leung, “Zipcode: A portable multicomputer communication library atop the Reactive Kernel,” in Proc. 5th Distrib. Memory Computing Conf. , IEEE, Apr. 1990, pp. 328–337.
[36]
Connection Machine CM-5 Tech. Summary , Thinking Machines Corporation, 1991.
[37]
Express 3.0 Introductory Guide , Parasoft Corporation, 1990.
[38]
Paragon XP/S Overview , Intel Corporation, 1991.
[39]
“Vulcan system summary,” Int. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY.

Cited By

View all
  • (2017)Hierarchical redesign of classic MPI reduction algorithmsThe Journal of Supercomputing10.1007/s11227-016-1779-773:2(713-725)Online publication date: 1-Feb-2017
  • (2015)Practical Massively Parallel SortingProceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures10.1145/2755573.2755595(13-23)Online publication date: 13-Jun-2015
  • (2015)Hierarchical Optimization of MPI Reduce AlgorithmsProceedings of the 13th International Conference on Parallel Computing Technologies - Volume 925110.1007/978-3-319-21909-7_3(21-34)Online publication date: 31-Aug-2015
  • Show More Cited By

Index Terms

  1. CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image IEEE Transactions on Parallel and Distributed Systems
        IEEE Transactions on Parallel and Distributed Systems  Volume 6, Issue 2
        February 1995
        116 pages
        ISSN:1045-9219
        Issue’s Table of Contents

        Publisher

        IEEE Press

        Publication History

        Published: 01 February 1995

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 19 Feb 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2017)Hierarchical redesign of classic MPI reduction algorithmsThe Journal of Supercomputing10.1007/s11227-016-1779-773:2(713-725)Online publication date: 1-Feb-2017
        • (2015)Practical Massively Parallel SortingProceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures10.1145/2755573.2755595(13-23)Online publication date: 13-Jun-2015
        • (2015)Hierarchical Optimization of MPI Reduce AlgorithmsProceedings of the 13th International Conference on Parallel Computing Technologies - Volume 925110.1007/978-3-319-21909-7_3(21-34)Online publication date: 31-Aug-2015
        • (2013)inTuneProceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems10.1145/2524211.2524213(1-16)Online publication date: 3-Nov-2013
        • (2010)A case for coordinated resource management in heterogeneous multicore platformsProceedings of the 2010 international conference on Computer Architecture10.1007/978-3-642-24322-6_27(341-356)Online publication date: 19-Jun-2010
        • (2006)Efficient Adaptive Algorithms for Transposing Small and Large Matrices on Symmetric MultiprocessorsInformatica10.5555/1413878.141388317:4(535-550)Online publication date: 1-Dec-2006
        • (2005)Performance Modeling and Tuning Strategies of Mixed Mode Collective CommunicationsProceedings of the 2005 ACM/IEEE conference on Supercomputing10.1109/SC.2005.56Online publication date: 12-Nov-2005
        • (2005)Optimizing All-to-All Collective Communication by Exploiting Concurrency in Modern NetworksProceedings of the 2005 ACM/IEEE conference on Supercomputing10.1109/SC.2005.51Online publication date: 12-Nov-2005
        • (2004)Efficient Multiple Multicast on Heterogeneous Network of WorkstationsThe Journal of Supercomputing10.1023/B:SUPE.0000022573.11074.0029:1(59-88)Online publication date: 1-Jul-2004
        • (2003)A foundation for designing deadlock-free routing algorithms in wormhole networksJournal of the ACM10.1145/636865.63686950:2(250-275)Online publication date: 1-Mar-2003
        • Show More Cited By

        View Options

        View options

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media