research-article

Pipelined Data Parallel Algorithms-I: Concept and Modeling

Authors:

L. M. NiAuthors Info & Claims

IEEE Transactions on Parallel and Distributed Systems, Volume 1, Issue 4

Pages 470 - 485

https://doi.org/10.1109/71.80175

Published: 01 October 1990 Publication History

Abstract

The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer.

References

[1]

{1} W. L. Athas and C. L. Seitz, "Multicomputers: Message-passing concurrent computers," IEEE Comput. Mag., pp. 9-24, Aug. 1988.

Digital Library

[2]

{2} V. Cherkassky and R. Smith, "Efficient mapping and implementation of matrix algorithms on a hypercube," Tech. Rep., Dep. Elec. Eng., Univ. of Minnesota, 1987.

[3]

{3} W. W. Chu, L. J. Holloway, M. T. Lan, and K. Efe, "Task allocation in distributed data processing," IEEE Comput. Mag., pp. 57-69, Nov. 1980.

Digital Library

[4]

{4} W. J. Dally and C. L. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks," IEEE Trans. Comput. , pp. 547-553, May 1987.

Digital Library

[5]

{5} Z. Fang, X. Li, and L. M. Ni, "On the communication complexity of generalized 2-D convolution on array processors," IEEE Trans. Comput., vol. 38, no. 2, pp. 184-194, Feb. 1989.

Digital Library

[6]

{6} G. C. Fox, S. W. Otto, and A. J. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication," Parallel Computing, pp. 17-31, Jan. 1987.

[7]

{7} D. C. Grunwald and D. A. Reed, "Networks for parallel processors: Measurements and prognostications," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., Jan. 1988, pp. 610-619.

Digital Library

[8]

{8} J. P. Hayes, T. N. Mudge, Q. F. Stout, S. Colley, and J. Palmer, "Architecture of a hypercube supercomputer," in Proc. 1986 Int. Conf. Parallel Processing, Aug. 1986, pp. 653-660.

[9]

{9} W. D. Hillis and G. L. Steele, "Data parallel algorithms," Commun. ACM, pp. 1170-1183, Dec. 1986.

[10]

{10} K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing. New York: McGraw-Hill, 1984.

Digital Library

[11]

{11} K. Hwang, "Advanced parallel processing with supercomputer architectures," Proc. IEEE, pp. 1348-1378, Oct. 1987.

[12]

{12} C. T. King, W. H. Chou, and-L. M. Ni, "Pipelined data parallel algorithms: Part II-Design." IEEE Trans. Parallel Distributed Syst., this issue, pp. 486-499.

[13]

{13} H. T. Kung, "Why systolic architectures?" IEEE Comput. Mag., no. 1, vol. 15, pp. 37-46, Jan. 1982.

[14]

{14} H. T. Kung and C. E. Leiserson, "Systolic arrays (for VLSI)," Sparse Matrix Proc., pp. 32-63, Jan. 1978.

[15]

{15} S. Y. Kung, S. C. Lo, S. N. Jean, and J. N. Hwang, "Wave-front array processors-Concept to implementation" IEEE Comput. Mag., pp. 18-33, July 1987.

Digital Library

[16]

{16} P. R. Ma, E. Y. Lee, and M. Tsuchiya, "A task allocation model for distributed computing systems," IEEE Trans. Comput., pp. 41-47, Jan. 1982.

[17]

{17} T. N. Mudge, G. D. Buzzard, and T. S. Abdel-Rahman, "A high performance operating system for the NCUBE," in Proc. 2nd Conf. Hypercube Multiprocessors, 1986.

[18]

{18} P. A. Nelson and L. Snyder, "Programming paradigms for nonshared memory parallel computers," in The Characteristics of Parallel Algorithms, L. H. Jamieson, D. B. Gannon, and R. J. Douglass, Eds. Cambridge, MA: MIT Press, 1987.

[19]

{19} L. M. Ni and C. T. King, "On partitioning and mapping for hypercube computing," Int. J. Parallel Programming, vol. 17, no. 6, pp. 475-495, Dec. 1988.

Digital Library

[20]

{20} A. Osterhaug, Guide to Parallel Programming on Sequent Computer Systems, Sequent Computer Systems, Inc., 1987.

[21]

{21} C. V. Ramamoorthy and G. S. Ho, "Performance evaluation of asynchronous concurrent systems using Petri nets," IEEE Trans. Software Eng., vol. SE-6, no. 5, pp. 440-449, Sept. 1980.

Digital Library

[22]

{22} D. A. Reed, L. M. Adams, and M. L. Patrick, "Stencils and problem partitionings: Their influence on the performance of multiple processor systems," IEEE Trans. Comput., vol. C-36, no. 7, pp. 845-858, July 1987.

[23]

{23} Y. Saad and M. H. Schultz, "Topological properties of hypercubes," Tech. Rep. YALEU/DCS/RR-389, Dep. Comput. Sci., Yale Univ., June 1985.

[24]

{24} V. Sarkar and J. Hennessy, "Compile-time partitioning and scheduling of parallel programs," Proc. SIGPLAN 1986 Symp. Compiler Construct., 1986, pp. 17-26.

[25]

{25} Y. Shih and J. Fier, "Hypercube systems and key applications," in Parallel Processing for Supercomputing and AI, K. Hwang and D. DeGroot, Eds., 1987.

[26]

{26} F. Shih, C. T. King, and C. Pu, "A two-scan algorithm and architecture to a root for morphological filters," in Proc. Int. Phoenix Conf. Comput. Commun.. AZ, Mar. 1990.

[27]

{27} A. Witkowski, K. Chandrakumar, and G. Macchio, "Concurrent I/O system for the hypercube multiprocessor," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., Jan. 1988.

Cited By

Yang WLi KLi K(2019)A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPUACM Transactions on Knowledge Discovery from Data10.1145/336357513:6(1-27)Online publication date: 11-Nov-2019
https://dl.acm.org/doi/10.1145/3363575
Tumeo ABranca MCamerini LPilato CLanzi PFerrandi FSciuto DRosenstiel WWakabayashi K(2009)Mapping pipelined applications onto heterogeneous embedded systemsProceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis10.1145/1629435.1629495(443-452)Online publication date: 11-Oct-2009
https://dl.acm.org/doi/10.1145/1629435.1629495
Branca MCamerini LFerrandi FLanzi PPilato CSciuto DTumeo ARothlauf F(2009)Evolutionary algorithms for the mapping of pipelined applications onto heterogeneous embedded systemsProceedings of the 11th Annual conference on Genetic and evolutionary computation10.1145/1569901.1570094(1435-1442)Online publication date: 8-Jul-2009
https://dl.acm.org/doi/10.1145/1569901.1570094
Show More Cited By

Pipelined Data Parallel Algorithms-I: Concept and Modeling
1. Mathematics of computing
2. Theory of computation
  1. Models of computation
    1. Concurrency

Recommendations

Pipelined data parallel algorithms—concept and modeling
ICS '88: Proceedings of the 2nd international conference on Supercomputing

A new style of efficient parallel algorithms on distributed-memory multiprocessors is introduced, which exploits parallelism through pipelined parallel computation, or large-grain pipelining. By using macro-pipelining between nodes in the system, large-...
Pipelined Data Parallel Algorithms-II: Design

A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main ...
Pipelined data parallel algorithms--concept, design, and modeling

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems Volume 1, Issue 4

October 1990

123 pages

ISSN:1045-9219

Issue’s Table of Contents

Copyright © Copyright © 1990 IEEE. All Rights Reserved.

Publisher

IEEE Press

Publication History

Published: 01 October 1990

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 13 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Yang WLi KLi K(2019)A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPUACM Transactions on Knowledge Discovery from Data10.1145/336357513:6(1-27)Online publication date: 11-Nov-2019
https://dl.acm.org/doi/10.1145/3363575
Tumeo ABranca MCamerini LPilato CLanzi PFerrandi FSciuto DRosenstiel WWakabayashi K(2009)Mapping pipelined applications onto heterogeneous embedded systemsProceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis10.1145/1629435.1629495(443-452)Online publication date: 11-Oct-2009
https://dl.acm.org/doi/10.1145/1629435.1629495
Branca MCamerini LFerrandi FLanzi PPilato CSciuto DTumeo ARothlauf F(2009)Evolutionary algorithms for the mapping of pipelined applications onto heterogeneous embedded systemsProceedings of the 11th Annual conference on Genetic and evolutionary computation10.1145/1569901.1570094(1435-1442)Online publication date: 8-Jul-2009
https://dl.acm.org/doi/10.1145/1569901.1570094
Vikram KVasudevan V(2006)Mapping data-parallel tasks onto partially reconfigurable hybrid processor architecturesIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2006.88405214:9(1010-1023)Online publication date: 1-Sep-2006
https://dl.acm.org/doi/10.1109/TVLSI.2006.884052
Andonov RBalev SRajopadhye SYanev NRosenberg A(2001)Optimal semi-oblique tilingProceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures10.1145/378580.378619(153-162)Online publication date: 3-Jul-2001
https://dl.acm.org/doi/10.1145/378580.378619
Barlas G(1998)Collection-Aware Optimum Sequencing of Operations and Closed-Form Solutions for the Distribution of a Divisible Load on Arbitrary Processor TreesIEEE Transactions on Parallel and Distributed Systems10.1109/71.6792149:5(429-441)Online publication date: 1-May-1998
https://dl.acm.org/doi/10.1109/71.679214
Andonov RYanev NBourzoufi H(1997)Three-dimensional orthogonal tile sizing problemProceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors10.5555/784893.785001Online publication date: 14-Jul-1997
https://dl.acm.org/doi/10.5555/784893.785001
Andonov RRajopadhye S(1997)Optimal Orthogonal Tiling of 2-D IterationsJournal of Parallel and Distributed Computing10.1006/jpdc.1997.137145:2(159-165)Online publication date: 15-Sep-1997
https://dl.acm.org/doi/10.1006/jpdc.1997.1371
Van der Wijngaart RSarukkai SMehra PYew P(1996)The effect of interrupts on software pipeline execution on message-passing architecturesProceedings of the 10th international conference on Supercomputing10.1145/237578.237603(189-196)Online publication date: 1-Jan-1996
https://dl.acm.org/doi/10.1145/237578.237603
Choudhary ANarahari BNicol DSimha R(1994)Optimal Processor Assignment for a Class of Pipelined ComputationsIEEE Transactions on Parallel and Distributed Systems10.1109/71.2730505:4(439-445)Online publication date: 1-Apr-1994
https://dl.acm.org/doi/10.1109/71.273050
Show More Cited By

View Options

View options

Figures

Tables

Media

View Issue’s Table of Contents