Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Pipelined Data Parallel Algorithms-I: Concept and Modeling

Published: 01 October 1990 Publication History

Abstract

The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer.

References

[1]
{1} W. L. Athas and C. L. Seitz, "Multicomputers: Message-passing concurrent computers," IEEE Comput. Mag., pp. 9-24, Aug. 1988.
[2]
{2} V. Cherkassky and R. Smith, "Efficient mapping and implementation of matrix algorithms on a hypercube," Tech. Rep., Dep. Elec. Eng., Univ. of Minnesota, 1987.
[3]
{3} W. W. Chu, L. J. Holloway, M. T. Lan, and K. Efe, "Task allocation in distributed data processing," IEEE Comput. Mag., pp. 57-69, Nov. 1980.
[4]
{4} W. J. Dally and C. L. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks," IEEE Trans. Comput. , pp. 547-553, May 1987.
[5]
{5} Z. Fang, X. Li, and L. M. Ni, "On the communication complexity of generalized 2-D convolution on array processors," IEEE Trans. Comput., vol. 38, no. 2, pp. 184-194, Feb. 1989.
[6]
{6} G. C. Fox, S. W. Otto, and A. J. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication," Parallel Computing, pp. 17-31, Jan. 1987.
[7]
{7} D. C. Grunwald and D. A. Reed, "Networks for parallel processors: Measurements and prognostications," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., Jan. 1988, pp. 610-619.
[8]
{8} J. P. Hayes, T. N. Mudge, Q. F. Stout, S. Colley, and J. Palmer, "Architecture of a hypercube supercomputer," in Proc. 1986 Int. Conf. Parallel Processing, Aug. 1986, pp. 653-660.
[9]
{9} W. D. Hillis and G. L. Steele, "Data parallel algorithms," Commun. ACM, pp. 1170-1183, Dec. 1986.
[10]
{10} K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing. New York: McGraw-Hill, 1984.
[11]
{11} K. Hwang, "Advanced parallel processing with supercomputer architectures," Proc. IEEE, pp. 1348-1378, Oct. 1987.
[12]
{12} C. T. King, W. H. Chou, and-L. M. Ni, "Pipelined data parallel algorithms: Part II-Design." IEEE Trans. Parallel Distributed Syst., this issue, pp. 486-499.
[13]
{13} H. T. Kung, "Why systolic architectures?" IEEE Comput. Mag., no. 1, vol. 15, pp. 37-46, Jan. 1982.
[14]
{14} H. T. Kung and C. E. Leiserson, "Systolic arrays (for VLSI)," Sparse Matrix Proc., pp. 32-63, Jan. 1978.
[15]
{15} S. Y. Kung, S. C. Lo, S. N. Jean, and J. N. Hwang, "Wave-front array processors-Concept to implementation" IEEE Comput. Mag., pp. 18-33, July 1987.
[16]
{16} P. R. Ma, E. Y. Lee, and M. Tsuchiya, "A task allocation model for distributed computing systems," IEEE Trans. Comput., pp. 41-47, Jan. 1982.
[17]
{17} T. N. Mudge, G. D. Buzzard, and T. S. Abdel-Rahman, "A high performance operating system for the NCUBE," in Proc. 2nd Conf. Hypercube Multiprocessors, 1986.
[18]
{18} P. A. Nelson and L. Snyder, "Programming paradigms for nonshared memory parallel computers," in The Characteristics of Parallel Algorithms, L. H. Jamieson, D. B. Gannon, and R. J. Douglass, Eds. Cambridge, MA: MIT Press, 1987.
[19]
{19} L. M. Ni and C. T. King, "On partitioning and mapping for hypercube computing," Int. J. Parallel Programming, vol. 17, no. 6, pp. 475-495, Dec. 1988.
[20]
{20} A. Osterhaug, Guide to Parallel Programming on Sequent Computer Systems, Sequent Computer Systems, Inc., 1987.
[21]
{21} C. V. Ramamoorthy and G. S. Ho, "Performance evaluation of asynchronous concurrent systems using Petri nets," IEEE Trans. Software Eng., vol. SE-6, no. 5, pp. 440-449, Sept. 1980.
[22]
{22} D. A. Reed, L. M. Adams, and M. L. Patrick, "Stencils and problem partitionings: Their influence on the performance of multiple processor systems," IEEE Trans. Comput., vol. C-36, no. 7, pp. 845-858, July 1987.
[23]
{23} Y. Saad and M. H. Schultz, "Topological properties of hypercubes," Tech. Rep. YALEU/DCS/RR-389, Dep. Comput. Sci., Yale Univ., June 1985.
[24]
{24} V. Sarkar and J. Hennessy, "Compile-time partitioning and scheduling of parallel programs," Proc. SIGPLAN 1986 Symp. Compiler Construct., 1986, pp. 17-26.
[25]
{25} Y. Shih and J. Fier, "Hypercube systems and key applications," in Parallel Processing for Supercomputing and AI, K. Hwang and D. DeGroot, Eds., 1987.
[26]
{26} F. Shih, C. T. King, and C. Pu, "A two-scan algorithm and architecture to a root for morphological filters," in Proc. Int. Phoenix Conf. Comput. Commun.. AZ, Mar. 1990.
[27]
{27} A. Witkowski, K. Chandrakumar, and G. Macchio, "Concurrent I/O system for the hypercube multiprocessor," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., Jan. 1988.

Cited By

View all
  • (2019)A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPUACM Transactions on Knowledge Discovery from Data10.1145/336357513:6(1-27)Online publication date: 11-Nov-2019
  • (2009)Mapping pipelined applications onto heterogeneous embedded systemsProceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis10.1145/1629435.1629495(443-452)Online publication date: 11-Oct-2009
  • (2009)Evolutionary algorithms for the mapping of pipelined applications onto heterogeneous embedded systemsProceedings of the 11th Annual conference on Genetic and evolutionary computation10.1145/1569901.1570094(1435-1442)Online publication date: 8-Jul-2009
  • Show More Cited By
  1. Pipelined Data Parallel Algorithms-I: Concept and Modeling

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image IEEE Transactions on Parallel and Distributed Systems
      IEEE Transactions on Parallel and Distributed Systems  Volume 1, Issue 4
      October 1990
      123 pages

      Publisher

      IEEE Press

      Publication History

      Published: 01 October 1990

      Author Tags

      1. Index Termspipelined data-parallel algorithms
      2. Petri nets
      3. data level partitioning
      4. data parallelism
      5. parallel algorithms
      6. pipelined operations

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 13 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2019)A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPUACM Transactions on Knowledge Discovery from Data10.1145/336357513:6(1-27)Online publication date: 11-Nov-2019
      • (2009)Mapping pipelined applications onto heterogeneous embedded systemsProceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis10.1145/1629435.1629495(443-452)Online publication date: 11-Oct-2009
      • (2009)Evolutionary algorithms for the mapping of pipelined applications onto heterogeneous embedded systemsProceedings of the 11th Annual conference on Genetic and evolutionary computation10.1145/1569901.1570094(1435-1442)Online publication date: 8-Jul-2009
      • (2006)Mapping data-parallel tasks onto partially reconfigurable hybrid processor architecturesIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2006.88405214:9(1010-1023)Online publication date: 1-Sep-2006
      • (2001)Optimal semi-oblique tilingProceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures10.1145/378580.378619(153-162)Online publication date: 3-Jul-2001
      • (1998)Collection-Aware Optimum Sequencing of Operations and Closed-Form Solutions for the Distribution of a Divisible Load on Arbitrary Processor TreesIEEE Transactions on Parallel and Distributed Systems10.1109/71.6792149:5(429-441)Online publication date: 1-May-1998
      • (1997)Three-dimensional orthogonal tile sizing problemProceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors10.5555/784893.785001Online publication date: 14-Jul-1997
      • (1997)Optimal Orthogonal Tiling of 2-D IterationsJournal of Parallel and Distributed Computing10.1006/jpdc.1997.137145:2(159-165)Online publication date: 15-Sep-1997
      • (1996)The effect of interrupts on software pipeline execution on message-passing architecturesProceedings of the 10th international conference on Supercomputing10.1145/237578.237603(189-196)Online publication date: 1-Jan-1996
      • (1994)Optimal Processor Assignment for a Class of Pipelined ComputationsIEEE Transactions on Parallel and Distributed Systems10.1109/71.2730505:4(439-445)Online publication date: 1-Apr-1994
      • Show More Cited By

      View Options

      View options

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media