article

Pattern-Driven Automatic Parallelization

Author:

Christoph W. KesslerAuthors Info & Claims

Scientific Programming, Volume 5, Issue 3

Pages 251 - 274

https://doi.org/10.1155/1996/406379

Published: 01 August 1996 Publication History

Abstract

This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

References

[1]

Z. Ammarguellat and W. L. Harrison III, "Automatic recognition of induction variables and recurrence relations by abstract interpretation," in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation , White Plains, New York, June 20-22, 1990 (ACM Press).

[2]

V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer, "A static performance estimator to guide data partitioning decisions," in Proceedings of ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, vol. 3, Williamsburg, VA, April 1991, pp. 213-223.

[3]

R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, TEMPLATES for the Solution of Linear Systems: Building Blocks for lterative Methods. Philadelphia: SIAM, 1993.

[4]

Michael Berry (Ed.), "Scientific workload characterization by loop based analyses," Performance Eval. Rev., vol. 19, pp. 17-29, Feb. 1992.

Digital Library

[5]

S. Bhansali, J. R. Hagemeister, C. S. Raghavendra, and H. Sivaraman, "Parallelizing sequential programs by algorithm-level transformations," in Proceedings of the Third Workshop on Program Recognition. Washington, Nov. 1994. IEEE Computer Society Press, 1994, pp. 100-107.

[6]

R. Bixby, K. Kennedy, and U. Kremer, "Automatic data layout using 0-1 integer programming," Center for Research on Parallel Computation, Rice University, Houston, TX, Tech. Rep. CRPC-TR93349-S, Nov. 1993.

[7]

P. Bose, "Heuristic rule-based program transformations for enhanced vectorization," in Proc. Int. Conf. on Parallel Processing, pp. 63-66, 1988.

[8]

P. Bose, "Interactive program improvement via EAVE: An expert adviser for vectorization," in Proc. Int. Conf. on Supercomputing, pp. 119- 130, 1988.

[9]

T. Brandes and M. Sommer, "A knowledge-based parallelization tool in a programming environment," in 16th Int. Conf. on Parallel Processing, 1987, p. 446.

[10]

D. Callahan and K. Kennedy, "Compiling programs for distributed memory multiprocessors," J. Supercomput., vol. 2, pp. 151-169, 1988.

[11]

B. Chapman, P. Mehrotra, and H. Zima, "Programming in Vienna Fortran," Sci. Programming, vol. 1, pp. 31-50, 1992.

Digital Library

[12]

A. Dierstein, R. Hayer, and T. Rauber, "Automatic parallelization for distributed memory rnultiprocessors," in Automatic Parallelization-- New Approaches to Code Generation, Data Distribution and Performance Prediction, C. W. Kessler, Ed. Braunschweig; Wiesbaden: Verlag Vieweg, 1994, pp. 192-217.

[13]

B. DiMartino and G. Ianello, "Towards automated code parallelization through program comprehension, in Proceedings of the Third Workshop on Program Recognition. Washington, Nov. 1994. IEEE Computer Society Press, 1994, pp. 108-115.

[14]

B. DiMartino and C. W. Kessler, "Program comprehension engines for automatic parallelization: A comparative study," Proceedings of the First International Workshop on Software Engineering for Parallel and Distributed Systems. I. Jelly, I. Gorton, and P. Croll, Eds. Chapman & Hall, 1996, pp. 146-157, London.

[15]

J. J. Dongarra, J. DuCroz, S. Hammarling, and R. Hanson, "An extended set of Fortran basic linear algebra subprograms," ACM Trans. Math. Software, vol. 14, pp. 1-32, 1988.

Digital Library

[16]

T. Fahringer, "Automatic performance prediction for parallel programs on massively parallel computers," PhD thesis, Technisch-Naturwissenschaftliche Fakultät der Universität Wien, 1993.

[17]

P. Feautrier, "Dataflow analysis of array and scalar references," Int. J. Parallel Programming, vol. 20, pp. 23-53, Feb. 1991.

[18]

C. Ferdinand, H. Seidl, and R. Wilhelm, "Tree automata for code selection," in Code Generation--Concepts, Tools, Techniques. R. Giegerich and S. L. Graham, Eds. Springer Verlag, Workshops in Computing series, 1992, pp. 30-50.

[19]

J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The program dependence graph and its use in optimization," ACM Trans. Programming Languages Systems, vol. 9, pp. 319-349, 1987.

Digital Library

[20]

A. Formella, S. Müller, W. Paul, and A. Bingert, "Isolating the reasons for the performance of parallel machines on numerical programs," in Automatic Parallelization--New Approaches to Code Generation, Data Distribution and Performance Prediction, C. W. Kessler, Ed. Braunschweig; Wiesbaden: Verlag Vieweg, 1994, pp. 34-64.

[21]

G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors, vol. 1: General Techniques and Regular Problems. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[22]

T.L. Freeman and C. Philips, Parallel Numerical Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[23]

S. S. Fried, "Personal supercomputing with the Intel i860," BYTE, pp. 347-357, Jan. 1991.

[24]

K. A. Gallivan, M. T. Heath, E. Ng, J. M. Ortega, B. W. Peyton, R. J. Plemmons, C. H. Romine, A. H. Sameh, and R. G. Voigt, Parallel Algorithms for Matrix Computations. Philadelphia: SIAM, 1990.

[25]

H. M. Gerndt, "Automatic parallelization for distributed-memory multiprocessing systems," PhD thesis, Universität Bonn, 1989.

[26]

M. Gupta, "Automatic data partitioning on distributed memory multicomputers," University of Illinois at Urbana-Champaign, Tech. Rep. UILU-ENG-92-2237 or CRHC-92-19, 1992.

[27]

M.T. Harandi and J. Q. Ning, "Knowledge-based program analysis," IEEE Software, pp. 74-81, Jan. 1990.

Digital Library

[28]

High Performance Fortran Forum (HPFF): "High performance Fortran language specification," Sci. Programming, vol. 2, 1993.

[29]

S. Hiranandani, K. Kennedy, and C.-W. Tseng, "Compiler-support for machine-independent parallel programming in Fortran-D," Tech. Rep. Rice COMP TR91-149, Rice University, Houston, TX, Mar. 1991.

[30]

J. Hulman, S. Andel, B. M. Chapman, and H. P. Zima, "Intelligent parallelization within the Vienna Fortran compilation system, in Fourth Workshop on Compilers for Parallel Computers, H. J. Sips, Ed. Delft University of Technology, 1993, pp. 455-467.

[31]

K. Ikudome, G. C. Fox, A. Kolawa, and J. W. Flower, "An automatic and symbolic parallelization system for distributed memory parallel computers, in Fifth Distributed Memory Computing Conference (DMCC5)." Charleston, SC: IEEE Computer Society Press, 1990, pp. 1105-1114.

[32]

N. D. Jones, C. K. Gomard, and P. Sestofl, Partial Evaluation and Automatic Program Generation. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[33]

C. W. Kessler, "Automatische Parallelisierung numerischer Programme durch Mustererkennung," PhD thesis, Universität Saarbrücken, 1994.

[34]

C.W. Kessler, "Symbolic array data flow analysis and pattern recognition in dense matrix computations, in Proceedings of IFIP WG10.3 Working Conference on Programming Environments for Massively Parallel Distributed Systems. K. M. Decker and R. M. Rehmann, Eds. Basel, Switzerland: Birkhäuser Verlag AG, pp. 57-68, 1994.

[35]

K. Knobe, J. D. Lukas, and G. L. Steele, "Data optimization: Allocation of arrays to reduce communication on SIMD machines," J. Parallel Distrib. Comput., vol. 8, pp. 102-118, 1990.

Digital Library

[36]

K. Knobe and V. Natarajan, "Data optimization: Minimizing residual interprocessor data motion on SIMD machines, in Third Symposium on the Frontiers of Massively Parallel Computation. J. Jájá, Ed. Los Alamitos, CA, 1990, pp. 416-423.

[37]

P.M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Computers, vol. C-22, pp. 786-793, Aug. 1973.

[38]

W. Kozaczynski, J. Ning, and A. Engberts, "Program concept recognition and transformation," IEEE Trans. Software Eng., vol. 18, Dec. 1993, pp. 1065-1075.

Digital Library

[39]

U. Kremer, "NP-completeness of dynamic remapping," Center for Research on Parallel Computation, Rice University, Houston, TX, Tech. Rep. CRPC-TR93330-S, Aug. 1993. See also: Proc. Fourth Workshop on Compilers for Parallel Computers , Delft, Dec. 1993.

[40]

C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic linear algebra subprograms for Fortran usage," ACM Trans. Math. Software, vol. 5, pp. 308-325, 1979.

Digital Library

[41]

J. Li and M. Chen, Index domain alignment: Minimizing cost of cross-referencing between distributed arrays, in Third Symposium on the Frontiers of Massively Parallel Computation. J. Jájá Ed. IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 424-433.

[42]

J. Li and M. Chen, "Compiling communication-efficient programs for massively parallel machines," IEEE Trans. Parallel Distrib. Systems, vol. 2, pp. 361-375, July 1991.

Digital Library

[43]

D.E. Maydan, S. P. Amarasinghe, and M. S. Lam, "Array data-flow analysis and its use in array privatization," in Proceedings of ACM SIGPLAN Conference on Principles of Programming Languages . 1993, pp. 2-15, ACM Press.

[44]

F. McMahon, "The Livermore Fortran kernels: A test of the numeric performance range," Lawrence Livermore National Laboratory, Tech. Rep., 1986.

[45]

R. Metzger, "Automated recognition of parallel algorithms in scientific applications," in Workshop on Plan Recognition at IJCAI'95. 1995.

[46]

A. Mohamed, G. Fox, G. Laszewski, M. Parashar, T. Haupt, K. Mills, Y. Lu, N. Lin, and N. Yeh, "Application benchmark set for Fortran D and high performance Fortran," Northeast Parallel Architectures Center, Syracuse, NY. Tech. Rep. 327, 1992.

[47]

S. S. Pinter and R. Y. Pinter, "Program optimization and parallelization using idioms, in ACM SIGPLAN Symposium on Principles of Programming Languages. 1991, pp. 79-92, ACM Press.

Digital Library

[48]

W. H. Press, S. A. Teukolski, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C--The Art of Scientific Computing, 2nd ed. Cambridge: Cambridge University Press, 1992.

[49]

X. Redon and P. Feautrier, "Detection of recurrences in sequential programs with loops," in PARLE 93, Springer LNCS, vol. 694, pp. 132- 145, 1993.

[50]

C. Rich and L. M. Wills, "Recognizinga program's design: A graph-parsing approach," IEEE Software , pp. 82-89, Jan. 1990.

Digital Library

[51]

G. Sabot and S. Wholey, "Cmax: A Fortran translator for the connection machine system, in Int. ACM Conf. on Supercomputing, 1993, pp. 147-156.

Digital Library

[52]

Z. Shen, Z. Li. and P. Yew, "An empirical study of Fortran programs for parallelizing compilers," IEEE Trans. Parallel Distrib. Systems, vol. 1, pp. 356-364, July 1990.

Digital Library

[53]

L. Snyder, "Recognition and selection of idioms for code optimization," Acta Informatica, vol. 17, pp. 327-348, 1982.

Digital Library

[54]

D. Whitfield and M. L. Soffa, "An approach to ordering optimizing transformations," in Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1990, pp. 137-146, ACM Press.

[55]

H. Zima, H. Bast, and M. Gerndt, "Superb: A tool for semi-automatic MIMD/SIMD parallelization," Parallel Computing, vol. 6, pp. 1-18, 1988.

[56]

H. Zima and B. Chapman, Supercompilers for Parallel and Vector Computers. ACM Press Frontier Series, Addison-Wesley, 1990.

Cited By

Chelini LZinenko OGrosser TCorporaal H(2019)Declarative Loop Tactics for Domain-specific OptimizationACM Transactions on Architecture and Code Optimization10.1145/337226616:4(1-25)Online publication date: 26-Dec-2019
https://dl.acm.org/doi/10.1145/3372266
von Koch TManilov SVasiladiotis CCole MFranke BDubach CXue J(2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
https://dl.acm.org/doi/10.1145/3178372.3179513
Huang CChen QWang ZPower ROrtiz JLi JXiao ZLu SRiedel E(2015)SpartanProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813768(1-15)Online publication date: 8-Jul-2015
https://dl.acm.org/doi/10.5555/2813767.2813768
Show More Cited By

Index Terms

Pattern-Driven Automatic Parallelization
1. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Pattern-driven automatic program transformation and parallelization
PDP '95: Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing

The PARAMAT system is able to automatically parallelize a wide class of sequential numeric codes operating on dense vectors, matrices etc. without any user interaction, for execution on distributed memory message-passing multiprocessors. A powerful ...
A comparison of automatic parallelization tools/compilers on the SGI origin 2000
SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is time consuming and costly, porting codes would ideally be automated by using some parallelization ...
A pattern-supported parallelization approach
PMAM '13: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores

In the embedded systems domain a trend towards multi-and many-core processors is evident. For the exploitation of these additional processing elements parallel software is inevitable. The pattern-supported parallelization approach, which is introduced ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Scientific Programming

Scientific Programming Volume 5, Issue 3

August 1996

101 pages

Issue’s Table of Contents

Publisher

IOS Press

Netherlands

Publication History

Published: 01 August 1996

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

10
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chelini LZinenko OGrosser TCorporaal H(2019)Declarative Loop Tactics for Domain-specific OptimizationACM Transactions on Architecture and Code Optimization10.1145/337226616:4(1-25)Online publication date: 26-Dec-2019
https://dl.acm.org/doi/10.1145/3372266
von Koch TManilov SVasiladiotis CCole MFranke BDubach CXue J(2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
https://dl.acm.org/doi/10.1145/3178372.3179513
Huang CChen QWang ZPower ROrtiz JLi JXiao ZLu SRiedel E(2015)SpartanProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813768(1-15)Online publication date: 8-Jul-2015
https://dl.acm.org/doi/10.5555/2813767.2813768
Park EKartsaklis CJanjusic TCavazos JAmarasinghe SKamil SSadayappan P(2014)Trace-Driven Memory Access Pattern Recognition in Computational KernelsProceedings of the Second Workshop on Optimizing Stencil Computations10.1145/2686745.2686748(25-32)Online publication date: 20-Oct-2014
https://dl.acm.org/doi/10.1145/2686745.2686748
Shafiee Sarvestani AHansson EKessler C(2013)Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic ParallelizationInternational Journal of Parallel Programming10.1007/s10766-012-0229-241:6(806-824)Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1007/s10766-012-0229-2
Arenaz MTouriño JDoallo R(2008)XARKACM Transactions on Programming Languages and Systems10.1145/1391956.139195930:6(1-56)Online publication date: 30-Oct-2008
https://dl.acm.org/doi/10.1145/1391956.1391959
Cong JJiang WHutton MChow P(2008)Pattern-based behavior synthesis for FPGA resource reductionProceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays10.1145/1344671.1344688(107-116)Online publication date: 24-Feb-2008
https://dl.acm.org/doi/10.1145/1344671.1344688
Baumstark LWills L(2005)Retargeting Sequential Image-Processing Programs for Data Parallel ExecutionIEEE Transactions on Software Engineering10.1109/TSE.2005.2631:2(116-136)Online publication date: 1-Feb-2005
https://dl.acm.org/doi/10.1109/TSE.2005.26
Di Martino BKeβler C(2000)Two Program Comprehension Tools for Automatic ParallelizationIEEE Concurrency10.1109/4434.8243118:1(37-47)Online publication date: 1-Jan-2000
https://dl.acm.org/doi/10.1109/4434.824311
Bodin FMével YQuiniou REgan GBrent RGannon D(1998)A user level program transformation toolProceedings of the 12th international conference on Supercomputing10.1145/277830.277868(180-187)Online publication date: 13-Jul-1998
https://dl.acm.org/doi/10.1145/277830.277868

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents