Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Pattern-Driven Automatic Parallelization

Published: 01 August 1996 Publication History

Abstract

This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

References

[1]
Z. Ammarguellat and W. L. Harrison III, "Automatic recognition of induction variables and recurrence relations by abstract interpretation," in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation , White Plains, New York, June 20-22, 1990 (ACM Press).
[2]
V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer, "A static performance estimator to guide data partitioning decisions," in Proceedings of ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, vol. 3, Williamsburg, VA, April 1991, pp. 213-223.
[3]
R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, TEMPLATES for the Solution of Linear Systems: Building Blocks for lterative Methods. Philadelphia: SIAM, 1993.
[4]
Michael Berry (Ed.), "Scientific workload characterization by loop based analyses," Performance Eval. Rev., vol. 19, pp. 17-29, Feb. 1992.
[5]
S. Bhansali, J. R. Hagemeister, C. S. Raghavendra, and H. Sivaraman, "Parallelizing sequential programs by algorithm-level transformations," in Proceedings of the Third Workshop on Program Recognition. Washington, Nov. 1994. IEEE Computer Society Press, 1994, pp. 100-107.
[6]
R. Bixby, K. Kennedy, and U. Kremer, "Automatic data layout using 0-1 integer programming," Center for Research on Parallel Computation, Rice University, Houston, TX, Tech. Rep. CRPC-TR93349-S, Nov. 1993.
[7]
P. Bose, "Heuristic rule-based program transformations for enhanced vectorization," in Proc. Int. Conf. on Parallel Processing, pp. 63-66, 1988.
[8]
P. Bose, "Interactive program improvement via EAVE: An expert adviser for vectorization," in Proc. Int. Conf. on Supercomputing, pp. 119- 130, 1988.
[9]
T. Brandes and M. Sommer, "A knowledge-based parallelization tool in a programming environment," in 16th Int. Conf. on Parallel Processing, 1987, p. 446.
[10]
D. Callahan and K. Kennedy, "Compiling programs for distributed memory multiprocessors," J. Supercomput., vol. 2, pp. 151-169, 1988.
[11]
B. Chapman, P. Mehrotra, and H. Zima, "Programming in Vienna Fortran," Sci. Programming, vol. 1, pp. 31-50, 1992.
[12]
A. Dierstein, R. Hayer, and T. Rauber, "Automatic parallelization for distributed memory rnultiprocessors," in Automatic Parallelization-- New Approaches to Code Generation, Data Distribution and Performance Prediction, C. W. Kessler, Ed. Braunschweig; Wiesbaden: Verlag Vieweg, 1994, pp. 192-217.
[13]
B. DiMartino and G. Ianello, "Towards automated code parallelization through program comprehension, in Proceedings of the Third Workshop on Program Recognition. Washington, Nov. 1994. IEEE Computer Society Press, 1994, pp. 108-115.
[14]
B. DiMartino and C. W. Kessler, "Program comprehension engines for automatic parallelization: A comparative study," Proceedings of the First International Workshop on Software Engineering for Parallel and Distributed Systems. I. Jelly, I. Gorton, and P. Croll, Eds. Chapman & Hall, 1996, pp. 146-157, London.
[15]
J. J. Dongarra, J. DuCroz, S. Hammarling, and R. Hanson, "An extended set of Fortran basic linear algebra subprograms," ACM Trans. Math. Software, vol. 14, pp. 1-32, 1988.
[16]
T. Fahringer, "Automatic performance prediction for parallel programs on massively parallel computers," PhD thesis, Technisch-Naturwissenschaftliche Fakultät der Universität Wien, 1993.
[17]
P. Feautrier, "Dataflow analysis of array and scalar references," Int. J. Parallel Programming, vol. 20, pp. 23-53, Feb. 1991.
[18]
C. Ferdinand, H. Seidl, and R. Wilhelm, "Tree automata for code selection," in Code Generation--Concepts, Tools, Techniques. R. Giegerich and S. L. Graham, Eds. Springer Verlag, Workshops in Computing series, 1992, pp. 30-50.
[19]
J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The program dependence graph and its use in optimization," ACM Trans. Programming Languages Systems, vol. 9, pp. 319-349, 1987.
[20]
A. Formella, S. Müller, W. Paul, and A. Bingert, "Isolating the reasons for the performance of parallel machines on numerical programs," in Automatic Parallelization--New Approaches to Code Generation, Data Distribution and Performance Prediction, C. W. Kessler, Ed. Braunschweig; Wiesbaden: Verlag Vieweg, 1994, pp. 34-64.
[21]
G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors, vol. 1: General Techniques and Regular Problems. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[22]
T.L. Freeman and C. Philips, Parallel Numerical Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[23]
S. S. Fried, "Personal supercomputing with the Intel i860," BYTE, pp. 347-357, Jan. 1991.
[24]
K. A. Gallivan, M. T. Heath, E. Ng, J. M. Ortega, B. W. Peyton, R. J. Plemmons, C. H. Romine, A. H. Sameh, and R. G. Voigt, Parallel Algorithms for Matrix Computations. Philadelphia: SIAM, 1990.
[25]
H. M. Gerndt, "Automatic parallelization for distributed-memory multiprocessing systems," PhD thesis, Universität Bonn, 1989.
[26]
M. Gupta, "Automatic data partitioning on distributed memory multicomputers," University of Illinois at Urbana-Champaign, Tech. Rep. UILU-ENG-92-2237 or CRHC-92-19, 1992.
[27]
M.T. Harandi and J. Q. Ning, "Knowledge-based program analysis," IEEE Software, pp. 74-81, Jan. 1990.
[28]
High Performance Fortran Forum (HPFF): "High performance Fortran language specification," Sci. Programming, vol. 2, 1993.
[29]
S. Hiranandani, K. Kennedy, and C.-W. Tseng, "Compiler-support for machine-independent parallel programming in Fortran-D," Tech. Rep. Rice COMP TR91-149, Rice University, Houston, TX, Mar. 1991.
[30]
J. Hulman, S. Andel, B. M. Chapman, and H. P. Zima, "Intelligent parallelization within the Vienna Fortran compilation system, in Fourth Workshop on Compilers for Parallel Computers, H. J. Sips, Ed. Delft University of Technology, 1993, pp. 455-467.
[31]
K. Ikudome, G. C. Fox, A. Kolawa, and J. W. Flower, "An automatic and symbolic parallelization system for distributed memory parallel computers, in Fifth Distributed Memory Computing Conference (DMCC5)." Charleston, SC: IEEE Computer Society Press, 1990, pp. 1105-1114.
[32]
N. D. Jones, C. K. Gomard, and P. Sestofl, Partial Evaluation and Automatic Program Generation. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[33]
C. W. Kessler, "Automatische Parallelisierung numerischer Programme durch Mustererkennung," PhD thesis, Universität Saarbrücken, 1994.
[34]
C.W. Kessler, "Symbolic array data flow analysis and pattern recognition in dense matrix computations, in Proceedings of IFIP WG10.3 Working Conference on Programming Environments for Massively Parallel Distributed Systems. K. M. Decker and R. M. Rehmann, Eds. Basel, Switzerland: Birkhäuser Verlag AG, pp. 57-68, 1994.
[35]
K. Knobe, J. D. Lukas, and G. L. Steele, "Data optimization: Allocation of arrays to reduce communication on SIMD machines," J. Parallel Distrib. Comput., vol. 8, pp. 102-118, 1990.
[36]
K. Knobe and V. Natarajan, "Data optimization: Minimizing residual interprocessor data motion on SIMD machines, in Third Symposium on the Frontiers of Massively Parallel Computation. J. Jájá, Ed. Los Alamitos, CA, 1990, pp. 416-423.
[37]
P.M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Computers, vol. C-22, pp. 786-793, Aug. 1973.
[38]
W. Kozaczynski, J. Ning, and A. Engberts, "Program concept recognition and transformation," IEEE Trans. Software Eng., vol. 18, Dec. 1993, pp. 1065-1075.
[39]
U. Kremer, "NP-completeness of dynamic remapping," Center for Research on Parallel Computation, Rice University, Houston, TX, Tech. Rep. CRPC-TR93330-S, Aug. 1993. See also: Proc. Fourth Workshop on Compilers for Parallel Computers , Delft, Dec. 1993.
[40]
C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic linear algebra subprograms for Fortran usage," ACM Trans. Math. Software, vol. 5, pp. 308-325, 1979.
[41]
J. Li and M. Chen, Index domain alignment: Minimizing cost of cross-referencing between distributed arrays, in Third Symposium on the Frontiers of Massively Parallel Computation. J. Jájá Ed. IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 424-433.
[42]
J. Li and M. Chen, "Compiling communication-efficient programs for massively parallel machines," IEEE Trans. Parallel Distrib. Systems, vol. 2, pp. 361-375, July 1991.
[43]
D.E. Maydan, S. P. Amarasinghe, and M. S. Lam, "Array data-flow analysis and its use in array privatization," in Proceedings of ACM SIGPLAN Conference on Principles of Programming Languages . 1993, pp. 2-15, ACM Press.
[44]
F. McMahon, "The Livermore Fortran kernels: A test of the numeric performance range," Lawrence Livermore National Laboratory, Tech. Rep., 1986.
[45]
R. Metzger, "Automated recognition of parallel algorithms in scientific applications," in Workshop on Plan Recognition at IJCAI'95. 1995.
[46]
A. Mohamed, G. Fox, G. Laszewski, M. Parashar, T. Haupt, K. Mills, Y. Lu, N. Lin, and N. Yeh, "Application benchmark set for Fortran D and high performance Fortran," Northeast Parallel Architectures Center, Syracuse, NY. Tech. Rep. 327, 1992.
[47]
S. S. Pinter and R. Y. Pinter, "Program optimization and parallelization using idioms, in ACM SIGPLAN Symposium on Principles of Programming Languages. 1991, pp. 79-92, ACM Press.
[48]
W. H. Press, S. A. Teukolski, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C--The Art of Scientific Computing, 2nd ed. Cambridge: Cambridge University Press, 1992.
[49]
X. Redon and P. Feautrier, "Detection of recurrences in sequential programs with loops," in PARLE 93, Springer LNCS, vol. 694, pp. 132- 145, 1993.
[50]
C. Rich and L. M. Wills, "Recognizinga program's design: A graph-parsing approach," IEEE Software , pp. 82-89, Jan. 1990.
[51]
G. Sabot and S. Wholey, "Cmax: A Fortran translator for the connection machine system, in Int. ACM Conf. on Supercomputing, 1993, pp. 147-156.
[52]
Z. Shen, Z. Li. and P. Yew, "An empirical study of Fortran programs for parallelizing compilers," IEEE Trans. Parallel Distrib. Systems, vol. 1, pp. 356-364, July 1990.
[53]
L. Snyder, "Recognition and selection of idioms for code optimization," Acta Informatica, vol. 17, pp. 327-348, 1982.
[54]
D. Whitfield and M. L. Soffa, "An approach to ordering optimizing transformations," in Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1990, pp. 137-146, ACM Press.
[55]
H. Zima, H. Bast, and M. Gerndt, "Superb: A tool for semi-automatic MIMD/SIMD parallelization," Parallel Computing, vol. 6, pp. 1-18, 1988.
[56]
H. Zima and B. Chapman, Supercompilers for Parallel and Vector Computers. ACM Press Frontier Series, Addison-Wesley, 1990.

Cited By

View all
  • (2019)Declarative Loop Tactics for Domain-specific OptimizationACM Transactions on Architecture and Code Optimization10.1145/337226616:4(1-25)Online publication date: 26-Dec-2019
  • (2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
  • (2015)SpartanProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813768(1-15)Online publication date: 8-Jul-2015
  • Show More Cited By

Index Terms

  1. Pattern-Driven Automatic Parallelization

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Scientific Programming
    Scientific Programming  Volume 5, Issue 3
    August 1996
    101 pages

    Publisher

    IOS Press

    Netherlands

    Publication History

    Published: 01 August 1996

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Declarative Loop Tactics for Domain-specific OptimizationACM Transactions on Architecture and Code Optimization10.1145/337226616:4(1-25)Online publication date: 26-Dec-2019
    • (2018)Towards a compiler analysis for parallel algorithmic skeletonsProceedings of the 27th International Conference on Compiler Construction10.1145/3178372.3179513(174-184)Online publication date: 24-Feb-2018
    • (2015)SpartanProceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference10.5555/2813767.2813768(1-15)Online publication date: 8-Jul-2015
    • (2014)Trace-Driven Memory Access Pattern Recognition in Computational KernelsProceedings of the Second Workshop on Optimizing Stencil Computations10.1145/2686745.2686748(25-32)Online publication date: 20-Oct-2014
    • (2013)Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic ParallelizationInternational Journal of Parallel Programming10.1007/s10766-012-0229-241:6(806-824)Online publication date: 1-Dec-2013
    • (2008)XARKACM Transactions on Programming Languages and Systems10.1145/1391956.139195930:6(1-56)Online publication date: 30-Oct-2008
    • (2008)Pattern-based behavior synthesis for FPGA resource reductionProceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays10.1145/1344671.1344688(107-116)Online publication date: 24-Feb-2008
    • (2005)Retargeting Sequential Image-Processing Programs for Data Parallel ExecutionIEEE Transactions on Software Engineering10.1109/TSE.2005.2631:2(116-136)Online publication date: 1-Feb-2005
    • (2000)Two Program Comprehension Tools for Automatic ParallelizationIEEE Concurrency10.1109/4434.8243118:1(37-47)Online publication date: 1-Jan-2000
    • (1998)A user level program transformation toolProceedings of the 12th international conference on Supercomputing10.1145/277830.277868(180-187)Online publication date: 13-Jul-1998

    View Options

    View options

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media