This thesis discusses several problems facing compiler design for supercomputers. One problem is that of building a comprehensive yet efficient means of representing the flow of data in a program. We propose a means of computing a data dependence graph and labelling the arcs with "direction vectors" which show how the flow of data corresponds to the loop structure of the program. We use these data dependence direction vectors in several high level compiler loop optimizations, namely, loop vectorization, loop fission, loop fusion and loop interchanging. We show how to perform these transformations and how to use them to optimize programs for a wide range of supercomputers. We also discuss the problems of recurrence relations. We study arithmetic recurrences with IF statements, and recurrences involving both data and control dependence relations in a cycle. The wavefront method of solving recurrences is also treated. Another problem for a compiler is the management of compiler temporary variables; supercomputer compilers must manage temporary arrays, and we discuss ways to make this problem more tractable. Finally we discuss WHILE loops and see how they relate to iterative DO loops, and show several methods to execute WHILE loops on a machine. We also give the general structure of an optimizing compiler for supercomputers, developed from our experience with a test-bed compiler.
Cited By
- Kao S, Subramanian S, Agrawal G, Yazdanbakhsh A and Krishna T FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, (295-310)
- Sun P, Gabrielli G and Jones T Speculative vectorisation with selective replay Proceedings of the 48th Annual International Symposium on Computer Architecture, (223-236)
- Kjolstad F, Ahrens P, Kamil S and Amarasinghe S Tensor algebra compilation with workspaces Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, (180-192)
- Zhao J and Zhao R (2018). K-DT, The Journal of Supercomputing, 74:4, (1655-1675), Online publication date: 1-Apr-2018.
- Huang J, Prabhu P, Jablin T, Ghosh S, Apostolakis S, Lee J and August D Speculatively Exploiting Cross-Invocation Parallelism Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (207-221)
- Bastoul C Mapping deviation: a technique to adapt or to guard loop transformation intuitions for legality Proceedings of the 25th International Conference on Compiler Construction, (229-239)
- August D, Huang J, Beard S, Johnson N and Jablin T Automatically exploiting cross-invocation parallelism using runtime information Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-11)
- Qasem A Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizaitions Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (27-35)
- Kotha A, Anand K, Smithson M, Yellareddy G and Barua R Automatic Parallelization in a Binary Rewriter Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (547-557)
- Qiu M, Sha E, Liu M, Lin M, Hua S and Yang L (2008). Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP, Journal of Parallel and Distributed Computing, 68:4, (443-455), Online publication date: 1-Apr-2008.
- Pearce D, Kelly P and Hankin C (2007). Efficient field-sensitive pointer analysis of C, ACM Transactions on Programming Languages and Systems, 30:1, (4-es), Online publication date: 1-Nov-2007.
- Thies W, Karczmarek M, Sermulins J, Rabbah R and Amarasinghe S Teleport messaging for distributed stream programs Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, (224-235)
- Pan Z, Armstrong B, Bae H and Eigenmann R On the interaction of tiling and automatic parallelization Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming, (24-35)
- Zhao Y and Kennedy K (2005). Scalarization using loop alignment and loop skewing, The Journal of Supercomputing, 31:1, (5-46), Online publication date: 1-Jan-2005.
- Kadayif I and Kandemir M (2004). Quasidynamic Layout Optimizations for Improving Data Locality, IEEE Transactions on Parallel and Distributed Systems, 15:11, (996-1011), Online publication date: 1-Nov-2004.
- Zhu Y, Magklis G, Scott M, Ding C and Albonesi D The Energy Impact of Aggressive Loop Fusion Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (153-164)
- Lin Y, Terboven C, Mey D and Copty N Automatic scoping of variables in parallel regions of an OpenMP program Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP, (83-97)
- Callahan D, Carr S and Kennedy K (2004). Improving register allocation for subscripted variables, ACM SIGPLAN Notices, 39:4, (328-342), Online publication date: 1-Apr-2004.
- Horwitz S, Reps T and Binkley D (2004). Interprocedural slicing using dependence graphs, ACM SIGPLAN Notices, 39:4, (229-243), Online publication date: 1-Apr-2004.
- Burke M and Cytron R (2004). Interprocedural dependence analysis and parallelization, ACM SIGPLAN Notices, 39:4, (139-154), Online publication date: 1-Apr-2004.
- Ding C and Kennedy K (2004). Improving effective bandwidth through compiler enhancement of global cache reuse, Journal of Parallel and Distributed Computing, 64:1, (108-134), Online publication date: 1-Jan-2004.
- Tu P and Padua D Automatic array privatization Compiler optimizations for scalable parallel systems, (247-281)
- Wu J (2000). An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines, The Journal of Supercomputing, 15:3, (321-339), Online publication date: 1-Mar-2000.
- Vajracharya S, Karmesin S, Beckman P, Crotinger J, Malony A, Shende S, Oldehoeft R and Smith S SMARTS Proceedings of the 13th international conference on Supercomputing, (302-310)
- Vajracharya S and Grunwald D Loop re-ordering and pre-fetching at run-time Proceedings of the 1997 ACM/IEEE conference on Supercomputing, (1-13)
- Koo M, Park S, Yook H and Park M A transformation method to reduce loop overhead in HPF compiler Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
- Chen D and Yew P (1996). On Effective Execution of Nonuniform DOACROSS Loops, IEEE Transactions on Parallel and Distributed Systems, 7:5, (463-476), Online publication date: 1-May-1996.
- Saavedra-Barrera R, Mao W, Park D, Chame J and Moon S The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching Proceedings of the 10th International Parallel Processing Symposium, (39-45)
- Reps T and Rosay G (1995). Precise interprocedural chopping, ACM SIGSOFT Software Engineering Notes, 20:4, (41-52), Online publication date: 1-Oct-1995.
- Reps T and Rosay G Precise interprocedural chopping Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering, (41-52)
- Subhlok J and Kennedy K (1995). Integer Programming for Array Subscript Analysis, IEEE Transactions on Parallel and Distributed Systems, 6:6, (662-668), Online publication date: 1-Jun-1995.
- Lee G (1995). Parallelizing Iterative Loops with Conditional Branching, IEEE Transactions on Parallel and Distributed Systems, 6:2, (185-189), Online publication date: 1-Feb-1995.
- Reps T, Horwitz S, Sagiv M and Rosay G (1994). Speeding up slicing, ACM SIGSOFT Software Engineering Notes, 19:5, (11-20), Online publication date: 1-Dec-1994.
- Reps T, Horwitz S, Sagiv M and Rosay G Speeding up slicing Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering, (11-20)
- Chen D, Torrellas J and Yew P An efficient algorithm for the run-time parallelization of DOACROSS loops Proceedings of the 1994 ACM/IEEE conference on Supercomputing, (518-527)
- Zhu G, Xie L and Sun Z (1994). A path-based method of parallelizing C++ programs, ACM SIGPLAN Notices, 29:2, (19-24), Online publication date: 1-Feb-1994.
- Maslov V Lazy array data-flow dependence analysis Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (311-325)
- Adl-Tabatabai A, Gross T, Lueh G and Reinders J Modeling Instruction-Level Parallelism for Software Pipelining Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, (321-330)
- Iitsuka T Flow-sensitive Interprocedural Analysis Method for Parallelization Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, (65-76)
- Newburn C, Huang A and Shen J Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, (39-52)
- Carr S and Kennedy K Compiler blockability of numerical algorithms Proceedings of the 1992 ACM/IEEE conference on Supercomputing, (114-124)
- Allen R and Kennedy K (1992). Vector Register Allocation, IEEE Transactions on Computers, 41:10, (1290-1317), Online publication date: 1-Oct-1992.
- Wolfe M and Tseng C (1992). The Power Test for Data Dependence, IEEE Transactions on Parallel and Distributed Systems, 3:5, (591-601), Online publication date: 1-Sep-1992.
- Psarris K On exact data dependence analysis Proceedings of the 6th international conference on Supercomputing, (303-312)
- Eisenbeis C and Sogno J A general algorithm for data dependence analysis Proceedings of the 6th international conference on Supercomputing, (292-302)
- Cheong H Life span strategy—a compiler-based approach to cache coherence Proceedings of the 6th international conference on Supercomputing, (139-148)
- Chen W, Mahlke S, Hwu W, Kiyohara T and Chang P Tolerating data access latency with register preloading Proceedings of the 6th international conference on Supercomputing, (104-113)
- Wholey S Automatic data mapping for distributed-memory parallel computers Proceedings of the 6th international conference on Supercomputing, (25-34)
- Lucco S (1992). A dynamic scheduling method for irregular parallel programs, ACM SIGPLAN Notices, 27:7, (200-211), Online publication date: 1-Jul-1992.
- Maslov V (1992). Delinearization, ACM SIGPLAN Notices, 27:7, (152-161), Online publication date: 1-Jul-1992.
- Lucco S A dynamic scheduling method for irregular parallel programs Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, (200-211)
- Maslov V Delinearization Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, (152-161)
- Horwitz S and Reps T The use of program dependence graphs in software engineering Proceedings of the 14th international conference on Software engineering, (392-411)
- Girkar M and Polychronopoulos C (1992). Automatic Extraction of Functional Parallelism from Ordinary Programs, IEEE Transactions on Parallel and Distributed Systems, 3:2, (166-178), Online publication date: 1-Mar-1992.
- Ammarguellat Z (1992). A Control-Flow Normalization Algorithm and its Complexity, IEEE Transactions on Software Engineering, 18:3, (237-251), Online publication date: 1-Mar-1992.
- Koelbel C and Mehrotra P (1991). Compiling Global Name-Space Parallel Loops for Distributed Execution, IEEE Transactions on Parallel and Distributed Systems, 2:4, (440-451), Online publication date: 1-Oct-1991.
- Smith L Vectorizing C compilers: how good are they? Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (544-553)
- Ramanujam J and Sadayappan P Tiling multidimensional iteration spaces for nonshared memory machines Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (111-120)
- Pugh W The Omega test: a fast and practical integer programming algorithm for dependence analysis Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (4-13)
- Su H and Yew P Efficient Doacross execution on distributed shared-memory multiprocessors Proceedings of the 1991 ACM/IEEE conference on Supercomputing, (842-853)
- Malony A (1991). Event-based performance perturbation, ACM SIGPLAN Notices, 26:7, (201-212), Online publication date: 1-Jul-1991.
- Ancourt C and Irigoin F (1991). Scanning polyhedra with DO loops, ACM SIGPLAN Notices, 26:7, (39-50), Online publication date: 1-Jul-1991.
- Lu L (1991). A unified framework for systematic loop transformations, ACM SIGPLAN Notices, 26:7, (28-38), Online publication date: 1-Jul-1991.
- Goff G, Kennedy K and Tseng C (1991). Practical dependence testing, ACM SIGPLAN Notices, 26:6, (15-29), Online publication date: 1-Jun-1991.
- Kennedy K, McKinley K and Tseng C Analysis and transformation in the ParaScope editor Proceedings of the 5th international conference on Supercomputing, (433-447)
- Psarris K, Kong X and Klappholz D Extending the I test to direction vectors Proceedings of the 5th international conference on Supercomputing, (330-340)
- Hudak D and Abraham S Beyond loop partitioning Proceedings of the 5th international conference on Supercomputing, (172-182)
- Hege H and Stüben H Vectorization and parallelization of irregular problems via graph coloring Proceedings of the 5th international conference on Supercomputing, (47-56)
- Goff G, Kennedy K and Tseng C Practical dependence testing Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, (15-29)
- Malony A Event-based performance perturbation Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (201-212)
- Ancourt C and Irigoin F Scanning polyhedra with DO loops Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (39-50)
- Lu L A unified framework for systematic loop transformations Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, (28-38)
- Pinter S and Pinter R Program optimization and parallelization using idioms Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (79-92)
- Wang S and Uht A Ideograph/Ideogram Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture, (125-134)
- Lu L and Chen M Subdomain dependence test for massive parallelism Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (962-972)
- Kennedy K and McKinley K Loop distribution with arbitrary control flow Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (407-416)
- Chow J and Harrison W Switch-stacks Proceedings of the 1990 ACM/IEEE conference on Supercomputing, (190-199)
- Smith K, Appelbe B and Stirewalt K (1990). Incremental dependence analysis for interactive parallelization, ACM SIGARCH Computer Architecture News, 18:3b, (330-341), Online publication date: 1-Sep-1990.
- Klappholz D, Psarris K and Kong X (1990). On the perfect accuracy of an approximate subscript analysis test, ACM SIGARCH Computer Architecture News, 18:3b, (201-212), Online publication date: 1-Sep-1990.
- Anderson S and Hudak P (1990). Compilation of Haskell array comprehensions for scientific computing, ACM SIGPLAN Notices, 25:6, (137-149), Online publication date: 1-Jun-1990.
- Callahan D, Carr S and Kennedy K (1990). Improving register allocation for subscripted variables, ACM SIGPLAN Notices, 25:6, (53-65), Online publication date: 1-Jun-1990.
- Anderson S and Hudak P Compilation of Haskell array comprehensions for scientific computing Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation, (137-149)
- Callahan D, Carr S and Kennedy K Improving register allocation for subscripted variables Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation, (53-65)
- Smith K, Appelbe B and Stirewalt K Incremental dependence analysis for interactive parallelization Proceedings of the 4th international conference on Supercomputing, (330-341)
- Klappholz D, Psarris K and Kong X On the perfect accuracy of an approximate subscript analysis test Proceedings of the 4th international conference on Supercomputing, (201-212)
- Callahan D, Kennedy K and Subhlok J (1990). Analysis of event synchronization in a parallel programming tool, ACM SIGPLAN Notices, 25:3, (21-30), Online publication date: 1-Mar-1990.
- Callahan D, Kennedy K and Subhlok J Analysis of event synchronization in a parallel programming tool Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming, (21-30)
- Wu Y and Lewis T Parallel processor balance through loop spreading Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (665-674)
- Balasundaram V, Kennedy K, Kremer U, McKinley K and Subhlok J The parascope editor: an interactive parallel programming tool Proceedings of the 1989 ACM/IEEE conference on Supercomputing, (540-550)
- Cytron R, Hind M and Hsieh W (1989). Automatic generation of DAG parallelism, ACM SIGPLAN Notices, 24:7, (54-68), Online publication date: 1-Jul-1989.
- Balasundaram V and Kennedy K (1989). A technique for summarizing data access and its use in parallelism enhancing transformations, ACM SIGPLAN Notices, 24:7, (41-53), Online publication date: 1-Jul-1989.
- Cartwright R and Felleisen M (1989). The semantics of program dependence, ACM SIGPLAN Notices, 24:7, (13-27), Online publication date: 1-Jul-1989.
- Cytron R, Hind M and Hsieh W Automatic generation of DAG parallelism Proceedings of the ACM SIGPLAN 1989 conference on Programming language design and implementation, (54-68)
- Balasundaram V and Kennedy K A technique for summarizing data access and its use in parallelism enhancing transformations Proceedings of the ACM SIGPLAN 1989 conference on Programming language design and implementation, (41-53)
- Cartwright R and Felleisen M The semantics of program dependence Proceedings of the ACM SIGPLAN 1989 conference on Programming language design and implementation, (13-27)
- Kim Y and Lee K Performance analysis of buffered banyan networks under nonuniform traffic Proceedings of the 3rd international conference on Supercomputing, (452-460)
- Li Z, Yew P and Zhu C Data dependence analysis on multi-dimensional array references Proceedings of the 3rd international conference on Supercomputing, (215-224)
- Hendren L and Nicolau A Intererence analysis tools for parallelizing programs with recursive data structures Proceedings of the 3rd international conference on Supercomputing, (205-214)
- Nicolau A (1989). Run-Time Disambiguation, IEEE Transactions on Computers, 38:5, (663-678), Online publication date: 1-May-1989.
- Arafeh B Interactive loop interchanging Proceedings of the 17th conference on ACM Annual Computer Science Conference, (418-418)
- Baxter W and Bauer H The program dependence graph and vectorization Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (1-11)
- Gikar M and Polychronopoulos C Compiling issues for supercomputers Proceedings of the 1988 ACM/IEEE conference on Supercomputing, (164-173)
- Larus J and Hilfinger P (1988). Restructuring Lisp programs for concurrent execution, ACM SIGPLAN Notices, 23:9, (100-110), Online publication date: 1-Sep-1988.
- Li Z and Yew P (1988). Efficient interprocedural analysis for program parallelization and restructuring, ACM SIGPLAN Notices, 23:9, (85-99), Online publication date: 1-Sep-1988.
- Burke M, Cytron R, Ferrante J, Hsieh W, Sarkar V and Shields D (1988). Automatic discovery of parallelism: a tool and an experiment (extended abstract), ACM SIGPLAN Notices, 23:9, (77-84), Online publication date: 1-Sep-1988.
- Allen R and Johnson S (1988). Compiling C for vectorization, parallelization, and inline expansion, ACM SIGPLAN Notices, 23:7, (241-249), Online publication date: 1-Jul-1988.
- Culler D and Arvind Resource requirements of dataflow programs Proceedings of the 15th Annual International Symposium on Computer architecture, (141-150)
- Wallace D Dependence of multi-dimensional array references Proceedings of the 2nd international conference on Supercomputing, (418-428)
- Brandes T The importance of direct dependences for automatic parallelization Proceedings of the 2nd international conference on Supercomputing, (407-417)
- Turner S and Veidenbaum A Performance of a shared memory system for vector multiprocessors Proceedings of the 2nd international conference on Supercomputing, (315-325)
- Allen F, Burke M, Cytron R, Ferrante J and Hsieh W A framework for determining useful parallelism Proceedings of the 2nd international conference on Supercomputing, (207-215)
- Allen R and Johnson S Compiling C for vectorization, parallelization, and inline expansion Proceedings of the ACM SIGPLAN 1988 conference on Programming language design and implementation, (241-249)
- Culler D and Arvind (1988). Resource requirements of dataflow programs, ACM SIGARCH Computer Architecture News, 16:2, (141-150), Online publication date: 17-May-1988.
- Arafeh B Vectorization and parallelization interactive assistant Proceedings of the 1988 ACM sixteenth annual conference on Computer science, (573-577)
- Irigoin F and Triolet R Supernode partitioning Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (319-329)
- Horwitz S, Prins J and Reps T On the adequacy of program dependence graphs for representing programs Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (146-157)
- Bradley D, Nazief B, Grunwald D and Reed D Picasso: an experiment in hypercube operating system design Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1, (364-373)
- Larus J and Hilfinger P Restructuring Lisp programs for concurrent execution Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, (100-110)
- Li Z and Yew P Efficient interprocedural analysis for program parallelization and restructuring Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, (85-99)
- Burke M, Cytron R, Ferrante J, Hsieh W, Sarkar V and Shields D Automatic discovery of parallelism: a tool and an experiment (extended abstract) Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, (77-84)
- Polychronopoulus C and Banerjee U (1987). Processor Allocation for Horizontal and Vertical Parallelism and Related Speedup Bounds, IEEE Transactions on Computers, 36:4, (410-420), Online publication date: 1-Apr-1987.
- Padua D and Wolfe M (1986). Advanced compiler optimizations for supercomputers, Communications of the ACM, 29:12, (1184-1201), Online publication date: 1-Dec-1986.
- Burke M and Cytron R (1986). Interprocedural dependence analysis and parallelization, ACM SIGPLAN Notices, 21:7, (162-175), Online publication date: 1-Jul-1986.
- Burke M and Cytron R Interprocedural dependence analysis and parallelization Proceedings of the 1986 SIGPLAN symposium on Compiler construction, (162-175)
Recommendations
A study of scalar compilation techniques for pipelined supercomputers
This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of ...
A study of scalar compilation techniques for pipelined supercomputers
ASPLOS II: Proceedings of the second international conference on Architectual support for programming languages and operating systemsThis paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of ...