Abstract
The most important task of a compiler designed to exploit instruction-level parallelism (ILP) is instruction scheduling. If higher levels of ILP are to be achieved, the compiler must use, as the unit of scheduling, regions consisting of multiple basic blocks—preferably those that frequently execute consecutively, and which capture cycles in the program’s execution. Traditionally, compilers have been built using the function as the unit of compilation. In this framework, function boundaries often act as barriers to the formation of the most suitable scheduling regions. Function inlining may be used to circumvent this problem by assembling strongly coupled functions into the same compilation unit, but at the cost of very large function bodies. Consequently, global optimizations whose compile time and space requirements are superlinear in the size of the compilation unit, may be rendered prohibitively expensive. This paper introduces a new approach, called region-based compilation, wherein the compiler, after inlining, repartitions the program into more desirable compilation units, termed regions. Region-based compilation allows the compiler to control problem size and complexity while exposing inter-procedural scheduling, optimization and code motion opportunities.
Similar content being viewed by others
References
J. A. Fisher, Trace Scheduling: A technique for Global Microcode Compaction,IEEE Trans. on Computers,C-30:478–490, (July 1981).
J. Ellis,Bulldog: A Compiler for VLIW Architectures, Cambridge, Massachusetts, The MIT Press (1985).
B. R. Rau and C. D. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing,Proc. 20th Ann. Workshop on Microprogramming and Microarchitecture, pp. 183–198 (October 1981).
B. R. Rau, Iterative Modulo Scheduling,Int. J. Pa. Pr. 24:3–64 (February 1996).
W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, The Superblock: An Effective Technique for VLIW and Superscalar Compilation,J. Supercomputing,7:229–248 (January 1993).
P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O’Donell, and J. C. Ruttenberg, The Multiflow Trace Scheduling Compiler,J. Supercomputing,7, 51–142 (January 1993).
J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5,J. Supercomputing,7:181–227 (January 1993).
R. Allen and S. Johnson, Compiling C for Vectorization, Parallelization, and Inline Expansion,Proc. ACMSIGPLAN 1988 Conf. Progr. Lang. Design and Implementation, pp. 241–249 (June 1988).
J. W. Davidson and A. M. Holler, Subprogram Inlining: A study of Its Effects on Program Execution Time,IEEE Trans. Software Engineering,18:89–101 (February 1992).
W. W. Hwu and P. P. Chang, Inline Function Expansion for Compiling Realistic C Programs,Proc. ACM SIGPLAN 1989 Conf. Progr. Lang. Design and Implementation, pp. 246–257 (June 1989).
S. A. Mahlke, N. J. Warter, W. Y. Chen, P. P. Chang, and W. W. Hwu, The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs,Proc. Int. Conf. Parallel Processing, pp. 142–145 (August 1991).
S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective Compiler Support for Predicated Execution Using the Hyperblock,Proc. 25th Int. Symp. on Microarchitecture, pp. 45–54 (December 1992).
S. A. Mahlke, W. Y. Chen, P. P. Chang, and W. W. Hwu, Scalar Program Performance on Multiple-Instruction-Issue Processors with a Limited Number of Registers,Proc. 25th Ann. Hawaii Int’l. Conf. Syst. Sci., pp. 34–44 (January 1992).
M. Schlansker and V. Kathail, Critical Path Reduction for Scalar Programs,Proc. 28th Int’l. Symp. Microarchitecture, pp. 57–69 (December 1995).
D. M. Lavery and W. W. Hwu, Unrolling-Based Optimizations for Modulo Scheduling,Proc. 28th Int’l. Symp. Microarchitecture, pp. 327–337 (November 1995).
M. S. Schlansker, V. Kathail, and S. Anik, Parallelization of Control Recurrences for ILP Processors,Int. J. Parallel Processing 24:65–102 (February 1996).
A. Aho, R. Sethi, and J. Ullman,Compilers: Principles, Techniques, and Tools. Reading, Massachusetts, Addison-Wesley (1986).
R. Gupta and M. L. Soffa, Region Scheduling: An Approach for Detecting and Redistributing Parallelism,IEEE Trans. Software Engineering,16:421–431 (April 1990).
R. S. Pressman,Software Engineering: A Practitioner’s Approach. New York, McGraw-Hill (1988).
G. J. Chaitin, Register Allocation and Spilling via Graph Coloring,Proc. ACM SIGPLAN Symp. Compiler Construction, pp. 98–105 (June 1982).
F. C. Chow and J. L. Hennessy, The Priority-Based Coloring Approach to Register Allocation,ACM Trans. Prog. Lang. Syst. 12:501–536 (October 1990).
S. A. Mahlke, Exploiting Instruction Level Parallelism in the Presence of Conditional Branches. Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Illinois, Illinois (1996).
U. Mahadevan and S. Ramakrishnan, Instruction Scheduling Over Regions: A Framework for Scheduling Across Basic Blocks,Proc. 5th Int’l. Conf. Compiler Construction, pp. 419–434 (April 1994).
R. E. Hank, S. A. Mahlke, R. A. Bringmann, J. C. Gyllenhaal, and W. W. Hwu, Superblock Formation Using Static Program Analysis,Proc. 26th Ann. Int’l. Symp. Microarchitecture (December 1993).
M. S. Lam, Software Pipelining: An Effective Scheduling Technique for VLIW Machines,Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 318–328 (June 1988).
K. Ebcioglu, A Compilation Technique for Software Pipelining of Loops with Conditional Jumps,Proc. 20th Ann. Workshop on Microprogramming and Microarchitecture, pp. 69–79 (December 1987).
A. Aiken and A. Nicolau, Optimal Loop Parallelization,Proc. ACMSIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 308–317 (June 1988).
K. Ebcioglu and T. Nakatani, A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture,Lang. and Compilers for Parallel Computing, pp. 213–229 (1989).
P. Tirumalai, M. Lee, and M. Schlansker, Parallelization of Loops with Exits on Pipelined Architectures,Supercomputing (November 1990).
B. R. Rau, Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops,Proc. 27th Int’l. Symp. Microarchitecture, pp. 63–74 (December 1994).
M. Schlansker, V. Kathail, and S. Anik, Height Reduction of Control Recurrences for ILP Processors,Proc. 27th Int’l. Symp. Microarchitecture, pp. 40–51 (December 1994).
D. M. Lavery and W. W. Hwu, Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs,Proc. 29th Int’l. Symp. Microarchitecture, pp. 126–137 (December 1996).
P. P. Chang and W. W. Hwu, Trace Selection for Compiling Large C Application Programs to Microcode,Proc. 21st Int’l. Workshop on Microprogramming and microarchitecture, pp. 188–198 (November 1988).
R. E. Hank,Region-Based Compilation, Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana, Illinois (1996).
J. Ferrante, K. J. Ottenstein, and J. D. Warren, The Program Dependence Graph and Its Use in Optimization,ACM Trans. Progr. Lang. Syst. 9:319–349 (July 1987).
T. Ball and J. R. Larus, Branch Prediction for Free,Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Implementation, pp. 300–313 (June 1993).
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Hank, R.E., Hwu, Wm.W. & Rau, B.R. Region-based compilation: Introduction, motivation, and initial experience. Int J Parallel Prog 25, 113–146 (1997). https://doi.org/10.1007/BF02700049
Issue Date:
DOI: https://doi.org/10.1007/BF02700049