Abstract
The objective of software pipelining is to generate code which can maximally exploit instruction-level parallelism (ILP) in modern multiissue processor architectures, such as VLIW and superscalar processors. Since the amount of ILP is usually fixed to a small number, four โ eight, using state-of-the-art software pipelining scheduling techniques, modern compilers have been able to schedule instructions in a small window of successive iterations and keep the machine resources usefully busy. To maximally take advantage of software pipelining, it is beneficial if the number of iterations of the loops to be software pipelined is large (called trip counts in this paper). Therefore, software pipelining of nested loops becomes important, especially when the innermost loops have smaller trip counts.
This paper presents a loop transformation which extends software pipelining from the innermost loops to the enclosing loop nests. Unlike some popular loop transformation techniques (e.g. unimodular transformation) targeted to multi-processor machines (where the goal has been to maximally expose loop-level parallelism i.e. the transformed loop nests have maximum number of doall loops), the goal of our transformation, pipelining-dovetailing, is to extend the software pipelining of the innermost loop to the surrounding loop nests. Thus all iterations of the loop nests can be smoothly software pipelined through, and the number of effective trip counts is maximized. We also define the condition under which pipelining-dovetailing is valid. As a result, a software pipelining framework is derived for loop nests which integrates software pipelining and pipelining-dovetailing together.
This work was supported by research grants from NSERC, Micronet โ Network Centers of Excellence, Canada.
Chapter PDF
Similar content being viewed by others
Keywords
References
B. R. Rau and J.A. Fisher. Instruction-level parallel processing: History, overview and perspective. The Journal of Supercomputing, 7(1), January 1993.
B.R. Rau and C.D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In proceedings of the 14th International Symposium on Microprogramming and Microarchitectures (MICRO-14), pages 183โ198, October 1981.
K. Ebcioglu and T. Nakatani. A new compilation technique for paralelizing loops with unpredictable branches on a vliw architecture. In A. Nicolau D. Gelernter and D. Padua, editors, Languages and Compilers for Parallel Computing, pages 213โ229. Pitman/The MIT Press, London, 1989.
M.S. Lam. A Systolic Array Optimizing Compiler. PhD thesis, CMU, 1987. CMU-CS-87-187.
C. Eisenbeis, W. Jalby, and A. Lichnewsky. Compile-time optimization of memory and register usage on the cray-2. In proceedings of the second Workshop on Languages and Compilers, 1989.
A. Aiken and A. Nicolau. A realistic resource-constrainted software pipelining algorithm. In T. Gross A. Nicolau, D. Gelernter and D. Padua, editors, Languages and Compilers for Parallel Computing, pages 274โ290. Pitman/The MIT Press, London, 1991.
R. Huff. Lifetime-sensitive modulo scheduling. In proceedings of ACM SIGPLAN PLDI, pages 258โ267, June 1993.
Q. Ning and G.R. Gao. A novel framework of register allocation for software pipelining. In proceedings of POPL, January 1993.
Jian Wang, Christine Eisenbeis, Martin Jourdan, and Bogong Su. Decomposed Software Pipelining: A new perspective and a new approach. International Journal of Parallel Programming, 22(3):357โ379, 1994.
Michael E. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 2(4), 1991.
U. Banerjee. Loop Transformations for Restructuring Compilers. Kluwer Academic, 1993.
A. Darte, L. Risset, and Y. Robert. Loop nest scheduling and transformations. In proceedings of Environments and Tools for Parallel Scientific Computing, 1992.
Amy W. Lim and M. S. Lam. Communication-free parallelization via affine transformations. In proceedings of LCPC'94, 1994.
F. Gasperoni. Compilation techniques for vliw architectures. Technical Report TR435, New York University, March 1989.
Hans Zima and Barbara Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, New York, 1990.
U. Banerjee. Unimodular transformations of double loops. In proceedings of the 3rd Workshop on Languages and Compilers for Parallel Computing, 1990.
Bogong Su, Shiyuan Ding, Jian Wang, and Jinshi Xia. GURPR-a method for global software pipelining. In proceedings of the 20th Annual International Workshop on Microprogramming (MICRO-20), pages 88โ96. ACM and IEEE, November 1987.
Guang R. Gao, Qi Ning, and Vincent Van Dongen. Extending software pipelining techniques for scheduling nested loops. In proceedings of the 6th Workshop on Languages and Compilers for Parallel Computing, 1993.
Ki chang Kim and Alexandru Nicolau. Parallelizing tightly nested loops. In proceedings of International Conference on Parallel Processing, 1991.
P. Feautrier. A collection of papers on the systematic construction of parallel and distributed programs. Technical Report Hors-serie, Lab. MASI, Universite P. et M. Curie, 1992.
M. J. Wolfe. Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge, MA, 1989.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
ยฉ 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, J., Gao, G.R. (1996). Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In: Gyimรณthy, T. (eds) Compiler Construction. CC 1996. Lecture Notes in Computer Science, vol 1060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61053-7_49
Download citation
DOI: https://doi.org/10.1007/3-540-61053-7_49
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61053-3
Online ISBN: 978-3-540-49939-8
eBook Packages: Springer Book Archive