Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops

Jian Wang¹ &
Guang R. Gao¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1060)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

The objective of software pipelining is to generate code that maximally exploits instruction-level parallelism (ILP) in modern multi-issue processor architectures, such as VLIW and superscalar processors. The amount of ILP is usually fixed at a small number (four to eight). Using state-of-the-art software pipelining scheduling techniques, modern compilers have therefore been able to schedule instructions from a small window of successive iterations and keep the machine resources usefully busy. To take maximal advantage of software pipelining, it is beneficial for the loops being software pipelined to have a large number of iterations (called trip counts in this paper). Software pipelining of nested loops therefore becomes important, especially when the innermost loops have small trip counts.
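To make the trip-count argument concrete, here is a minimal sketch of an idealized cost model. The model is an assumption for illustration and is not taken from the paper. A software-pipelined loop is assumed to issue a new iteration every `ii` cycles (the initiation interval), and each iteration spans `stages` stages of `ii` cycles, so a loop with trip count N finishes in (N + stages - 1) * ii cycles. The function names are hypothetical.

```python
def pipelined_cycles(trip_count: int, ii: int, stages: int) -> int:
    """Idealized cycle count for one software-pipelined loop.

    Assumes a new iteration is issued every `ii` cycles and that each
    iteration occupies `stages` pipeline stages of `ii` cycles, so the
    prologue, kernel and epilogue together take (N + stages - 1) * ii cycles.
    """
    if trip_count <= 0:
        return 0
    return (trip_count + stages - 1) * ii


def efficiency(trip_count: int, stages: int) -> float:
    """Ratio of the ideal time (N * ii) to the pipelined time.

    Equals N / (N + stages - 1): the fill/drain overhead of stages - 1
    initiation intervals is amortized only when N is large.
    """
    if trip_count <= 0:
        return 0.0
    return trip_count / (trip_count + stages - 1)


if __name__ == "__main__":
    # Hypothetical parameters: ii = 1 cycle, 4 stages.
    for n in (4, 16, 1000):
        print(n, pipelined_cycles(n, ii=1, stages=4), round(efficiency(n, stages=4), 3))
```

With four stages, this model predicts that a trip count of 4 reaches only about 57% of the ideal throughput (7 cycles for 4 iterations). A trip count of 1000 reaches about 99.7%, which is why short innermost loops limit what software pipelining alone can achieve.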

This paper presents a loop transformation, pipelining-dovetailing, which extends software pipelining from the innermost loops to the enclosing loop nests. Some popular loop transformation techniques (e.g., unimodular transformations) target multiprocessor machines, where the goal has been to maximally expose loop-level parallelism (i.e., to maximize the number of doall loops in the transformed loop nest). In contrast, the goal of pipelining-dovetailing is to extend the software pipelining of the innermost loop to the surrounding loop nest. All iterations of the loop nest can thus be smoothly software pipelined through, and the effective trip count is maximized. We also define the condition under which pipelining-dovetailing is valid. As a result, we derive a software pipelining framework for loop nests that integrates software pipelining and pipelining-dovetailing.
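The benefit described above can be illustrated with a simple idealized timing model. This model is an assumption for illustration. It is not the authors' algorithm, and it does not encode their validity condition. For an m × n loop nest, pipelining only the innermost loop fills and drains the pipeline once per outer iteration, for m * (n + stages - 1) * ii cycles in total. If the pipeline instead keeps running across outer iterations, which is the effect pipelining-dovetailing aims for, the nest behaves like a single loop with trip count m * n and takes (m * n + stages - 1) * ii cycles. The sketch computes the issue cycle of each (i, j) iteration under both policies. It does not check data dependences and simply assumes the overlap is legal. All names and parameters are hypothetical.

```python
from typing import Dict, Tuple

Schedule = Dict[Tuple[int, int], int]


def issue_inner_only(m: int, n: int, ii: int, stages: int) -> Schedule:
    """Issue cycle of each (i, j) iteration when only the inner loop is
    software pipelined: each outer iteration starts a fresh pipeline only
    after the previous one has fully drained."""
    times: Schedule = {}
    start = 0
    for i in range(m):
        for j in range(n):
            times[(i, j)] = start + j * ii
        # fill + kernel + drain of this inner loop: (n + stages - 1) * ii cycles
        start += (n + stages - 1) * ii
    return times


def issue_dovetailed(m: int, n: int, ii: int) -> Schedule:
    """Issue cycle of each (i, j) iteration when the pipeline is kept running
    across outer iterations: all m * n iterations are issued back to back,
    every ii cycles, in lexicographic (i, j) order.
    Assumes (without checking) that dependences permit this overlap."""
    return {(i, j): (i * n + j) * ii for i in range(m) for j in range(n)}


def completion_cycle(times: Schedule, ii: int, stages: int) -> int:
    """Cycle at which the last-issued iteration finishes (stages * ii later)."""
    if not times:
        return 0
    return max(times.values()) + stages * ii


if __name__ == "__main__":
    m, n, ii, stages = 100, 4, 1, 4  # hypothetical nest with a short inner loop
    print(completion_cycle(issue_inner_only(m, n, ii, stages), ii, stages))  # 700
    print(completion_cycle(issue_dovetailed(m, n, ii), ii, stages))          # 403
```

Consider a 100 × 4 nest with four stages. The model gives 700 cycles when only the inner loop is pipelined, versus 403 cycles when the pipeline runs through the whole nest. In the second case the short inner trip count no longer pays the fill/drain cost on every outer iteration.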

This work was supported by research grants from NSERC and from Micronet (Network Centers of Excellence), Canada.

Author information

Authors and Affiliations

School of Computer Science, McGill University, 3480 University Street, H3A 2AT, Montréal, QC, Canada
Jian Wang & Guang R. Gao

Editor information

Tibor Gyimóthy

About this paper

Cite this paper

Wang, J., Gao, G.R. (1996). Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In: Gyimóthy, T. (eds) Compiler Construction. CC 1996. Lecture Notes in Computer Science, vol 1060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61053-7_49

DOI: https://doi.org/10.1007/3-540-61053-7_49
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61053-3
Online ISBN: 978-3-540-49939-8
eBook Packages: Springer Book Archive
