Quantitative Evaluation of Register Pressure on Software Pipelined Loops

Josep Llosa,
Eduard Ayguadé &
Mateo Valero

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

Software Pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization. The paper also evaluates the negative effect on register requirements that some of these transformations might produce on loops.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A methodology correlating code optimizations with data memory accesses, execution time and energy consumption

Article 13 May 2019

Instruction Level Loop De-optimization

An Approach for Compiler Optimization to Exploit Instruction Level Parallelism

REFERENCES

Digital Equipment Corporation (DEC), DECchip 21064-AA Microprocessor Hardware Reference Manual (1992).
S. Mirapuri, M. Woodacre, and N. Vasseghi, The Mips R4000 processor, IEEE Micro 12(2):10–22 (April 1992).
Google Scholar
S. W. White and S. Dhawan, POWER2: Next generation of the RISC System/6000 family. IBM RISC System/6000 Technology: Volume II, IBM Corporation (1993).
P. Y. T. Hsu, Designing the TFP microprocessor, IEEE Micro 14(2):23–33 (April 1994).
Google Scholar
J. H. Edmonson, P. Rubinfeld, R. Preston, and V. Rajagopalan, Superscalar instruction execution in the 21164 Alpha microprocessor, IEEE Micro 15(2):33–43 (April 1995).
Google Scholar
A. E. Charlesworth, An approach to scientific array processing: The architectural design of the AP120B/FPS-164 family, Computer 14(9):18–27 (1981).
Google Scholar
V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan, Software pipelining, ACM Computing Surveys 27(3):367–432 (September 1995).
Google Scholar
B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, Proc. 14th Ann. Microprogramming Workshop, pp. 183–197 (October 1981).
M. S. Lam, Software pipelining: An effective scheduling technique for VLIW machines, Proc. SIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 318–328 ( June 1988).
S. Jain, Circular scheduling: A new technique to perform software pipelining, Proc. ACM SIGPLAN, Conf. Progr. Lang. Design and Implementation, pp. 219–228 (June 1991).
R. A. Huff, Lifetime-sensitive modulo scheduling, Six Conf. Progr. Lang. Design and Implementation, pp. 258–267 (1993).
P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. Supercomputing, pp. 200–212 (November 1990).
B. R. Rau, Iterative modulo scheduling: An algorithm for software pipelining loops, Proc. 27th Ann. Int'l. Symp. Microarchitecture, pp. 63–74 (November 1994).
A. Aiken and A. Nicolau, A realistic resource-constrained software pipelining algorithm, Advances in Languages and Compilers for Parallel Processing, pp. 274–290 (1991).
N. J. Warter and N. Partamian, Modulo scheduling with multiple initiation intervals, Proc. 28th Int'l. Symp. on Microarchitecture, pp. 111–118 (November 1995).
J. Llosa, M. Valero, E. Ayguadé, and A. Gonzalez, Hypernode reduction modulo scheduling, Proc. 28th Ann. Int'l. Symp. on Microarchitecture (MICRO-28), pp. 350–360 (November 1995).
S. Ramakrishnan, Software pipelining in PA-RISC compilers, Hewlett-Packard Journal, pp. 39–45 (July 1992).
J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, J. Supercomputing, 7(1/2):181–228 (May 1993).
Google Scholar
W. Mangione-Smith, S. G. Abraham, and E. S. Davidson, Register requirements of pipelined processors, Int'l. Conf. Supercomputing, pp. 260–246 (July 1992).
R. Govindarajan, E. R. Altman, and G. R. Gao, Minimal register requirements under resource-constrained software pipelining, Proc. 27th Ann. Int'l. Symp. on Microarchitecture (MICRO-27), pp. 85–94 (November 1994).
A. E. Eichenberger, E. S. Davidson, and S. G. Abraham, Optimum modulo schedules for minimum register requirements, Proc. Int'l. Conf. on Supercomputing, pp. 31–40 ( July 1995).
J. Llosa, A. Gonzalez, E. Ayguadé, and M. Valero, Swing modulo scheduling: A lifetime-sensitive approach, IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96), pp. 80–86 (October 1996).
A. E. Eichenberger and E. S. Davidson, Stage scheduling: A technique to reduce the register requirements of a modulo schedule, Proc. 28th Ann. Int'l. Symp. on Microarchitecture (MICRO-28), pp. 338–349 (November 1995).
F. Sánchez and J. Cortadella, RESIS: A new methodology for register optimization in software pipelining, Proc. European Conf. on Parallel Processing (EUROPAR) (August 1996).
L. J. Hendren, G. R. Gao, E. R. Altman, and C. Mukerji, Register allocation using cyclic interval graphs: A new approach to an old problem, ACAPS Tech. Memo 33, Advanced Computer Architecture and Program Structures Group, McGill University, 1992.
B. R. Rau, M. Lee, P. Tirumalai, and P. Schlansker, Register allocation for software pipelined loops, Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 283–299 (June 1992).
C. Eisenbeis, S. Lelait, and B. Marmol, The meeting graph: a new model for loop cyclic register allocation, Proc, Fifth Workshop on Compilers for Parallel Computers (CPC95), pp. 503–516 (June 1995).
J. Llosa, M. Valero, and E. Ayguadé, Heuristics for register-constrained software pipelining, Proc. 29th Ann. Int'l. Symp. on Microarchitecture (MICRO-29), pp. 250–261 (December 1996).
R. Jolly, A 9-ns 1.4 gigabyte 17-ported CMOS register file, IEEE J. Solid-State Circuits, 25(10):1407–1412 (October 1991).
Google Scholar
N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Addison-Wesley Publishing (1988).
A. Capitanio, N. Dutt, and A. Nicolau, Partitioned register files for VLIWs: A preliminary analysis of tradeoffs, MICRO-25, pp. 292–300 (1992).
J. Llosa, M. Valero, J. Fortes, and E. Ayguadé, Using sacks to organize register files in VLIW machines, CONPAR 94-VAPP VI (September 1994).
J. Llosa, M. Valero, and E. Ayguadé, Non-consistent dual register files to reduce register pressure, First Symp. High Performance Computer Architecture, pp. 22–31 (January 1995).
L. Gwennap, Digital 21264 sets new standard, Microprocessor Report, 10(14):1–6 (October 1996).
Google Scholar
J. Llosa, Reducing the Impact of Register Pressure on Software Pipelining, Ph.D. Thesis, Universitat Politècnica de Catalunya (January 1996).
M. S. Lam, A Systolic Array Optimizing Compiler, Kluwer Academic Publishers, 1989.
J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped loop support in the Cydra 5, Proc. Third Int'l. Conf. Architectural Support for Progr. Lang. Oper. Syst. (ASPLOS-III ), pp. 26–38 (April 1989).
M. Berry, D. Chen, P. Koss, and D. Kuck, The Perfect club benchmarks: Effective performance evaluation of supercomputers, Technical Report827, Center for Supercomputing Research and Development (November 1988).
E. Ayguadé, C. Barrado, J. Labarta, D. López, S. Moreno, D. Padua, and M. Valero, A uniform representation for high-level and instruction-level transformations. Technical Report UPC-CEPBA 95-01, Universitat Politècnica de Catalunya (January 1995).
B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford, POLARIS: The next generation in parallelizing compilers, Proc. of the Seventh Workshop on Languages and Compilers for Parallel Computers (July 1994).
J. R. Allen, K. Kennedy, and J. Warren, Conversion of control dependence to data dependence, Proc. 10th Ann. Symp. on Principles of Progr. Lang. (January 1983).
F. H. McMahon, The Livermore FORTRAN kernels: A computer test of the numerical performance range, Technical Report, Lawrence Livermore Laboratories (December 1986).
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: design philosophies, decisions and trade-offs. IEEE Computer 22(1):12–35 (January 1989).
Google Scholar
J. J. Dongarra and A. R. Hinds, Unrolling loops in FORTRAN. Software-Practice and Experience 9:219–26 (March 1979).
Google Scholar
R. B. Jones and V. H. Allan, Software pipelining: A comparison and improvement, Proc. 23rd Ann. Workshop on Microprogramming and Microarchitecture (MICRO-23), pp. 46–56 (November 1990).
D. Callahan, S. Carr, and K. Kennedy, Improving register allocation for subscripted variables, Proc. Progr. Lang. Design and Implementation, ACM SIGPLAN, pp. 53–65 (June 1990).

Download references

Authors

Josep Llosa
View author publications
You can also search for this author in PubMed Google Scholar
Eduard Ayguadé
View author publications
You can also search for this author in PubMed Google Scholar
Mateo Valero
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Llosa, J., Ayguadé, E. & Valero, M. Quantitative Evaluation of Register Pressure on Software Pipelined Loops. International Journal of Parallel Programming 26, 121–142 (1998). https://doi.org/10.1023/A:1018743102645

Download citation

Issue Date: April 1998
DOI: https://doi.org/10.1023/A:1018743102645

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A methodology correlating code optimizations with data memory accesses, execution time and energy consumption

Instruction Level Loop De-optimization

An Approach for Compiler Optimization to Exploit Instruction Level Parallelism

REFERENCES

Rights and permissions

About this article

Cite this article

Subscribe and save

Buy Now

Navigation

Quantitative Evaluation of Register Pressure on Software Pipelined Loops

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A methodology correlating code optimizations with data memory accesses, execution time and energy consumption

Instruction Level Loop De-optimization

An Approach for Compiler Optimization to Exploit Instruction Level Parallelism

REFERENCES

Rights and permissions

About this article

Cite this article

Share this article

Subscribe and save

Buy Now

Search

Navigation