Abstract
Software Pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization. The paper also evaluates the negative effect on register requirements that some of these transformations might produce on loops.
Similar content being viewed by others
REFERENCES
Digital Equipment Corporation (DEC), DECchip 21064-AA Microprocessor Hardware Reference Manual (1992).
S. Mirapuri, M. Woodacre, and N. Vasseghi, The Mips R4000 processor, IEEE Micro 12(2):10–22 (April 1992).
S. W. White and S. Dhawan, POWER2: Next generation of the RISC System/6000 family. IBM RISC System/6000 Technology: Volume II, IBM Corporation (1993).
P. Y. T. Hsu, Designing the TFP microprocessor, IEEE Micro 14(2):23–33 (April 1994).
J. H. Edmonson, P. Rubinfeld, R. Preston, and V. Rajagopalan, Superscalar instruction execution in the 21164 Alpha microprocessor, IEEE Micro 15(2):33–43 (April 1995).
A. E. Charlesworth, An approach to scientific array processing: The architectural design of the AP120B/FPS-164 family, Computer 14(9):18–27 (1981).
V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan, Software pipelining, ACM Computing Surveys 27(3):367–432 (September 1995).
B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, Proc. 14th Ann. Microprogramming Workshop, pp. 183–197 (October 1981).
M. S. Lam, Software pipelining: An effective scheduling technique for VLIW machines, Proc. SIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 318–328 ( June 1988).
S. Jain, Circular scheduling: A new technique to perform software pipelining, Proc. ACM SIGPLAN, Conf. Progr. Lang. Design and Implementation, pp. 219–228 (June 1991).
R. A. Huff, Lifetime-sensitive modulo scheduling, Six Conf. Progr. Lang. Design and Implementation, pp. 258–267 (1993).
P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. Supercomputing, pp. 200–212 (November 1990).
B. R. Rau, Iterative modulo scheduling: An algorithm for software pipelining loops, Proc. 27th Ann. Int'l. Symp. Microarchitecture, pp. 63–74 (November 1994).
A. Aiken and A. Nicolau, A realistic resource-constrained software pipelining algorithm, Advances in Languages and Compilers for Parallel Processing, pp. 274–290 (1991).
N. J. Warter and N. Partamian, Modulo scheduling with multiple initiation intervals, Proc. 28th Int'l. Symp. on Microarchitecture, pp. 111–118 (November 1995).
J. Llosa, M. Valero, E. Ayguadé, and A. Gonzalez, Hypernode reduction modulo scheduling, Proc. 28th Ann. Int'l. Symp. on Microarchitecture (MICRO-28), pp. 350–360 (November 1995).
S. Ramakrishnan, Software pipelining in PA-RISC compilers, Hewlett-Packard Journal, pp. 39–45 (July 1992).
J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, J. Supercomputing, 7(1/2):181–228 (May 1993).
W. Mangione-Smith, S. G. Abraham, and E. S. Davidson, Register requirements of pipelined processors, Int'l. Conf. Supercomputing, pp. 260–246 (July 1992).
R. Govindarajan, E. R. Altman, and G. R. Gao, Minimal register requirements under resource-constrained software pipelining, Proc. 27th Ann. Int'l. Symp. on Microarchitecture (MICRO-27), pp. 85–94 (November 1994).
A. E. Eichenberger, E. S. Davidson, and S. G. Abraham, Optimum modulo schedules for minimum register requirements, Proc. Int'l. Conf. on Supercomputing, pp. 31–40 ( July 1995).
J. Llosa, A. Gonzalez, E. Ayguadé, and M. Valero, Swing modulo scheduling: A lifetime-sensitive approach, IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96), pp. 80–86 (October 1996).
A. E. Eichenberger and E. S. Davidson, Stage scheduling: A technique to reduce the register requirements of a modulo schedule, Proc. 28th Ann. Int'l. Symp. on Microarchitecture (MICRO-28), pp. 338–349 (November 1995).
F. Sánchez and J. Cortadella, RESIS: A new methodology for register optimization in software pipelining, Proc. European Conf. on Parallel Processing (EUROPAR) (August 1996).
L. J. Hendren, G. R. Gao, E. R. Altman, and C. Mukerji, Register allocation using cyclic interval graphs: A new approach to an old problem, ACAPS Tech. Memo 33, Advanced Computer Architecture and Program Structures Group, McGill University, 1992.
B. R. Rau, M. Lee, P. Tirumalai, and P. Schlansker, Register allocation for software pipelined loops, Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, pp. 283–299 (June 1992).
C. Eisenbeis, S. Lelait, and B. Marmol, The meeting graph: a new model for loop cyclic register allocation, Proc, Fifth Workshop on Compilers for Parallel Computers (CPC95), pp. 503–516 (June 1995).
J. Llosa, M. Valero, and E. Ayguadé, Heuristics for register-constrained software pipelining, Proc. 29th Ann. Int'l. Symp. on Microarchitecture (MICRO-29), pp. 250–261 (December 1996).
R. Jolly, A 9-ns 1.4 gigabyte 17-ported CMOS register file, IEEE J. Solid-State Circuits, 25(10):1407–1412 (October 1991).
N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Addison-Wesley Publishing (1988).
A. Capitanio, N. Dutt, and A. Nicolau, Partitioned register files for VLIWs: A preliminary analysis of tradeoffs, MICRO-25, pp. 292–300 (1992).
J. Llosa, M. Valero, J. Fortes, and E. Ayguadé, Using sacks to organize register files in VLIW machines, CONPAR 94-VAPP VI (September 1994).
J. Llosa, M. Valero, and E. Ayguadé, Non-consistent dual register files to reduce register pressure, First Symp. High Performance Computer Architecture, pp. 22–31 (January 1995).
L. Gwennap, Digital 21264 sets new standard, Microprocessor Report, 10(14):1–6 (October 1996).
J. Llosa, Reducing the Impact of Register Pressure on Software Pipelining, Ph.D. Thesis, Universitat Politècnica de Catalunya (January 1996).
M. S. Lam, A Systolic Array Optimizing Compiler, Kluwer Academic Publishers, 1989.
J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped loop support in the Cydra 5, Proc. Third Int'l. Conf. Architectural Support for Progr. Lang. Oper. Syst. (ASPLOS-III ), pp. 26–38 (April 1989).
M. Berry, D. Chen, P. Koss, and D. Kuck, The Perfect club benchmarks: Effective performance evaluation of supercomputers, Technical Report827, Center for Supercomputing Research and Development (November 1988).
E. Ayguadé, C. Barrado, J. Labarta, D. López, S. Moreno, D. Padua, and M. Valero, A uniform representation for high-level and instruction-level transformations. Technical Report UPC-CEPBA 95-01, Universitat Politècnica de Catalunya (January 1995).
B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford, POLARIS: The next generation in parallelizing compilers, Proc. of the Seventh Workshop on Languages and Compilers for Parallel Computers (July 1994).
J. R. Allen, K. Kennedy, and J. Warren, Conversion of control dependence to data dependence, Proc. 10th Ann. Symp. on Principles of Progr. Lang. (January 1983).
F. H. McMahon, The Livermore FORTRAN kernels: A computer test of the numerical performance range, Technical Report, Lawrence Livermore Laboratories (December 1986).
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: design philosophies, decisions and trade-offs. IEEE Computer 22(1):12–35 (January 1989).
J. J. Dongarra and A. R. Hinds, Unrolling loops in FORTRAN. Software-Practice and Experience 9:219–26 (March 1979).
R. B. Jones and V. H. Allan, Software pipelining: A comparison and improvement, Proc. 23rd Ann. Workshop on Microprogramming and Microarchitecture (MICRO-23), pp. 46–56 (November 1990).
D. Callahan, S. Carr, and K. Kennedy, Improving register allocation for subscripted variables, Proc. Progr. Lang. Design and Implementation, ACM SIGPLAN, pp. 53–65 (June 1990).
Rights and permissions
About this article
Cite this article
Llosa, J., Ayguadé, E. & Valero, M. Quantitative Evaluation of Register Pressure on Software Pipelined Loops. International Journal of Parallel Programming 26, 121–142 (1998). https://doi.org/10.1023/A:1018743102645
Issue Date:
DOI: https://doi.org/10.1023/A:1018743102645