Minimizing register requirements under resource-constrained rate-optimal software pipelining

Authors: R. Govindarajan, Erik R. Altman, Guang R. Gao

MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture

Pages 85 - 94

https://doi.org/10.1145/192724.192733

Published: 30 November 1994

Abstract

In this paper we address the following software pipelining problem: given a loop and a machine architecture with a fixed number of processor resources (e.g., function units), how can one construct a software-pipelined schedule that runs on the given architecture at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of registers?
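
To make "maximum possible iteration rate" concrete: in a periodic schedule a new iteration starts every T cycles, so a rate-optimal schedule is one with the smallest feasible T. The sketch below is an illustration only, not the paper's formulation. It computes two standard lower bounds on T for a hypothetical four-operation loop: a resource bound and a recurrence bound. The loop, latencies, unit counts, and all function names are assumptions, and function units are assumed to be fully pipelined (one issue per unit per cycle).

```python
from math import ceil

# Hypothetical toy loop (not from the paper): latency and function-unit class per op.
OPS = {
    "load": {"latency": 2, "unit": "mem"},
    "mul": {"latency": 3, "unit": "fp"},
    "add": {"latency": 1, "unit": "fp"},
    "store": {"latency": 1, "unit": "mem"},
}
# Dependences (producer, consumer, distance in iterations).
EDGES = [("load", "mul", 0), ("mul", "add", 0), ("add", "mul", 1), ("add", "store", 0)]
UNITS = {"mem": 1, "fp": 1}  # assumed number of units per class


def resource_bound(ops, units):
    """Smallest period that gives every op an issue slot on its unit class."""
    counts = {}
    for op in ops.values():
        counts[op["unit"]] = counts.get(op["unit"], 0) + 1
    return max(ceil(n / units[u]) for u, n in counts.items())


def has_positive_cycle(ops, edges, period):
    """True if some dependence cycle cannot be satisfied at this period."""
    names = list(ops)
    neg_inf = float("-inf")
    dist = {u: {v: neg_inf for v in names} for u in names}
    for src, dst, distance in edges:
        weight = ops[src]["latency"] - period * distance
        dist[src][dst] = max(dist[src][dst], weight)
    # Floyd-Warshall on longest paths; a positive diagonal entry marks a positive cycle.
    for k in names:
        for i in names:
            for j in names:
                if dist[i][k] + dist[k][j] > dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return any(dist[v][v] > 0 for v in names)


def recurrence_bound(ops, edges):
    """Smallest integer period at which no dependence cycle is violated."""
    limit = sum(op["latency"] for op in ops.values())
    for period in range(1, limit + 1):
        if not has_positive_cycle(ops, edges, period):
            return period
    raise ValueError("dependence cycle with zero total distance")


def min_period_lower_bound(ops, edges, units):
    return max(resource_bound(ops, units), recurrence_bound(ops, edges))


print(min_period_lower_bound(OPS, EDGES, UNITS))  # prints 4
```

For this toy loop the recurrence through mul and add (total latency 4, distance 1) dominates the resource bound of 2. These are only lower bounds: the smallest T for which a valid schedule actually exists can be larger.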

The main contributions of this paper are:

• First, we demonstrate that this problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The formulation provides a clear picture that lets one visualize the overall solution space (for rate-optimal schedules) under different sets of constraints.

• Second, we show that a precise mathematical formulation and its solution make a significant performance difference. We evaluated the performance of our method against three other leading contemporary heuristic methods: Huff's Slack Scheduling; Wang, Eisenbeis, Jourdan and Su's FRLC; and Gasperoni and Schwiegelshohn's modified list scheduling. Experimental results show that the method described in this paper performed significantly better than these methods.
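
The trade-off between iteration rate and register count can be illustrated on a small case. The following sketch assumes the usual form of a periodic linear schedule, in which operation i of iteration j starts at time T*j + t_i. It checks a candidate schedule against dependence and function-unit constraints, then counts the registers needed in steady state as the maximum number of simultaneously live values. The loop, the lifetime convention (a value occupies a register from the issue of its producer until the issue of its last consumer), and all names are assumptions for illustration; the abstract does not state the paper's exact cost model.

```python
# Hypothetical toy loop (not from the paper): op -> (latency, function-unit class).
OPS = {"load": (2, "mem"), "mul": (3, "fp"), "add": (1, "fp"), "store": (1, "mem")}
# Dependences (producer, consumer, distance in iterations).
EDGES = [("load", "mul", 0), ("mul", "add", 0), ("add", "mul", 1), ("add", "store", 0)]
UNITS = {"mem": 1, "fp": 1}  # assumed fully pipelined: one issue per unit per cycle


def is_valid(start, period):
    """Check dependence and modulo resource constraints for a periodic schedule."""
    for src, dst, distance in EDGES:
        # The consumer, `distance` iterations later, must not start before the result is ready.
        if start[dst] + period * distance < start[src] + OPS[src][0]:
            return False
    issued = {}
    for op, (_, unit) in OPS.items():
        slot = (unit, start[op] % period)
        issued[slot] = issued.get(slot, 0) + 1
    return all(count <= UNITS[unit] for (unit, _), count in issued.items())


def max_live(start, period):
    """Maximum number of values live at once in the steady state."""
    live = [0] * period
    for op in OPS:
        uses = [start[dst] + period * dist for src, dst, dist in EDGES if src == op]
        if not uses:
            continue  # op produces no register value (here, the store)
        for t in range(start[op], max(uses)):
            live[t % period] += 1  # overlapping iterations fold onto the same slot
    return max(live)


TIGHT = {"load": 0, "mul": 2, "add": 5, "store": 6}
LATE = {"load": 0, "mul": 2, "add": 5, "store": 10}
for name, schedule in (("tight", TIGHT), ("late", LATE)):
    print(name, is_valid(schedule, 4), max_live(schedule, 4))
# prints: tight True 2
#         late True 3
```

Both schedules run at the same period of 4 cycles, but delaying the store lengthens the lifetime of the value produced by add and raises the register count from 2 to 3. An exact method searches over all schedules at the chosen period for one that minimizes this count.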

References

[1] A. Aiken and A. Nicolau. A realistic resource-constrained software pipelining algorithm. In A. Nicolau, D. Gelernter, T. Gross, and D. Padua, editors, Advances in Languages and Compilers for Parallel Processing, Res. Monographs in Parallel and Distrib. Computing, chapter 14, pages 274-290. 1991.

[2] J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Conf. Rec. of the Tenth Ann. ACM Symp. on Principles of Programming Languages, pages 177-189, Austin, TX, Jan. 24-26, 1983.

[3] E. R. Altman, R. Govindarajan, and G. R. Gao. Software pipelining to minimize registers and resources. ACAPS Technical Memo 79, School of Computer Science, McGill University, Montréal, Qué., 1994. Under preparation.

[4] J. C. Dehnert and R. A. Towle. Compiling for Cydra 5. Journal of Supercomputing, 7:181-227, May 1993.

[5] K. Ebcioğlu and T. Nakatani. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. In D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Res. Monographs in Parallel and Distrib. Computing, chapter 12, pages 213-229. 1990.

[6] F. Gasperoni and U. Schwiegelshohn. Efficient algorithms for cyclic scheduling. Res. Rep. RC 17068, IBM T. J. Watson Res. Center, Yorktown Heights, NY, 1991.

[7] M. B. Girkar, M. R. Haghighat, C. L. Lee, B. P. Leung, and D. A. Schouten. Parafrase-2 user's manual. TR RC-17068(#75743), Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, IL, 1991.

[8] R. Govindarajan, E. R. Altman, and G. R. Gao. Minimizing register requirement in resource-constrained software pipelining. ACAPS Technical Memo 80, School of Computer Science, McGill University, Montréal, Qué., 1994.

[9] R. A. Huff. Lifetime-sensitive modulo scheduling. In Proc. of the SIGPLAN '93 Conf. on Programming Language Design and Implementation, pages 258-267, Albuquerque, NM, Jun. 23-25, 1993.

[10] C.-T. Hwang, J.-H. Lee, and Y.-C. Hsu. A formal approach to the scheduling problem in high-level synthesis. IEEE Trans. on Computer-Aided Design, 10(4):464-475, Apr. 1991.

[11] M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. of the SIGPLAN '88 Conf. on Programming Language Design and Implementation, pages 318-328, Atlanta, GA, Jun. 22-24, 1988.

[12] S. Moon and K. Ebcioğlu. An efficient resource-constrained global scheduling technique for superscalar and VLIW processors. In Proc. of the 25th Ann. Intl. Symp. on Microarchitecture, pages 55-71, Portland, OR, Dec. 1-4, 1992.

[13] Q. Ning and G. R. Gao. A novel framework of register allocation for software pipelining. In Conf. Rec. of the Twentieth Ann. ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 29-42, Charleston, SC, Jan. 10-13, 1993.

[14] M. Rajagopalan and V. H. Allan. Efficient scheduling of fine grain parallelism in loops. In Proc. of the 26th Ann. Intl. Symp. on Microarchitecture, pages 2-11, Austin, TX, Dec. 1-3, 1993.

[15] S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard J., pages 39-45, Jun. 1992.

[16] B. R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. J. of Supercomputing, 7:9-50, May 1993.

[17] B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proc. of the 14th Ann. Microprogramming Work., pages 183-198, Chatham, MA, Oct. 12-15, 1981.

[18] B. R. Rau, M. Lee, P. P. Tirumalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proc. of the SIGPLAN '92 Conf. on Programming Language Design and Implementation, pages 283-299, San Francisco, CA, Jun. 17-19, 1992.

[19] B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. Code generation schema for modulo scheduled loops. In Proc. of the 25th Ann. Intl. Symp. on Microarchitecture, pages 158-169, Portland, OR, Dec. 1-4, 1992.

[20] B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer. Computer, 22(1):12-35, Jan. 1989.

[21] R. Reiter. Scheduling parallel computations. J. of the ACM, 15(4):590-599, Oct. 1968.

[22] R. F. Touzeau. A Fortran compiler for the FPS-164 scientific computer. In Proc. of the SIGPLAN '84 Symp. on Compiler Construction, pages 48-57, Montreal, Qué., Jun. 17-22, 1984.

[23] J. Wang, C. Eisenbeis, M. Jourdan, and B. Su. Decomposed Software Pipelining: A new approach to exploit instruction-level parallelism for loop programs. Res. Rep. RR-1838, INRIA-Rocquencourt, France, Jan. 1993.

[24] N. J. Warter, S. A. Mahlke, W. Hwu, and B. Ramakrishna Rau. Reverse If-Conversion. In Proc. of the SIGPLAN '93 Conf. on Programming Language Design and Implementation, pages 290-299, Albuquerque, NM, Jun. 23-25, 1993.

Published In

MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture

November 1994

233 pages

ISBN:0897917073

DOI:10.1145/192724

Chairmen: Hans Mulder (Intel Corp.), Matthew Farrens (Univ. of California, Davis)

Copyright © 1994 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\TCMM: TC on Microprocessors & Microcomputers

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MICRO94

Sponsor:

SIGMICRO
IEEE-CS\TCMM

MICRO94: 27th Annual International Symposium on Microarchitecture

November 30 - December 2, 1994

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
