Article

Free access

Lifetime-sensitive modulo scheduling

Author:

Richard A. HuffAuthors Info & Claims

PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

Pages 258 - 267

https://doi.org/10.1145/155090.155115

Published: 01 June 1993 Publication History

PDF eReader

Abstract

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results—when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure—indicate near-optimal performance.

References

[1]

J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the Tenth Annual ACM Symposium on Principles of Programming Languages, pages 177-189, Jan. 1983.

Digital Library

Google Scholar

[2]

G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cych'a-5 mini-supercomputer: Architecture and implementation. Journal of Supercomputing, 7(1/2), Jan. 1993.

Digital Library

Google Scholar

[3]

D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrating register allocation and instruction scheduling for RISCs. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Opera,ling Systems, pages 122-131, Apr. 1991.

Digital Library

Google Scholar

[4]

R. Cytron, I. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An efficient method of computing static single assignment form. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, pages 25-35, Jan. 1989.

Digital Library

Google Scholar

[5]

J. C. Dehnert, P. Y.-T Hsu, and I. P. Bratt. Overlapped loop support in the Cydra 5. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 26-38, Apr. 1989.

Digital Library

Google Scholar

[6]

J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. Journal of Supercomputing, 7(1/2), Jan. 1993.

Digital Library

Google Scholar

[7]

P. B. Gibbons and S. S. Muchnick. Efficient instruction scheduling for a pipelined architecture. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 11-16, 1986.

Digital Library

Google Scholar

[8]

J. R. Goodman and W.-C. Hsu. Code scheduling and register allocation in large basic blocks. In Proceedings of the 1988 International Conference on Supercomputing, pages 442-452, June 1988.

Digital Library

Google Scholar

[9]

M. S. Lain. A Systolic Array Optimizing Compiler. Kh. twer Academic Publishers, 1989.

Digital Library

Google Scholar

[10]

D. Landakov, S. Davidson, B. Shriver, and E W. Mallet. Local microcode compaction techniques. ACM Computing Surveys, pages 261-294, Sept. 1980.

Digital Library

Google Scholar

[11]

E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Saunders College Publishing, 1976.

Google Scholar

[12]

P.G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multifiow trace scheduling compiler. Journal of Supercomputing, 7(1}2), Jan. 1993.

Digital Library

Google Scholar

[13]

J. C. H. Park and M. S. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Laboratories, May 1991.

Google Scholar

[14]

S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard Journal, pages 39-45, June 1992.

Google Scholar

[15]

B. R. Rau. Data flow and dependence analysis for instruction level parallelism. In Fourth Workshop on Languages and Compilers for Parallel Computing, pages 236-250, Aug. 1991.

Digital Library

Google Scholar

[16]

B.R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing, 7(112), Jan. 1993.

Digital Library

Google Scholar

[17]

B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the 14th Annual Microprogramming Workshop, pages 183-197, Oct. 1981.

Digital Library

Google Scholar

[18]

B.R. Rau, M. Lee, P. Tu'umalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 283-299, June 1992.

Digital Library

Google Scholar

[19]

B.R. Rau, M. S. Schlansker, and P. Tlrumalai. Code generation schemas for modulo scheduled loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, Dec. 1992.

Digital Library

Google Scholar

[20]

B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer, pages 12-35, Jan. 1989.

Digital Library

Google Scholar

[21]

J.C. Tiernan. An efficient search algorithm to find the elementary circuits of a graph. Communications of the ACM, pages 722-726, Dec. 1970.

Digital Library

Google Scholar

[22]

P. Tlrumalai, M. Lee, and M. S. Schlansker. Parallelization of loops with exits on pipelined architectures, in IEEE Proceedings of Supercomputing ' 90, pages 200-212, Nov. 1990.

Digital Library

Google Scholar

[23]

N. J. Warter, J. W. Bockhaus, G. E. Haab, and K. Subramanian. Enhancexl modulo scheduling for loops with conditional branches. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 170-179, Dec. 1992.

Digital Library

Google Scholar

[24]

P. Wijaya and V. H. Allan. Incremental foresighted local compaction. In Proceedings of the 22nd Annual International Symposium on Microarchitecture, pages 163-171, Aug. 1989.

Digital Library

Google Scholar

[25]

M. E. Wolf and M. S. Lain. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991.

Digital Library

Google Scholar

Cited By

View all

Beidas RAnderson J(2024)Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM60383.2024.00026(151-161)Online publication date: 5-May-2024
https://doi.org/10.1109/FCCM60383.2024.00026
Abel ASharma SReineke J(2023)Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction2023 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC59245.2023.00023(87-99)Online publication date: 1-Oct-2023
https://doi.org/10.1109/IISWC59245.2023.00023
Oppermann JReuter-Oppermann MSommer LKoch ASinnen O(2019)Exact and Practical Modulo Scheduling for High-Level SynthesisACM Transactions on Reconfigurable Technology and Systems10.1145/331767012:2(1-26)Online publication date: 6-May-2019
https://dl.acm.org/doi/10.1145/3317670
Show More Cited By

Index Terms

Lifetime-sensitive modulo scheduling
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types

Recommendations

Lifetime-sensitive modulo scheduling

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many ...
Register Constrained Modulo Scheduling

Abstract--Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software ...
Modulo scheduling for control-intensive general-purpose programs

Reviews

Reviewer: Benjamin Rayborn Seyfarth

The author presents the concept of bidirectional slack-scheduling to improve the efficiency of register allocation for loops. Basically, he presents heuristics for determining whether to attempt to schedule an operation as early as possible or as late as possible in an attempt to reduce the register pressure within a loop. Then he uses a backtracking algorithm to perform instruction scheduling. The author also presents empirical results from incorporating his techniques into a FORTRAN compiler for a hypothetical very long instruction word (VLIW) machine similar to the Cydrome Cydra. His results indicate that his scheduling technique allows optimal scheduling for 96 percent of the loops in a set of sample code including 1525 loops, resulting in an 11 percent improvement over the Cydrome compiler. The author further states that the technique can be adapted to RISC machines. The paper is a nice blend of theory and practice. The reader is expected to be familiar with register allocation strategies and, for such readers, the paper is quite readable. For those needing to study some background material, this paper includes a sufficient bibliography for further exploration. Both compiler writers and theorists will find this paper useful.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

August 1993

313 pages

ISBN:0897915984

DOI:10.1145/155090

Editor:
Robert Cartwright
Rice Univ., Houston, TX

ACM SIGPLAN Notices Volume 28, Issue 6
June 1993
313 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173262
Editor:
Richard Wexelblat
IDA/CSED, Alexandria, VA
Issue’s Table of Contents

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

PLDI93

Sponsor:

SIGPLAN

PLDI93: ACM SIGPLAN Conference on Programming Languages Design and Implementation

June 21 - 25, 1993

New Mexico, Albuquerque, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

203
Total Citations
View Citations
881
Total Downloads

Downloads (Last 12 months)113
Downloads (Last 6 weeks)9

Reflects downloads up to 24 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Beidas RAnderson J(2024)Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM60383.2024.00026(151-161)Online publication date: 5-May-2024
https://doi.org/10.1109/FCCM60383.2024.00026
Abel ASharma SReineke J(2023)Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction2023 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC59245.2023.00023(87-99)Online publication date: 1-Oct-2023
https://doi.org/10.1109/IISWC59245.2023.00023
Oppermann JReuter-Oppermann MSommer LKoch ASinnen O(2019)Exact and Practical Modulo Scheduling for High-Level SynthesisACM Transactions on Reconfigurable Technology and Systems10.1145/331767012:2(1-26)Online publication date: 6-May-2019
https://dl.acm.org/doi/10.1145/3317670
Najoui MHatim ABelkouch SChabini N(2019)Optimized Implementation of Modified Gram Schmidt Algorithm on VLIW Architecture2019 4th World Conference on Complex Systems (WCCS)10.1109/ICoCS.2019.8930756(1-7)Online publication date: Apr-2019
https://doi.org/10.1109/ICoCS.2019.8930756
Oppermann JReuter-Oppermann MSommer LSinnen OKoch A(2018)Dependence Graph Preprocessing for Faster Exact Modulo Scheduling in High-Level Synthesis2018 28th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2018.00055(280-2806)Online publication date: Aug-2018
https://doi.org/10.1109/FPL.2018.00055
Sittel PKumm MOppermann JMoller KZipf PKoch A(2018)ILP-Based Modulo Scheduling and Binding for Register Minimization2018 28th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2018.00053(265-2656)Online publication date: Aug-2018
https://doi.org/10.1109/FPL.2018.00053
Oppermann JKoch AReuter-Oppermann MSinnen O(2016)ILP-based modulo scheduling for high-level synthesisProceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems10.1145/2968455.2968512(1-10)Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1145/2968455.2968512
Al-Jarah OAl-khdour T(2015)Slackness-Based Co-Scheduling Hardware and Software Pipelines with Multiple Latency SequencesInternational Journal of Computers and Applications10.1080/1206212X.2000.1144161022:3(121-128)Online publication date: 10-Jul-2015
https://doi.org/10.1080/1206212X.2000.11441610
Hanzalek ZHanen C(2015)The impact of core precedences in a cyclic RCPSP with precedence delaysJournal of Scheduling10.1007/s10951-014-0399-418:3(275-284)Online publication date: 1-Jun-2015
https://dl.acm.org/doi/10.1007/s10951-014-0399-4
Dai STan MHao KZhang Z(2014)Flushing-Enabled Loop Pipelining for High-Level SynthesisProceedings of the 51st Annual Design Automation Conference10.1145/2593069.2593143(1-6)Online publication date: 1-Jun-2014
https://dl.acm.org/doi/10.1145/2593069.2593143
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations