Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/155090.155115acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Lifetime-sensitive modulo scheduling

Published: 01 June 1993 Publication History

Abstract

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results—when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure—indicate near-optimal performance.

References

[1]
J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the Tenth Annual ACM Symposium on Principles of Programming Languages, pages 177-189, Jan. 1983.
[2]
G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cych'a-5 mini-supercomputer: Architecture and implementation. Journal of Supercomputing, 7(1/2), Jan. 1993.
[3]
D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrating register allocation and instruction scheduling for RISCs. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Opera,ling Systems, pages 122-131, Apr. 1991.
[4]
R. Cytron, I. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An efficient method of computing static single assignment form. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, pages 25-35, Jan. 1989.
[5]
J. C. Dehnert, P. Y.-T Hsu, and I. P. Bratt. Overlapped loop support in the Cydra 5. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 26-38, Apr. 1989.
[6]
J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. Journal of Supercomputing, 7(1/2), Jan. 1993.
[7]
P. B. Gibbons and S. S. Muchnick. Efficient instruction scheduling for a pipelined architecture. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 11-16, 1986.
[8]
J. R. Goodman and W.-C. Hsu. Code scheduling and register allocation in large basic blocks. In Proceedings of the 1988 International Conference on Supercomputing, pages 442-452, June 1988.
[9]
M. S. Lain. A Systolic Array Optimizing Compiler. Kh. twer Academic Publishers, 1989.
[10]
D. Landakov, S. Davidson, B. Shriver, and E W. Mallet. Local microcode compaction techniques. ACM Computing Surveys, pages 261-294, Sept. 1980.
[11]
E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Saunders College Publishing, 1976.
[12]
P.G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multifiow trace scheduling compiler. Journal of Supercomputing, 7(1}2), Jan. 1993.
[13]
J. C. H. Park and M. S. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Laboratories, May 1991.
[14]
S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard Journal, pages 39-45, June 1992.
[15]
B. R. Rau. Data flow and dependence analysis for instruction level parallelism. In Fourth Workshop on Languages and Compilers for Parallel Computing, pages 236-250, Aug. 1991.
[16]
B.R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing, 7(112), Jan. 1993.
[17]
B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the 14th Annual Microprogramming Workshop, pages 183-197, Oct. 1981.
[18]
B.R. Rau, M. Lee, P. Tu'umalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 283-299, June 1992.
[19]
B.R. Rau, M. S. Schlansker, and P. Tlrumalai. Code generation schemas for modulo scheduled loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, Dec. 1992.
[20]
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. IEEE Computer, pages 12-35, Jan. 1989.
[21]
J.C. Tiernan. An efficient search algorithm to find the elementary circuits of a graph. Communications of the ACM, pages 722-726, Dec. 1970.
[22]
P. Tlrumalai, M. Lee, and M. S. Schlansker. Parallelization of loops with exits on pipelined architectures, in IEEE Proceedings of Supercomputing ' 90, pages 200-212, Nov. 1990.
[23]
N. J. Warter, J. W. Bockhaus, G. E. Haab, and K. Subramanian. Enhancexl modulo scheduling for loops with conditional branches. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 170-179, Dec. 1992.
[24]
P. Wijaya and V. H. Allan. Incremental foresighted local compaction. In Proceedings of the 22nd Annual International Symposium on Microarchitecture, pages 163-171, Aug. 1989.
[25]
M. E. Wolf and M. S. Lain. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991.

Cited By

View all
  • (2024)Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM60383.2024.00026(151-161)Online publication date: 5-May-2024
  • (2023)Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction2023 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC59245.2023.00023(87-99)Online publication date: 1-Oct-2023
  • (2019)Exact and Practical Modulo Scheduling for High-Level SynthesisACM Transactions on Reconfigurable Technology and Systems10.1145/331767012:2(1-26)Online publication date: 6-May-2019
  • Show More Cited By

Recommendations

Reviews

Benjamin Rayborn Seyfarth

The author presents the concept of bidirectional slack-scheduling to improve the efficiency of register allocation for loops. Basically, he presents heuristics for determining whether to attempt to schedule an operation as early as possible or as late as possible in an attempt to reduce the register pressure within a loop. Then he uses a backtracking algorithm to perform instruction scheduling. The author also presents empirical results from incorporating his techniques into a FORTRAN compiler for a hypothetical very long instruction word (VLIW) machine similar to the Cydrome Cydra. His results indicate that his scheduling technique allows optimal scheduling for 96 percent of the loops in a set of sample code including 1525 loops, resulting in an 11 percent improvement over the Cydrome compiler. The author further states that the technique can be adapted to RISC machines. The paper is a nice blend of theory and practice. The reader is expected to be familiar with register allocation strategies and, for such readers, the paper is quite readable. For those needing to study some background material, this paper includes a sufficient bibliography for further exploration. Both compiler writers and theorists will find this paper useful.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
August 1993
313 pages
ISBN:0897915984
DOI:10.1145/155090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI93
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)113
  • Downloads (Last 6 weeks)9
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM60383.2024.00026(151-161)Online publication date: 5-May-2024
  • (2023)Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction2023 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC59245.2023.00023(87-99)Online publication date: 1-Oct-2023
  • (2019)Exact and Practical Modulo Scheduling for High-Level SynthesisACM Transactions on Reconfigurable Technology and Systems10.1145/331767012:2(1-26)Online publication date: 6-May-2019
  • (2019)Optimized Implementation of Modified Gram Schmidt Algorithm on VLIW Architecture2019 4th World Conference on Complex Systems (WCCS)10.1109/ICoCS.2019.8930756(1-7)Online publication date: Apr-2019
  • (2018)Dependence Graph Preprocessing for Faster Exact Modulo Scheduling in High-Level Synthesis2018 28th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2018.00055(280-2806)Online publication date: Aug-2018
  • (2018)ILP-Based Modulo Scheduling and Binding for Register Minimization2018 28th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2018.00053(265-2656)Online publication date: Aug-2018
  • (2016)ILP-based modulo scheduling for high-level synthesisProceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems10.1145/2968455.2968512(1-10)Online publication date: 1-Oct-2016
  • (2015)Slackness-Based Co-Scheduling Hardware and Software Pipelines with Multiple Latency SequencesInternational Journal of Computers and Applications10.1080/1206212X.2000.1144161022:3(121-128)Online publication date: 10-Jul-2015
  • (2015)The impact of core precedences in a cyclic RCPSP with precedence delaysJournal of Scheduling10.1007/s10951-014-0399-418:3(275-284)Online publication date: 1-Jun-2015
  • (2014)Flushing-Enabled Loop Pipelining for High-Level SynthesisProceedings of the 51st Annual Design Automation Conference10.1145/2593069.2593143(1-6)Online publication date: 1-Jun-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media