Register allocation for software pipelined loops

Authors:

P. P. Tirumalai,

M. S. Schlansker

PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation

Pages 283 - 299

https://doi.org/10.1145/143095.143141

Published: 01 July 1992

Abstract

Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of register allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best register allocation strategy, both with and without the hardware support for software pipelining.


Index Terms

Register allocation for software pipelined loops
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language features
2. Theory of computation
  1. Semantics and reasoning
    1. Program constructs


Published In

PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation

July 1992

352 pages

ISBN:0897914759

DOI:10.1145/143095

Chairman: Stuart I. Feldman, Bell Communications Research, Morristown, NJ
Editor: Richard L. Wexelblat

ACM SIGPLAN Notices Volume 27, Issue 7
July 1992
352 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/143103
Editor: Richard Wexelblat, IDA/CDED, Alexandria, VA

Copyright © 1992 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

PLDI92

Sponsor:

SIGPLAN

PLDI92: SIGPLAN 92 Conference on Programming Language Design and Implementation

June 15 - 19, 1992

San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics

Total Citations: 170
Total Downloads: 1,390
Downloads (Last 12 months): 228
Downloads (Last 6 weeks): 31

Reflects downloads up to 21 Nov 2024

