Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/143095.143141acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Register allocation for software pipelined loops

Published: 01 July 1992 Publication History

Abstract

Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of register allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best register allocation strategy, both with and without the hardware support for software pipelining.

References

[1]
Allen, J.R., et al. Conversion of control dependence to data dependence. In Proceedings of the Tenth Annual ACM Symposium on Principles of Programming Languages, (January, 1983).
[2]
Berry, M., et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers. The International Journal of Supercomputer Applications, 3 (1989), 5-40.
[3]
Callahan, D., Cart, S., and Kennedy, K. Improving Register Allocation for Subscripted Variables, In Proceedings of the A CM StGPLAN '90 Conference on Programming Language Design and Implementation, (June, 1990), 53- 65.
[4]
Chaitin, G.J. Register allocation and spilling via graph coloring. In Proceedings of the SIGPLAN82 Symposium on Compiler Construction, (June, 1982), 201-207.
[5]
Charlesworth, A.E. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS- 164 Family. IEEE Computer 14, 9 (September, 1981), 18- 27.
[6]
Dehnert, J.C., Hsu, P.Y.-T., and Bratt, J.P. Overlapped loop support in the Cydra 5. In Proceedings of the Third international Conference on Architectural Support for Programming Languages and Operating Systems, (Boston, Mass., April, 1989), 26-38.
[7]
Ebcioglu, K., and Nakatani, T. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. In Proceedings of the Second Workshop on Programming Languages and Compilers for Parallel Computing, (Urbana-Champaign, 1989), 213-229.
[8]
Hendren, L.J., et al. Register Allocation using Cyclic interval Graphs: A New Approach to an Old Problem. ACAPS Technical Memo 33. Advanced Computer Architecture and Program Structures Group, McGill University, Montreal, Canada, 1992.
[9]
Hsu, P.Y.T. Highly Concurrent Scalar Processing. CSG-49. Coordinated Science Lab., University of Illinois, Urbana, Illinois, 1986.
[10]
Jain, S. Circular scheduling: A new technique to perform software pipelining. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and implementation, (June, 1991), 219-228.
[11]
Lain, M. Software pipelining: an effective scheduling technique for VLIW machines. In Proceedings of the A CM SIGPLAN '88 Conference on Programming Language Design and Implementation, (June, 1988), 318-327.
[12]
Lee, R.L., Kwok, A.Y., and Briggs, F.A. The floating point performance of a superscalar SPARC processor. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, (Santa Clara, California, April, 1991), 28-37.
[13]
Nicolau, A., and Potasman, R. Realistic scheduling: compaction for pipelined architectures. In Proceedings of the 23th Annual Workshop on Microprogramming and Microarchitecture, (Orlando, Florida, November, 1990), 69-79.
[14]
Rau, B.R. Data flow and dependence analysis for instruction level parallelism. In Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, (Santa Clara, August, 1991).
[15]
Rau, B.R., and Glaeser, C.D. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the Fourteenth Annual Workshop on Microprogramming, (October, 1981), 183-198.
[16]
Rau, B.R., et al. Register Allocation for Modulo Scheduled Loops: Strategies, Algorithms and Heuristics. HP Labs Technical Report HPL-92-48. Hewlett-Packard Laboratories, Palo Alto, California, 1992.
[17]
Rau, B.R., et al. Code Generation Schema for Modulo Scheduled DO-Loops and WHILE-Loops. HP Labs Technical Report HPL-92-47. Hewlett-Packard Laboratories, Palo Alto, California, 1992.
[18]
Rau, B.R., et al. The Cydra 5 departmental supercomputer: design philosophies, decisions and trade-offs, i EEE Computer 22, 1 (January, 1989).
[19]
Timmalai, P., Lee, M., and Schlansker, M.S. Parallelization of loops with exits on pipelined architectures, in Proceedings of the Supercomputing '90, (November, 1990), 200-212.
[20]
Uniejewski, .l. SPEC Benchmark Suite: Designed for Today's Advanced Systems. SPEC Newsletter 1, 1 (Fall, 1989).

Cited By

View all
  • (2024)Efficient Schedule Construction for Distributed Execution of Large DNN ModelsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.346691335:12(2375-2391)Online publication date: Dec-2024
  • (2024)Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00067(803-816)Online publication date: 2-Mar-2024
  • (2018)Coarse-Grained Reconfigurable Array ArchitecturesHandbook of Signal Processing Systems10.1007/978-3-319-91734-4_12(427-472)Online publication date: 14-Oct-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
July 1992
352 pages
ISBN:0897914759
DOI:10.1145/143095
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 1992

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI92
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)228
  • Downloads (Last 6 weeks)31
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Schedule Construction for Distributed Execution of Large DNN ModelsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.346691335:12(2375-2391)Online publication date: Dec-2024
  • (2024)Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00067(803-816)Online publication date: 2-Mar-2024
  • (2018)Coarse-Grained Reconfigurable Array ArchitecturesHandbook of Signal Processing Systems10.1007/978-3-319-91734-4_12(427-472)Online publication date: 14-Oct-2018
  • (2017)S-CaffeACM SIGPLAN Notices10.1145/3155284.301876952:8(193-205)Online publication date: 26-Jan-2017
  • (2017)Model-based Iterative CT Image Reconstruction on GPUsACM SIGPLAN Notices10.1145/3155284.301876552:8(207-220)Online publication date: 26-Jan-2017
  • (2015)Lightweight, flexible object-oriented genericsACM SIGPLAN Notices10.1145/2813885.273800850:6(436-445)Online publication date: 3-Jun-2015
  • (2015)Automatic induction proofs of data-structures in imperative programsACM SIGPLAN Notices10.1145/2813885.273798450:6(457-466)Online publication date: 3-Jun-2015
  • (2015)Relatively complete counterexamples for higher-order programsACM SIGPLAN Notices10.1145/2813885.273797150:6(446-456)Online publication date: 3-Jun-2015
  • (2015)Compositional certified resource boundsACM SIGPLAN Notices10.1145/2813885.273795550:6(467-478)Online publication date: 3-Jun-2015
  • (2014)Optimum modulo schedules for minimum register requirementsACM International Conference on Supercomputing 25th Anniversary Volume10.1145/2591635.2667171(227-236)Online publication date: 10-Jun-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media