research-article

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

Authors:

Sid-Ahmed-Ali Touati,

Frederic Brault,

Karine Deschinkel,

Benoît Dupont de DinechinAuthors Info & Claims

ACM Transactions on Embedded Computing Systems (TECS), Volume 10, Issue 4

Article No.: 47, Pages 1 - 25

https://doi.org/10.1145/2043662.2043671

Published: 01 November 2011 Publication History

Abstract

Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining in order to ensure that no register spill instructions are inserted by the register allocator in the software pipelined loop. If spilling is not necessary for the input code, preconditioning techniques insert dependence arcs so that the maximum register pressure MAXLIVE achieved by any loop schedule is below the number of available registers, without hurting the initiation interval if possible. When a solution exists, a spill-free software pipeline is guaranteed to exist.

Existing preconditioning techniques consider one register type (register class) at a time [Deschinkel and Touati 2008]. In this article, we extend preconditioning techniques so that multiple register types are considered simultaneously. First, we generalize the existing theory of register pressure minimization for cyclic scheduling. Second, we implement our method inside the production compiler of the ST2xx VLIW family, and we demonstrate its efficiency on industry benchmarks (FFMPEG, MEDIABENCH, SPEC2000, SPEC2006). We demonstrate a high spill reduction rate without a significant initiation interval loss.

References

[1]

Briais, S., and Touati, S.-A.-A. 2009. Schedule-Sensitive register pressure reduction in innermost loops, basic blocks, and super-blocks. Tech. rep. INRIA-HAL-00436348, University of Versailles Saint-Quentin en Yvelines.

[2]

Briggs, P., Cooper, K. D., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst. 16, 3, 428--455.

Digital Library

[3]

Chaitin, G. 2004. Register allocation and spilling via graph coloring. SIGPLAN Not. 39, 4, 66--74.

Digital Library

[4]

De Werra, D., Eisenbeis, C., Lelait, S., and Marmol, B. 1999. On a graph-theoretical model for cyclic register allocation. Discr. Appl. Math. 93, 2--3, 191--203.

Digital Library

[5]

Deschinkel, K. and Touati, S.-A.-A. 2008. Efficient method for periodic task scheduling with storage requirement minimization. In Proceedings of the 2nd International Conference on Combinatorial Optimization and Applications (COCOA’08). Lecture Notes in Computer Science, vol. 5165. Springer, 438--447.

Digital Library

[6]

Dupont-de-Dinechin, B. 1997. Parametric computation of margins and of minimum cumulative register lifetime dates. In Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing (LCPC’96). Springer, 231--245.

Digital Library

[7]

Eichenberger, A. and Davidson, E. S. 1997. Efficient formulation for optimal modulo schedulers. SIGPLAN Not. 32, 5, 194--205.

Digital Library

[8]

Eisenbeis, C., Lelait, S., and Marmol, B. 1995. The meeting graph: A new model for loop cyclic register allocation. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT’95). L. Bic, W. Bohm, P. Evripidou, and J. L. Gaudiot Eds., ACM Press, New York, 264--267.

Digital Library

[9]

Farabosch, G., Fisher, J. A., Desoli, G., and Homewood, F. 2000. Lx: A technology platform for customizable VLIW embedded processing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA’00). ACM, New York, 203--213.

Digital Library

[10]

Fisher, J. A., Faraboschi, and Young, C. 2005. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann Publishers.

Digital Library

[11]

Guillon, C., Rastello, F., Bidault, T., and Bouchez, F. 2005. Procedure placement using temporal-ordering information: Dealing with code size expansion. J. Embed. Comput. 1, 4, 437--459.

Digital Library

[12]

Hendren, L. J., Gao, G. R., Altman, E. R., and Mukerji, C. 1992. A register allocation framework based on hierarchical cyclic interval graphs. In Proceedings of the 4th International Conference on Compiler Construction (CC’92). Springer, 176--191.

Digital Library

[13]

Jain, R. 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley and Sons, New York.

[14]

Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’88). ACM Press, New York, 318--328.

Digital Library

[15]

Nagarakatte, S. G. and Govindarajan, R. 2007. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proceedings of the 16th International Conference on Compiler Construction (CC’07) Held as Part of the Joint European Conference on Theory and Practice of Software (ETAPS’07). Lecture Notes in Computer Science, vol. 4420., Springer, 126--140.

Digital Library

[16]

Nicolau, A., Potasman, R., and Wang, H. 1992. Register allocation, renaming and their impact on fine-grain parallelism. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Springer, 218--235.

Digital Library

[17]

Ramakrishna, R. B. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture (Micro27). ACM, New York, 63--74.

Digital Library

[18]

Ramakrishna, R. B., Schlansker, M. S., and Tirumalai, P. P. 1992a. Code generation schema for modulo scheduled loops. SIGMICRO Newslett. 23, 1--2, 158--169.

Digital Library

[19]

Ramakrishna, R. B., Lee, M., Tirumalaiand, P. P., and Schlansker, M. S. 1992b. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’92). ACM, New York, 283--299.

Digital Library

[20]

Ravindra, K. A., Magnanti, T. L., and Orlin, J. B. 1991. Network Flows: Theory, Algorithms, and Applications. John Wiley and Sons, New York.

[21]

Ruttenberg, J., Gao, G. R., Stoutchinin, A., and Lichtenstein, W. 1996. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’96). ACM Press, New York, 1--11.

Digital Library

[22]

Schrijver, A. 1986. Theory of Linear and Integer Programming. Wiley and Sons.

Digital Library

[23]

Touati, S.-A.-A. 2007. On the periodic register need in software pipelining. IEEE Trans. Comput. 56, 11, 1103--1504.

Digital Library

[24]

Touati, S.-A.-A. and Barthou, D. 2006. On the decidability of phase ordering problem in optimizing compilation. In Proceedings of the ACM International Conference on Computing Frontiers.

Digital Library

[25]

Touati, S.-A.-A. and Eisenbeis, C. 2004. Early periodic register allocation on ILP processors. Parall. Process. Lett. 14, 2, 287.

[26]

Wang, J., Eisenbeis, C., Jourdan, M., and Su, B. 1994. Decomposed software pipelining: A new perspective and a new approach. Int. J. Parall. Program. 22, 3, 351--373.

Digital Library

Cited By

Touati SDinechin B(2014)BibliographyAdvanced Backend Code Optimization10.1002/9781118625446.biblio(327-343)Online publication date: 3-Jun-2014
https://doi.org/10.1002/9781118625446.biblio
Bachir MTouati SBrault FGregg DCohen A(2012)Minimal Unroll Factor for Code Generation of Software PipeliningInternational Journal of Parallel Programming10.1007/s10766-012-0203-z41:1(1-58)Online publication date: 17-Jul-2012
https://doi.org/10.1007/s10766-012-0203-z

Index Terms

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Improved spill code generation for software pipelined loops
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

Software pipelining is a loop scheduling technique that extracts parallelism out of loops by overlapping the execution of several consecutive iterations. Due to the overlapping of iterations, schedules impose high register requirements during their ...
Improved spill code generation for software pipelined loops

Software pipelining is a loop scheduling technique that extracts parallelism out of loops by overlapping the execution of several consecutive iterations. Due to the overlapping of iterations, schedules impose high register requirements during their ...
Enabling compiler flow for embedded VLIW DSP processors with distributed register files
LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

High-performance and low-power VLIW DSP processors are increasingly deployed on embedded devices to process video and multimedia applications. For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems

ACM Transactions on Embedded Computing Systems Volume 10, Issue 4

November 2011

297 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2043662

Issue’s Table of Contents

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 November 2011

Accepted: 01 March 2010

Revised: 01 December 2009

Received: 01 June 2009

Published in TECS Volume 10, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

DIGITEO foundation
Agence Nationale de la Recherche

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
219
Total Downloads

Downloads (Last 12 months)1
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Touati SDinechin B(2014)BibliographyAdvanced Backend Code Optimization10.1002/9781118625446.biblio(327-343)Online publication date: 3-Jun-2014
https://doi.org/10.1002/9781118625446.biblio
Bachir MTouati SBrault FGregg DCohen A(2012)Minimal Unroll Factor for Code Generation of Software PipeliningInternational Journal of Parallel Programming10.1007/s10766-012-0203-z41:1(1-58)Online publication date: 17-Jul-2012
https://doi.org/10.1007/s10766-012-0203-z

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents