Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

Published: 01 November 2011 Publication History

Abstract

Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining in order to ensure that no register spill instructions are inserted by the register allocator in the software pipelined loop. If spilling is not necessary for the input code, preconditioning techniques insert dependence arcs so that the maximum register pressure MAXLIVE achieved by any loop schedule is below the number of available registers, without hurting the initiation interval if possible. When a solution exists, a spill-free software pipeline is guaranteed to exist.
Existing preconditioning techniques consider one register type (register class) at a time [Deschinkel and Touati 2008]. In this article, we extend preconditioning techniques so that multiple register types are considered simultaneously. First, we generalize the existing theory of register pressure minimization for cyclic scheduling. Second, we implement our method inside the production compiler of the ST2xx VLIW family, and we demonstrate its efficiency on industry benchmarks (FFMPEG, MEDIABENCH, SPEC2000, SPEC2006). We demonstrate a high spill reduction rate without a significant initiation interval loss.

References

[1]
Briais, S., and Touati, S.-A.-A. 2009. Schedule-Sensitive register pressure reduction in innermost loops, basic blocks, and super-blocks. Tech. rep. INRIA-HAL-00436348, University of Versailles Saint-Quentin en Yvelines.
[2]
Briggs, P., Cooper, K. D., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst. 16, 3, 428--455.
[3]
Chaitin, G. 2004. Register allocation and spilling via graph coloring. SIGPLAN Not. 39, 4, 66--74.
[4]
De Werra, D., Eisenbeis, C., Lelait, S., and Marmol, B. 1999. On a graph-theoretical model for cyclic register allocation. Discr. Appl. Math. 93, 2--3, 191--203.
[5]
Deschinkel, K. and Touati, S.-A.-A. 2008. Efficient method for periodic task scheduling with storage requirement minimization. In Proceedings of the 2nd International Conference on Combinatorial Optimization and Applications (COCOA’08). Lecture Notes in Computer Science, vol. 5165. Springer, 438--447.
[6]
Dupont-de-Dinechin, B. 1997. Parametric computation of margins and of minimum cumulative register lifetime dates. In Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing (LCPC’96). Springer, 231--245.
[7]
Eichenberger, A. and Davidson, E. S. 1997. Efficient formulation for optimal modulo schedulers. SIGPLAN Not. 32, 5, 194--205.
[8]
Eisenbeis, C., Lelait, S., and Marmol, B. 1995. The meeting graph: A new model for loop cyclic register allocation. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT’95). L. Bic, W. Bohm, P. Evripidou, and J. L. Gaudiot Eds., ACM Press, New York, 264--267.
[9]
Farabosch, G., Fisher, J. A., Desoli, G., and Homewood, F. 2000. Lx: A technology platform for customizable VLIW embedded processing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA’00). ACM, New York, 203--213.
[10]
Fisher, J. A., Faraboschi, and Young, C. 2005. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann Publishers.
[11]
Guillon, C., Rastello, F., Bidault, T., and Bouchez, F. 2005. Procedure placement using temporal-ordering information: Dealing with code size expansion. J. Embed. Comput. 1, 4, 437--459.
[12]
Hendren, L. J., Gao, G. R., Altman, E. R., and Mukerji, C. 1992. A register allocation framework based on hierarchical cyclic interval graphs. In Proceedings of the 4th International Conference on Compiler Construction (CC’92). Springer, 176--191.
[13]
Jain, R. 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley and Sons, New York.
[14]
Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’88). ACM Press, New York, 318--328.
[15]
Nagarakatte, S. G. and Govindarajan, R. 2007. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proceedings of the 16th International Conference on Compiler Construction (CC’07) Held as Part of the Joint European Conference on Theory and Practice of Software (ETAPS’07). Lecture Notes in Computer Science, vol. 4420., Springer, 126--140.
[16]
Nicolau, A., Potasman, R., and Wang, H. 1992. Register allocation, renaming and their impact on fine-grain parallelism. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Springer, 218--235.
[17]
Ramakrishna, R. B. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture (Micro27). ACM, New York, 63--74.
[18]
Ramakrishna, R. B., Schlansker, M. S., and Tirumalai, P. P. 1992a. Code generation schema for modulo scheduled loops. SIGMICRO Newslett. 23, 1--2, 158--169.
[19]
Ramakrishna, R. B., Lee, M., Tirumalaiand, P. P., and Schlansker, M. S. 1992b. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’92). ACM, New York, 283--299.
[20]
Ravindra, K. A., Magnanti, T. L., and Orlin, J. B. 1991. Network Flows: Theory, Algorithms, and Applications. John Wiley and Sons, New York.
[21]
Ruttenberg, J., Gao, G. R., Stoutchinin, A., and Lichtenstein, W. 1996. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’96). ACM Press, New York, 1--11.
[22]
Schrijver, A. 1986. Theory of Linear and Integer Programming. Wiley and Sons.
[23]
Touati, S.-A.-A. 2007. On the periodic register need in software pipelining. IEEE Trans. Comput. 56, 11, 1103--1504.
[24]
Touati, S.-A.-A. and Barthou, D. 2006. On the decidability of phase ordering problem in optimizing compilation. In Proceedings of the ACM International Conference on Computing Frontiers.
[25]
Touati, S.-A.-A. and Eisenbeis, C. 2004. Early periodic register allocation on ILP processors. Parall. Process. Lett. 14, 2, 287.
[26]
Wang, J., Eisenbeis, C., Jourdan, M., and Su, B. 1994. Decomposed software pipelining: A new perspective and a new approach. Int. J. Parall. Program. 22, 3, 351--373.

Cited By

View all
  • (2014)BibliographyAdvanced Backend Code Optimization10.1002/9781118625446.biblio(327-343)Online publication date: 3-Jun-2014
  • (2012)Minimal Unroll Factor for Code Generation of Software PipeliningInternational Journal of Parallel Programming10.1007/s10766-012-0203-z41:1(1-58)Online publication date: 17-Jul-2012

Index Terms

  1. Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Embedded Computing Systems
    ACM Transactions on Embedded Computing Systems  Volume 10, Issue 4
    November 2011
    297 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2043662
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Journal Family

    Publication History

    Published: 01 November 2011
    Accepted: 01 March 2010
    Revised: 01 December 2009
    Received: 01 June 2009
    Published in TECS Volume 10, Issue 4

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Software pipelining
    2. backend compilation
    3. code optimization
    4. instruction-level parallelism
    5. register allocation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2014)BibliographyAdvanced Backend Code Optimization10.1002/9781118625446.biblio(327-343)Online publication date: 3-Jun-2014
    • (2012)Minimal Unroll Factor for Code Generation of Software PipeliningInternational Journal of Parallel Programming10.1007/s10766-012-0203-z41:1(1-58)Online publication date: 17-Jul-2012

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media