Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Optimal trace scheduling using enumeration

Published: 23 March 2009 Publication History

Abstract

This article presents the first optimal algorithm for trace scheduling. The trace is a global scheduling region used by compilers to exploit instruction-level parallelism across basic block boundaries. Several heuristic techniques have been proposed for trace scheduling, but the precision of these techniques has not been studied relative to optimality. This article describes a technique for finding provably optimal trace schedules, where optimality is defined in terms of a weighted sum of schedule lengths across all code paths in a trace. The optimal algorithm uses branch-and-bound enumeration to efficiently explore the entire solution space. Experimental evaluation of the algorithm shows that, with a time limit of 1 s per problem, 91% of the hard trace scheduling problems in the SPEC CPU 2006 Integer Benchmarks are solved optimally. For 58% of these hard problems, the optimal schedule is improved compared to that produced by a heuristic scheduler with a geometric mean improvement of 3.2% in weighted schedule length and 18% in compensation code size.

References

[1]
Bernstein, D. and Rodeh, M. 1991. Global scheduling for super-scalar machines. In Proceedings of the Conference on Programming Language Design and Implementation. ACM, New York.
[2]
Bharadwaj, J., Menezes, K., and Mckinsey, C. 2000. Wavefront scheduling: path based data representation and scheduling of subgraphs. J. Instruc.-Level Paral., 2, 7.
[3]
Chekuri, C., Johnson, R., Motwani, R., Natarajan, B., Rau, B., and Schlansker, M. 1996. Profile-driven instruction-level parallel scheduling with applications to superblocks. In Proceedings of the 29th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.
[4]
Chou, H. and Chung, C. 1995. An optimal instruction scheduler for superscalar processor. IEEE Trans. Parall. Distrib. Syst. 6, 303--313.
[5]
Fisher, J. 1981. Trace Scheduling: a technique for global micro-code compaction. IEEE Trans. Comput. 30, 478--490.
[6]
Freudenberger, S., Gross, T., and Lowney, P. 1994. Avoidance and suppression of compensation code in a trace scheduling compiler. ACM Trans. Program. Lang. Syst. 16, 1156--1214.
[7]
Havanki, W., Banerjia, S., and Conte, T., 1998. Treegion scheduling for wide-issue processors. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA-4), IEEE, Los Alamitos, CA.
[8]
Hennessy, J. and Gross, T. 1983. Post-pass code optimization of pipeline constraints. ACM Trans. Program. Lang. Syst. 5, 422--448.
[9]
Hwu, W., Mahlke, S., Chen, W., Chang, P., Warter, N., Bringmann, R., Ouellette, R., Hank, R., Kiyohara, T., Haab, G., Holm J., and Lavery, D. 1993. The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 229--248.
[10]
Lah, J. and Atkin, D. 1983. Tree compaction of micro-programs. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.
[11]
Langevin, M. and Cerny, E. 1996. A recursive technique for computing lower-bound performance of schedules. ACM Trans. Des. Autom. Electron. Syst. 1, 443--456.
[12]
Linn, J. 1983. SRDAG Compaction—a generalization of trace scheduling to increase the use of global context information. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.
[13]
Mahlke, S., Lin, D., Chen, W., Hank, R., and Bringmann, R. 1992. Effective compiler support for predicated execution using the hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.
[14]
Muchnick, S. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA.
[15]
Narasimhan, M. and Ramanujam, J. 2001. A fast approach to computing exact solutions to the resource-constrained scheduling problem. ACM Trans. Des. Autom. Electron. Syst. 6, 490--500.
[16]
Rim, M. and Jain, R. 1994. Lower-bound performance estimation for the high-level synthesis scheduling problem. IEEE Trans. Comput. Aid. Des. Integr. Circ. Syst. 13, 451--458.
[17]
Shobaki, G. and Wilken, K. 2004. Optimal superblock scheduling using enumeraion. In Proceedings of the 37th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.
[18]
Shobaki, G. 2006. Optimal global instruction scheduling using enumeration. PhD dissertation, Dept. of Computer Science, University of California, Davis.
[19]
Smith, M., Horowitz, M., and Lam, M. 1992. Efficient superscalar performance through boosting. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York.
[20]
Su, B., Ding, S., and Jin, L. 1984. An improvement of trace scheduling for global microcode compaction. In Proceedings of the 17th Annual Workshop on Microprogramming. ACM, New York.
[21]
Sun Microsystems, 2004. UltraSPARC Processors Documentation, www.sun.com/processors/documentation.html.
[22]
Wilken, K., Liu, J., and Heffernan, M. 2000. Optimal instruction scheduling using integer programming. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York.
[23]
Winkel S. 2007. Optimal versus heuristic global code scheduling. In Proceedings of the 40th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.
[24]
Wolsey, L. 1998. Integer Programming. John Wiley and Sons, New York.

Cited By

View all
  • (2023)Algorithms for Pre-Compiling Programs by Parallel CompilersComputer Systems Science and Engineering10.32604/csse.2023.02623844:3(2165-2176)Online publication date: 2023
  • (2019)Survey on Combinatorial Register Allocation and Instruction SchedulingACM Computing Surveys10.1145/320092052:3(1-50)Online publication date: 18-Jun-2019
  • (2017)GPU Architecture Aware Instruction Scheduling for Improving Soft-Error ReliabilityIEEE Transactions on Multi-Scale Computing Systems10.1109/TMSCS.2017.26676613:2(86-99)Online publication date: 1-Apr-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 5, Issue 4
March 2009
122 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1498690
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 March 2009
Accepted: 01 July 2008
Revised: 01 March 2008
Received: 01 March 2006
Published in TACO Volume 5, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Instruction scheduling
  2. branch-and-bound enumeration
  3. compiler optimizations
  4. global instruction scheduling
  5. instruction-level parallelism
  6. optimal instruction scheduling
  7. trace scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)137
  • Downloads (Last 6 weeks)10
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Algorithms for Pre-Compiling Programs by Parallel CompilersComputer Systems Science and Engineering10.32604/csse.2023.02623844:3(2165-2176)Online publication date: 2023
  • (2019)Survey on Combinatorial Register Allocation and Instruction SchedulingACM Computing Surveys10.1145/320092052:3(1-50)Online publication date: 18-Jun-2019
  • (2017)GPU Architecture Aware Instruction Scheduling for Improving Soft-Error ReliabilityIEEE Transactions on Multi-Scale Computing Systems10.1109/TMSCS.2017.26676613:2(86-99)Online publication date: 1-Apr-2017
  • (2016)PAISProceedings of the 2016 Conference on Design, Automation & Test in Europe10.5555/2971808.2972174(1568-1573)Online publication date: 14-Mar-2016
  • (2016)Efficient Resource Constrained Scheduling Using Parallel Structure-Aware Pruning TechniquesIEEE Transactions on Computers10.1109/TC.2015.246823065:7(2059-2073)Online publication date: 1-Jul-2016
  • (2015)Branch vanguardACM SIGARCH Computer Architecture News10.1145/2872887.275040043:3S(323-335)Online publication date: 13-Jun-2015
  • (2015)Branch vanguardProceedings of the 42nd Annual International Symposium on Computer Architecture10.1145/2749469.2750400(323-335)Online publication date: 13-Jun-2015
  • (2015)An exact algorithm for the sequential ordering problem and its application to switching energy minimization in compilersComputational Optimization and Applications10.1007/s10589-015-9725-961:2(343-372)Online publication date: 1-Jun-2015
  • (2014)Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architectureProceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems10.1145/2568326.2568327(25-32)Online publication date: 15-Feb-2014
  • (2013)Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approachACM Transactions on Architecture and Code Optimization10.1145/251243210:3(1-31)Online publication date: 16-Sep-2013
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media