research-article

Open access

Optimal trace scheduling using enumeration

Authors:

Ghassan Shobaki,

Mark HeffernanAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 5, Issue 4

Article No.: 19, Pages 1 - 32

https://doi.org/10.1145/1498690.1498694

Published: 23 March 2009 Publication History

Abstract

This article presents the first optimal algorithm for trace scheduling. The trace is a global scheduling region used by compilers to exploit instruction-level parallelism across basic block boundaries. Several heuristic techniques have been proposed for trace scheduling, but the precision of these techniques has not been studied relative to optimality. This article describes a technique for finding provably optimal trace schedules, where optimality is defined in terms of a weighted sum of schedule lengths across all code paths in a trace. The optimal algorithm uses branch-and-bound enumeration to efficiently explore the entire solution space. Experimental evaluation of the algorithm shows that, with a time limit of 1 s per problem, 91% of the hard trace scheduling problems in the SPEC CPU 2006 Integer Benchmarks are solved optimally. For 58% of these hard problems, the optimal schedule is improved compared to that produced by a heuristic scheduler with a geometric mean improvement of 3.2% in weighted schedule length and 18% in compensation code size.

References

[1]

Bernstein, D. and Rodeh, M. 1991. Global scheduling for super-scalar machines. In Proceedings of the Conference on Programming Language Design and Implementation. ACM, New York.

Digital Library

[2]

Bharadwaj, J., Menezes, K., and Mckinsey, C. 2000. Wavefront scheduling: path based data representation and scheduling of subgraphs. J. Instruc.-Level Paral., 2, 7.

[3]

Chekuri, C., Johnson, R., Motwani, R., Natarajan, B., Rau, B., and Schlansker, M. 1996. Profile-driven instruction-level parallel scheduling with applications to superblocks. In Proceedings of the 29th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.

Digital Library

[4]

Chou, H. and Chung, C. 1995. An optimal instruction scheduler for superscalar processor. IEEE Trans. Parall. Distrib. Syst. 6, 303--313.

Digital Library

[5]

Fisher, J. 1981. Trace Scheduling: a technique for global micro-code compaction. IEEE Trans. Comput. 30, 478--490.

Digital Library

[6]

Freudenberger, S., Gross, T., and Lowney, P. 1994. Avoidance and suppression of compensation code in a trace scheduling compiler. ACM Trans. Program. Lang. Syst. 16, 1156--1214.

Digital Library

[7]

Havanki, W., Banerjia, S., and Conte, T., 1998. Treegion scheduling for wide-issue processors. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA-4), IEEE, Los Alamitos, CA.

Digital Library

[8]

Hennessy, J. and Gross, T. 1983. Post-pass code optimization of pipeline constraints. ACM Trans. Program. Lang. Syst. 5, 422--448.

Digital Library

[9]

Hwu, W., Mahlke, S., Chen, W., Chang, P., Warter, N., Bringmann, R., Ouellette, R., Hank, R., Kiyohara, T., Haab, G., Holm J., and Lavery, D. 1993. The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 229--248.

Digital Library

[10]

Lah, J. and Atkin, D. 1983. Tree compaction of micro-programs. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.

[11]

Langevin, M. and Cerny, E. 1996. A recursive technique for computing lower-bound performance of schedules. ACM Trans. Des. Autom. Electron. Syst. 1, 443--456.

Digital Library

[12]

Linn, J. 1983. SRDAG Compaction—a generalization of trace scheduling to increase the use of global context information. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.

[13]

Mahlke, S., Lin, D., Chen, W., Hank, R., and Bringmann, R. 1992. Effective compiler support for predicated execution using the hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.

Digital Library

[14]

Muchnick, S. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA.

Digital Library

[15]

Narasimhan, M. and Ramanujam, J. 2001. A fast approach to computing exact solutions to the resource-constrained scheduling problem. ACM Trans. Des. Autom. Electron. Syst. 6, 490--500.

Digital Library

[16]

Rim, M. and Jain, R. 1994. Lower-bound performance estimation for the high-level synthesis scheduling problem. IEEE Trans. Comput. Aid. Des. Integr. Circ. Syst. 13, 451--458.

Digital Library

[17]

Shobaki, G. and Wilken, K. 2004. Optimal superblock scheduling using enumeraion. In Proceedings of the 37th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.

Digital Library

[18]

Shobaki, G. 2006. Optimal global instruction scheduling using enumeration. PhD dissertation, Dept. of Computer Science, University of California, Davis.

Digital Library

[19]

Smith, M., Horowitz, M., and Lam, M. 1992. Efficient superscalar performance through boosting. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York.

Digital Library

[20]

Su, B., Ding, S., and Jin, L. 1984. An improvement of trace scheduling for global microcode compaction. In Proceedings of the 17th Annual Workshop on Microprogramming. ACM, New York.

Digital Library

[21]

Sun Microsystems, 2004. UltraSPARC Processors Documentation, www.sun.com/processors/documentation.html.

[22]

Wilken, K., Liu, J., and Heffernan, M. 2000. Optimal instruction scheduling using integer programming. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York.

Digital Library

[23]

Winkel S. 2007. Optimal versus heuristic global code scheduling. In Proceedings of the 40th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA.

Digital Library

[24]

Wolsey, L. 1998. Integer Programming. John Wiley and Sons, New York.

Cited By

AlFayez F(2023)Algorithms for Pre-Compiling Programs by Parallel CompilersComputer Systems Science and Engineering10.32604/csse.2023.02623844:3(2165-2176)Online publication date: 2023
https://doi.org/10.32604/csse.2023.026238
Lozano RSchulte C(2019)Survey on Combinatorial Register Allocation and Instruction SchedulingACM Computing Surveys10.1145/320092052:3(1-50)Online publication date: 18-Jun-2019
https://dl.acm.org/doi/10.1145/3200920
Lee HAl Faruque M(2017)GPU Architecture Aware Instruction Scheduling for Improving Soft-Error ReliabilityIEEE Transactions on Multi-Scale Computing Systems10.1109/TMSCS.2017.26676613:2(86-99)Online publication date: 1-Apr-2017
https://doi.org/10.1109/TMSCS.2017.2667661
Show More Cited By

Index Terms

Optimal trace scheduling using enumeration

Recommendations

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques ...
Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling

Multiple combinatorial algorithms have been proposed for doing pre-allocation instruction scheduling with the objective of minimizing register pressure or balancing register pressure and instruction-level parallelism. The cost function that is minimized ...
Warp-aware trace scheduling for GPUs
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

GPU performance depends not only on thread/warp level parallelism (TLP) but also on instruction-level parallelism (ILP). It is not enough to schedule instructions within basic blocks, it is also necessary to exploit opportunities for ILP optimization ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 5, Issue 4

March 2009

122 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/1498690

Issue’s Table of Contents

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 March 2009

Accepted: 01 July 2008

Revised: 01 March 2008

Received: 01 March 2006

Published in TACO Volume 5, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

14
Total Citations
View Citations
858
Total Downloads

Downloads (Last 12 months)137
Downloads (Last 6 weeks)10

Reflects downloads up to 23 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

AlFayez F(2023)Algorithms for Pre-Compiling Programs by Parallel CompilersComputer Systems Science and Engineering10.32604/csse.2023.02623844:3(2165-2176)Online publication date: 2023
https://doi.org/10.32604/csse.2023.026238
Lozano RSchulte C(2019)Survey on Combinatorial Register Allocation and Instruction SchedulingACM Computing Surveys10.1145/320092052:3(1-50)Online publication date: 18-Jun-2019
https://dl.acm.org/doi/10.1145/3200920
Lee HAl Faruque M(2017)GPU Architecture Aware Instruction Scheduling for Improving Soft-Error ReliabilityIEEE Transactions on Multi-Scale Computing Systems10.1109/TMSCS.2017.26676613:2(86-99)Online publication date: 1-Apr-2017
https://doi.org/10.1109/TMSCS.2017.2667661
Lee HChen HAl Faruque MFanucci LTeich J(2016)PAISProceedings of the 2016 Conference on Design, Automation & Test in Europe10.5555/2971808.2972174(1568-1573)Online publication date: 14-Mar-2016
https://dl.acm.org/doi/10.5555/2971808.2972174
Chen MZhang XPu GFu XMishra P(2016)Efficient Resource Constrained Scheduling Using Parallel Structure-Aware Pruning TechniquesIEEE Transactions on Computers10.1109/TC.2015.246823065:7(2059-2073)Online publication date: 1-Jul-2016
https://doi.org/10.1109/TC.2015.2468230
McFarlin DZilles C(2015)Branch vanguardACM SIGARCH Computer Architecture News10.1145/2872887.275040043:3S(323-335)Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2872887.2750400
McFarlin DZilles CMarr DAlbonesi D(2015)Branch vanguardProceedings of the 42nd Annual International Symposium on Computer Architecture10.1145/2749469.2750400(323-335)Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2749469.2750400
Shobaki GJamal J(2015)An exact algorithm for the sequential ordering problem and its application to switching energy minimization in compilersComputational Optimization and Applications10.1007/s10589-015-9725-961:2(343-372)Online publication date: 1-Jun-2015
https://dl.acm.org/doi/10.1007/s10589-015-9725-9
Kim NKrall ABrandner FVander Aa T(2014)Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architectureProceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems10.1145/2568326.2568327(25-32)Online publication date: 15-Feb-2014
https://dl.acm.org/doi/10.1145/2568326.2568327
Shobaki GShawabkeh MRmaileh N(2013)Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approachACM Transactions on Architecture and Code Optimization10.1145/251243210:3(1-31)Online publication date: 16-Sep-2013
https://dl.acm.org/doi/10.1145/2512432
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents