Article

Macro-op Scheduling: Relaxing Scheduling Loop Constraints

Authors:

Mikko H. Lipasti

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

Page 277

Published: 03 December 2003

Abstract

Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rate as they are executed. To sustain high performance, integer ALU instructions typically have single-cycle latency, consequently requiring scheduling logic with the same single-cycle latency. Prior proposals have advocated the use of speculation in either the wakeup or select phases to enable pipelining of scheduling logic to achieve higher clock frequency. In contrast, this paper proposes macro-op scheduling, which systematically removes instructions with single-cycle latency from the machine by combining them into macro-ops, and performs nonspeculative pipelined scheduling of multi-cycle operations. Macro-op scheduling also increases the effective size of the scheduling window by enabling multiple instructions to occupy a single issue queue entry. We demonstrate that pipelined 2-cycle macro-op scheduling performs comparably or even better than atomic scheduling or prior proposals for select-free scheduling.
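The timing argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism or simulator. It assumes unlimited issue width, single-cycle execution for every instruction, macro-ops built only from two adjacent dependent instructions, and a single wakeup for all consumers of a macro-op once its 2-cycle latency has elapsed. The names `exec_cycles`, `loop_cycles`, and `heads` are hypothetical and chosen for illustration only.

```python
from typing import AbstractSet, List, Sequence


def exec_cycles(deps: Sequence[Sequence[int]], loop_cycles: int,
                heads: AbstractSet[int] = frozenset()) -> List[int]:
    """Return the cycle in which each single-cycle instruction executes.

    deps[i]      producers of instruction i (indices smaller than i)
    loop_cycles  length of the wakeup+select loop (1 = atomic, 2 = pipelined)
    heads        indices h such that h and h+1 are fused into one macro-op
                 (assumption of this sketch: h+1 must depend directly on h)
    """
    n = len(deps)
    ready = [0] * n   # earliest cycle at which a consumer of i may execute
    cycle = [0] * n   # cycle in which i itself executes
    i = 0
    while i < n:
        if i in heads:
            tail = i + 1
            assert tail < n and i in deps[tail] and tail not in heads
            # The macro-op waits for the external sources of both members.
            sources = set(deps[i]) | (set(deps[tail]) - {i})
            issue = max((ready[p] for p in sources), default=0)
            cycle[i], cycle[tail] = issue, issue + 1
            # The scheduler sees one 2-cycle operation, so a 2-cycle
            # scheduling loop no longer delays the consumers.
            ready[i] = ready[tail] = issue + max(2, loop_cycles)
            i += 2
        else:
            issue = max((ready[p] for p in deps[i]), default=0)
            cycle[i] = issue
            # A 1-cycle result cannot be consumed sooner than the loop allows.
            ready[i] = issue + max(1, loop_cycles)
            i += 1
    return cycle


# A chain of six dependent single-cycle instructions.
chain = [[], [0], [1], [2], [3], [4]]
print(exec_cycles(chain, 1))             # atomic:    [0, 1, 2, 3, 4, 5]
print(exec_cycles(chain, 2))             # pipelined: [0, 2, 4, 6, 8, 10]
print(exec_cycles(chain, 2, {0, 2, 4}))  # macro-ops: [0, 1, 2, 3, 4, 5]
```

In this model the pipelined 2-cycle scheduler inserts a bubble after every dependent single-cycle instruction, while pairing instructions into 2-cycle macro-ops restores the back-to-back timing of the atomic scheduler on the same chain. Each fused pair also stands for a single issue queue entry, which corresponds to the window-size benefit mentioned in the abstract.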

Index Terms

Macro-op Scheduling: Relaxing Scheduling Loop Constraints
1. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Macro languages
    2. General programming languages
      1. Language types
        Assembly languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multiprocessing / multiprogramming / multitasking
        Scheduling

Information & Contributors

Information

Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

December 2003

412 pages

ISBN: 076952043X

Copyright © 2003 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Publication History

Published: 03 December 2003

Qualifiers

Article

Conference

MICRO-36

Sponsor:

SIGMICRO

MICRO-36: The 36th Annual International Symposium on Microarchitecture

December 3 - 5, 2003

Acceptance Rates

MICRO 36 Paper Acceptance Rate 35 of 134 submissions, 26%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics

Total Citations: 29
Total Downloads: 248
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 22 Nov 2024

Cited By

Taram M, Venkat A, Tullsen D (2018). Mobilizing the micro-ops. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 624-637. Online publication date: 2-Jun-2018.
https://dl.acm.org/doi/10.1109/ISCA.2018.00058
Aşılıoğlu G, Jin Z, Köksal M, Javeri O, Önder S (2015). LaZy superscalar. ACM SIGARCH Computer Architecture News 43:3S, pp. 260-271. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2872887.2750409
Aşılıoğlu G, Jin Z, Köksal M, Javeri O, Önder S, Marr D, Albonesi D (2015). LaZy superscalar. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 260-271. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2749469.2750409
Sasaki H, Kondo M, Nakamura H (2009). Energy-efficient dynamic instruction scheduling logic through instruction grouping. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17:6, pp. 848-852. Online publication date: 1-Jun-2009.
https://dl.acm.org/doi/10.1109/TVLSI.2009.2013397
Suri T, Aggarwal A (2008). Scalable multi-cores with improved per-core performance using off-the-critical path reconfigurable hardware. Proceedings of the 15th international conference on High performance computing, pp. 365-377. Online publication date: 17-Dec-2008.
https://dl.acm.org/doi/10.5555/1791889.1791928
Tseng F, Patt Y (2008). Achieving Out-of-Order Performance with Almost In-Order Complexity. ACM SIGARCH Computer Architecture News 36:3, pp. 3-12. Online publication date: 1-Jun-2008.
https://dl.acm.org/doi/10.1145/1394608.1382169
Tseng F, Patt Y (2008). Achieving Out-of-Order Performance with Almost In-Order Complexity. Proceedings of the 35th Annual International Symposium on Computer Architecture, pp. 3-12. Online publication date: 21-Jun-2008.
https://dl.acm.org/doi/10.1109/ISCA.2008.23
Escuder V, Durán R, Rico R (2007). Analysis of x86 ISA condition codes influence on superscalar execution. Proceedings of the 14th international conference on High performance computing, pp. 119-132. Online publication date: 18-Dec-2007.
https://dl.acm.org/doi/10.5555/1782174.1782193
Sassone P, Wills D, Loh G (2007). Static strands. ACM Transactions on Embedded Computing Systems 6:4, pp. 24-es. Online publication date: 1-Sep-2007.
https://dl.acm.org/doi/10.1145/1274858.1274862
Vandierendonck H, Manet P, Delavallee T, Loiselle I, Legat J, Banerjee U, Moreira J, Dubois M, Stenström P (2007). By-passing the out-of-order execution pipeline to increase energy-efficiency. Proceedings of the 4th international conference on Computing frontiers, pp. 97-104. Online publication date: 7-May-2007.
https://dl.acm.org/doi/10.1145/1242531.1242548
