DOI: 10.5555/956417.956563

Macro-op Scheduling: Relaxing Scheduling Loop Constraints

Published: 03 December 2003

Abstract

Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rate as they are executed. To sustain high performance, integer ALU instructions typically have single-cycle latency, consequently requiring scheduling logic with the same single-cycle latency. Prior proposals have advocated the use of speculation in either the wakeup or select phases to enable pipelining of scheduling logic to achieve higher clock frequency. In contrast, this paper proposes macro-op scheduling, which systematically removes instructions with single-cycle latency from the machine by combining them into macro-ops, and performs nonspeculative pipelined scheduling of multi-cycle operations. Macro-op scheduling also increases the effective size of the scheduling window by enabling multiple instructions to occupy a single issue queue entry. We demonstrate that pipelined 2-cycle macro-op scheduling performs comparably or even better than atomic scheduling or prior proposals for select-free scheduling.
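The grouping idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's detection or formation algorithm; it is a minimal greedy pairing under stated assumptions: macro-ops hold at most two instructions, only single-cycle instructions are fused, and a dependent is only considered within a small lookahead. The names `Instr`, `form_macro_ops`, and `max_distance` are hypothetical, and the sketch ignores hazards a real design would have to handle, such as dependence cycles between fused pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # destination register
    srcs: tuple        # source registers
    latency: int = 1   # execution latency in cycles


def form_macro_ops(instrs, max_distance=4):
    """Greedily pair each single-cycle instruction with its first
    single-cycle dependent found within max_distance later slots.
    Unpaired instructions become one-element groups."""
    grouped = set()    # indices already absorbed as the tail of a pair
    macro_ops = []
    for i, head in enumerate(instrs):
        if i in grouped:
            continue
        partner = None
        if head.latency == 1:
            limit = min(i + 1 + max_distance, len(instrs))
            for j in range(i + 1, limit):
                tail = instrs[j]
                if j not in grouped and tail.latency == 1 and head.dest in tail.srcs:
                    partner = j
                    break
                if tail.dest == head.dest:
                    break  # head's value is overwritten; stop searching
        if partner is None:
            macro_ops.append((head,))
        else:
            grouped.add(partner)
            macro_ops.append((head, instrs[partner]))
    return macro_ops


# Illustrative instruction stream (made up for this sketch).
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),
    Instr("ld", "r6", ("r4",), latency=3),
    Instr("and", "r7", ("r6", "r1")),
    Instr("or", "r8", ("r7", "r2")),
]

groups = form_macro_ops(program)
names = [tuple(ins.name for ins in g) for g in groups]
print(names)                      # [('add', 'sub'), ('ld',), ('and', 'or')]
print(len(program), len(groups))  # 5 3
```

In this toy stream, five instructions occupy three issue queue entries, which is the sense in which grouping enlarges the effective scheduling window. Each fused pair of single-cycle operations also behaves as one two-cycle unit from the scheduler's point of view.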


Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
December 2003
412 pages
ISBN: 076952043X

Publisher

IEEE Computer Society

United States

Acceptance Rates

MICRO 36 Paper Acceptance Rate 35 of 134 submissions, 26%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%

