DOI: 10.1145/1006209.1006240
Article

Scaling the issue window with look-ahead latency prediction

Published: 26 June 2004

Abstract

In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing high instruction-level parallelism (ILP). Scaling the issue window size can certainly provide more ILP; however, future processor scaling demands threaten to limit the size of the issue window.

In this study, we propose a dynamic instruction sorting mechanism that provides more ILP without increasing the size of the issue window. In our approach, early in the pipeline, we predict how long an instruction needs to wait before it can be issued, i.e., the waiting time for its operands to be produced. Using this knowledge, the instructions are placed into a sorting structure, which allows instructions with shorter waiting times to enter the issue window ahead of those with longer waiting times, preventing long-waiting instructions from clogging the issue queue.

The accuracy in predicting instruction waiting times directly determines the effectiveness of our sorting mechanism. While most instructions have deterministic execution latencies, load execution times are harder to predict: they can vary significantly due to cache misses and in-flight loads. In this study, we examine techniques to predict load execution time accurately, based on data reference history.
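The sorting idea in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the `Instr` record, the `LoadLatencyTable` (a simple last-observed-latency table indexed by load PC), and the functions `predicted_waits` and `sort_for_issue` are all hypothetical names introduced here for illustration. The sketch also assumes every instruction issues as soon as its operands are predicted ready, and it measures waiting time from a common cycle 0.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int                      # instruction address
    dest: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...]        # source registers
    is_load: bool = False
    latency: int = 1             # fixed latency for non-load instructions


class LoadLatencyTable:
    """Hypothetical predictor: remembers the last observed latency per load PC."""

    def __init__(self, default: int = 2) -> None:
        self.default = default   # assumed latency for loads never seen before
        self.table: Dict[int, int] = {}

    def predict(self, pc: int) -> int:
        return self.table.get(pc, self.default)

    def update(self, pc: int, observed: int) -> None:
        self.table[pc] = observed


def predicted_waits(program: List[Instr], loads: LoadLatencyTable) -> List[int]:
    """Predict, per instruction, the cycle at which all of its operands are ready."""
    ready_at: Dict[str, int] = {}  # register -> predicted cycle its value is produced
    waits: List[int] = []
    for ins in program:
        # Registers with no in-flight producer are treated as ready at cycle 0.
        wait = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        waits.append(wait)
        lat = loads.predict(ins.pc) if ins.is_load else ins.latency
        if ins.dest is not None:
            ready_at[ins.dest] = wait + lat
    return waits


def sort_for_issue(program: List[Instr], loads: LoadLatencyTable) -> List[Instr]:
    """Order instructions by predicted wait; ties keep program order."""
    waits = predicted_waits(program, loads)
    # The sequence number is unique, so Instr objects are never compared.
    heap = [(w, seq, ins) for seq, (w, ins) in enumerate(zip(waits, program))]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    loads = LoadLatencyTable()
    loads.update(0x00, 20)  # pretend this load missed in the cache last time
    program = [
        Instr(0x00, "r1", ("r0",), is_load=True),
        Instr(0x04, "r2", ("r1", "r1")),   # depends on the slow load
        Instr(0x08, "r3", ("r0", "r0")),   # independent
        Instr(0x0C, "r4", ("r3", "r3")),
    ]
    print([hex(i.pc) for i in sort_for_issue(program, loads)])
```

In this toy program the instruction at `0x04` depends on a load predicted to take 20 cycles, so it is placed behind the two independent additions instead of occupying an issue-window slot while it waits.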

    Published In

    ICS '04: Proceedings of the 18th annual international conference on Supercomputing
    June 2004
    360 pages
    ISBN:1581138393
    DOI:10.1145/1006209
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CLP
    2. LHT
    3. MNM
    4. SILO
    5. instruction sorting

    Qualifiers

    • Article

    Conference

    ICS04

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2022) Efficient Instruction Scheduling Using Real-time Load Delay Tracking. ACM Transactions on Computer Systems 40:1-4 (1-21). DOI: 10.1145/3548681. Online publication date: 24-Nov-2022
    • (2019) Filter caching for free. Proceedings of the 46th International Symposium on Computer Architecture (436-448). DOI: 10.1145/3307650.3322269. Online publication date: 22-Jun-2019
    • (2018) Dynamically Disabling Way-prediction to Reduce Instruction Replay. 2018 IEEE 36th International Conference on Computer Design (ICCD) (140-143). DOI: 10.1109/ICCD.2018.00029. Online publication date: Oct-2018
    • (2015) Cost-effective speculative scheduling in high performance processors. ACM SIGARCH Computer Architecture News 43:3S (247-259). DOI: 10.1145/2872887.2749470. Online publication date: 13-Jun-2015
    • (2015) Cost-effective speculative scheduling in high performance processors. Proceedings of the 42nd Annual International Symposium on Computer Architecture (247-259). DOI: 10.1145/2749469.2749470. Online publication date: 13-Jun-2015
    • (2009) Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors. Transactions on High-Performance Embedded Architectures and Compilers II (107-127). DOI: 10.1007/978-3-642-00904-4_7. Online publication date: 22-Apr-2009
    • (2008) A low-complexity microprocessor design with speculative pre-execution. Journal of Systems Architecture: the EUROMICRO Journal 54:12 (1101-1112). DOI: 10.1016/j.sysarc.2008.05.003. Online publication date: 1-Dec-2008
    • (2007) On reducing energy-consumption by late-inserting instructions into the issue queue. Proceedings of the 2007 international symposium on Low power electronics and design (371-374). DOI: 10.1145/1283780.1283861. Online publication date: 27-Aug-2007
    • (2007) Exploiting Operand Availability for Efficient Simultaneous Multithreading. IEEE Transactions on Computers 56:2 (208-223). DOI: 10.1109/TC.2007.28. Online publication date: 1-Feb-2007
    • (2006) Efficient Instruction Schedulers for SMT Processors. The Twelfth International Symposium on High-Performance Computer Architecture, 2006 (293-304). DOI: 10.1109/HPCA.2006.1598137. Online publication date: 2006
