Scaling the issue window with look-ahead latency prediction

Authors:

Anahita Shayesteh,

Glenn Reinman

ICS '04: Proceedings of the 18th annual international conference on Supercomputing

Pages 217 - 226

https://doi.org/10.1145/1006209.1006240

Published: 26 June 2004

Abstract

In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing high instruction level parallelism (ILP). Scaling issue window size can certainly provide more ILP; however, future processor scaling demands threaten to limit the size of the issue window.

In this study, we propose a dynamic instruction sorting mechanism that provides more ILP without increasing the size of the issue window. In our approach, early in the pipeline, we predict how long an instruction needs to wait before it can be issued, i.e., the waiting time for its operands to be produced. Using this knowledge, the instructions are placed into a sorting structure, which allows instructions with shorter waiting times to enter the issue window ahead of those with longer waiting times, preventing long-waiting instructions from clogging the issue queue.

The accuracy in predicting instruction waiting times directly determines the effectiveness of our sorting mechanism. While most instructions have deterministic execution latencies, predicting load execution times is more difficult due to cache misses and in-flight loads. Loads are particularly challenging since their execution time can vary significantly. In this study, we examine techniques to predict load execution time accurately, based on data reference history.
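
The sorting idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `PreIssueSorter` and `LastLatencyPredictor` are hypothetical, a binary heap stands in for whatever sorting structure the paper actually uses, and the "last observed latency per load PC" table is a simple placeholder for the paper's history-based load latency predictors, whose details the abstract does not give.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class SortedEntry:
    ready_cycle: int                  # predicted cycle when operands are ready
    seq: int                          # program order, breaks ties (older first)
    name: str = field(compare=False)  # instruction label, not used for ordering


class PreIssueSorter:
    """Hypothetical sorting structure placed in front of a small issue window."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def insert(self, name, now, predicted_wait):
        # Key each instruction by the cycle at which it is predicted to be ready.
        entry = SortedEntry(now + predicted_wait, next(self._seq), name)
        heapq.heappush(self._heap, entry)

    def release(self, free_slots):
        # Hand the instructions with the shortest predicted wait to the window.
        released = []
        while self._heap and len(released) < free_slots:
            released.append(heapq.heappop(self._heap).name)
        return released


class LastLatencyPredictor:
    """Assumed predictor: remembers the last observed latency per load PC."""

    def __init__(self, default_latency=2):
        self._table = {}
        self._default = default_latency

    def predict(self, pc):
        return self._table.get(pc, self._default)

    def update(self, pc, observed_latency):
        self._table[pc] = observed_latency


sorter = PreIssueSorter()
sorter.insert("consumer_of_missing_load", now=0, predicted_wait=200)
sorter.insert("add", now=0, predicted_wait=1)
sorter.insert("mul", now=0, predicted_wait=3)
print(sorter.release(free_slots=2))  # ['add', 'mul']
```

In this toy run the instruction that depends on a long-latency load stays in the sorter while the two short-wait instructions take the free issue window slots, which is the "clogging" effect the abstract aims to avoid. A mispredicted wait would place an instruction in the wrong position, which is why the abstract ties the mechanism's effectiveness to prediction accuracy.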

Index Terms

Scaling the issue window with look-ahead latency prediction
1. Computer systems organization
  1. Architectures

Information & Contributors

Information

Published In

ICS '04: Proceedings of the 18th annual international conference on Supercomputing

June 2004

360 pages

ISBN:1581138393

DOI:10.1145/1006209

General Chair: Paul Feautrier (LIP, ENS Lyon)

Program Chairs: James Goodman (University of Auckland), André Seznec (IRISA, INRIA)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS04

Sponsor:

ICS04: International Conference on Supercomputing 2004

June 26 - July 1, 2004

Malo, France

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Total Citations: 16
Total Downloads: 622
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 4

Reflects downloads up to 21 Nov 2024

Citations

Cited By

Diavastos A, Carlson T (2022). Efficient Instruction Scheduling Using Real-time Load Delay Tracking. ACM Transactions on Computer Systems 40:1-4 (1-21). DOI 10.1145/3548681. Online publication date: 24-Nov-2022
https://dl.acm.org/doi/10.1145/3548681
Alves R, Ros A, Black-Schaffer D, Kaxiras S, Manne S, Hunter H, Altman E (2019). Filter caching for free. Proceedings of the 46th International Symposium on Computer Architecture (436-448). DOI 10.1145/3307650.3322269. Online publication date: 22-Jun-2019
https://dl.acm.org/doi/10.1145/3307650.3322269
Alves R, Kaxiras S, Black-Schaffer D (2018). Dynamically Disabling Way-prediction to Reduce Instruction Replay. 2018 IEEE 36th International Conference on Computer Design (ICCD) (140-143). DOI 10.1109/ICCD.2018.00029. Online publication date: Oct-2018
https://doi.org/10.1109/ICCD.2018.00029
Perais A, Seznec A, Michaud P, Sembrant A, Hagersten E (2015). Cost-effective speculative scheduling in high performance processors. ACM SIGARCH Computer Architecture News 43:3S (247-259). DOI 10.1145/2872887.2749470. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2872887.2749470
Perais A, Seznec A, Michaud P, Sembrant A, Hagersten E, Marr D, Albonesi D (2015). Cost-effective speculative scheduling in high performance processors. Proceedings of the 42nd Annual International Symposium on Computer Architecture (247-259). DOI 10.1145/2749469.2749470. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2749469.2749470
Choi W, Park S, Dubois M (2009). Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors. Transactions on High-Performance Embedded Architectures and Compilers II (107-127). DOI 10.1007/978-3-642-00904-4_7. Online publication date: 22-Apr-2009
https://dl.acm.org/doi/10.1007/978-3-642-00904-4_7
Ro W, Gaudiot J (2008). A low-complexity microprocessor design with speculative pre-execution. Journal of Systems Architecture: the EUROMICRO Journal 54:12 (1101-1112). DOI 10.1016/j.sysarc.2008.05.003. Online publication date: 1-Dec-2008
https://dl.acm.org/doi/10.1016/j.sysarc.2008.05.003
Morancho E, Llabería J, Olivé À, Marculescu D, Raghunathan A, Keshavarzi A, Narayanan V (2007). On reducing energy-consumption by late-inserting instructions into the issue queue. Proceedings of the 2007 international symposium on Low power electronics and design (371-374). DOI 10.1145/1283780.1283861. Online publication date: 27-Aug-2007
https://dl.acm.org/doi/10.1145/1283780.1283861
Sharkey J, Ponomarev D (2007). Exploiting Operand Availability for Efficient Simultaneous Multithreading. IEEE Transactions on Computers 56:2 (208-223). DOI 10.1109/TC.2007.28. Online publication date: 1-Feb-2007
https://dl.acm.org/doi/10.1109/TC.2007.28
Sharkey J, Ponomarev D (2006). Efficient Instruction Schedulers for SMT Processors. The Twelfth International Symposium on High-Performance Computer Architecture, 2006 (293-304). DOI 10.1109/HPCA.2006.1598137. Online publication date: 2006
https://doi.org/10.1109/HPCA.2006.1598137
