Early load address resolution via register tracking

Authors:

Michael Bekerman,

Stephan Jourdan,

Ronny Ronen

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture

Pages 306 - 315

https://doi.org/10.1145/339647.339705

Published: 01 May 2000

Abstract

Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in Intel's IA32 architecture, where the lack of registers results in an increased number of memory accesses. This paper presents a novel, non-speculative technique that partially hides the increasing load-to-use latency by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate. Register tracking may be performed in any pipeline stage following instruction decode and prior to execution.

Several tracking schemes are proposed in this paper:

Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads and 95% of these loads may be resolved in the front-end.

Absolute address tracking allows the early resolution of constant-address loads.

Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
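Taken at face value, the percentages quoted for the tracking schemes imply the following shares of all loads that could be resolved in the front-end. This is only arithmetic on the rounded figures in the abstract, not a result reported by the paper. Stack loads are presumably a subset of the reg±immediate class, so the two shares likely overlap and should not be summed.

```latex
\begin{align*}
\text{stack loads resolved early} &\approx 0.25 \times 0.95 \approx 0.24 \text{ of all loads} \\
\text{reg}\pm\text{immediate loads resolved early} &\approx 0.82 \times 0.65 \approx 0.53 \text{ of all loads}
\end{align*}
```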

The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor and discusses the integration of tracking with memory disambiguation.
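The idea of register tracking can be illustrated with a small software model. The following is a minimal sketch, not the paper's hardware design: the micro-op tuples, the `RegisterTracker` class, and the 32-bit wraparound are all assumptions made for illustration. It keeps a table of register values known at decode time, updates it for reg±immediate operations, and returns a load address only when the base register is known, so it never guesses.

```python
# Hypothetical micro-op encoding, invented for this sketch:
#   ("movi", dst, imm)         dst = imm               (value known at decode)
#   ("addi", dst, src, imm)    dst = src + imm         (trackable reg±immediate)
#   ("alu",  dst)              dst = untracked result  (value becomes unknown)
#   ("load", dst, base, disp)  dst = MEM[base + disp]  (base=None: absolute address)
#   ("push", src)              esp -= 4; MEM[esp] = src
#   ("pop",  dst)              dst = MEM[esp]; esp += 4

WORD = 4            # assumed stack slot size in bytes
MASK = 0xFFFFFFFF   # assumed 32-bit address arithmetic


class RegisterTracker:
    """Tracks register values that are computable at decode time."""

    def __init__(self, initial=None):
        # Maps register name -> known value; absent means "unknown".
        self.known = dict(initial or {})

    def _set(self, reg, value):
        if value is None:
            self.known.pop(reg, None)      # value no longer trackable
        else:
            self.known[reg] = value & MASK

    def decode(self, op):
        """Update tracked state; return an early load address or None."""
        kind = op[0]
        addr = None
        if kind == "movi":
            _, dst, imm = op
            self._set(dst, imm)
        elif kind == "addi":
            _, dst, src, imm = op
            base = self.known.get(src)
            self._set(dst, None if base is None else base + imm)
        elif kind == "alu":
            self._set(op[1], None)
        elif kind == "load":
            _, dst, base, disp = op
            b = 0 if base is None else self.known.get(base)
            if b is not None:
                addr = (b + disp) & MASK   # resolved without speculation
            self._set(dst, None)           # loaded data is unknown at decode
        elif kind == "push":
            esp = self.known.get("esp")
            self._set("esp", None if esp is None else esp - WORD)
        elif kind == "pop":
            _, dst = op
            esp = self.known.get("esp")
            addr = esp                     # pop reads the current top of stack
            self._set("esp", None if esp is None else esp + WORD)
            self._set(dst, None)
        else:
            raise ValueError(f"unknown micro-op: {kind}")
        return addr


program = [
    ("push", "ebx"),              # esp: 0x1000 -> 0x0FFC (a store, no load address)
    ("addi", "esp", "esp", -8),   # esp -> 0x0FF4
    ("load", "eax", "esp", 4),    # stack load, resolved early at 0x0FF8
    ("load", "ecx", "eax", 0),    # base eax is unknown, so not resolved
    ("pop", "ebx"),               # resolved at 0x0FF4, esp -> 0x0FF8
    ("load", "edx", None, 0x8000) # absolute-address load, always resolved
]

tracker = RegisterTracker({"esp": 0x1000})
resolved = [tracker.decode(op) for op in program]
```

In this model, a load whose base register was last written by an untracked operation simply yields `None` and would wait for normal execution, which mirrors the safe, non-speculative character the abstract describes.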

Index Terms

Early load address resolution via register tracking
1. Hardware
  1. Hardware validation
  2. Robustness
    1. Fault tolerance
    2. Hardware reliability

Published In

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture

June 2000

327 pages

ISBN:1581132328

DOI:10.1145/339647

Chairmen: Alan Berenbaum (Lucent Technologies), Joel Emer (Compaq Computer Corp.)

ACM SIGARCH Computer Architecture News Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000
325 pages
ISSN:0163-5964
DOI:10.1145/342001
Chairmen: Alan Berenbaum (Lucent Technologies, Berkeley Heights, NJ), Joel Emer (Compaq Computer Corp., Palo Alto, CA)

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA00

Sponsor:

SIGARCH

ISCA00: 27th International Symposium on Computer Architecture

Vancouver, British Columbia, Canada

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
