DOI: 10.1145/339647.339705

Early load address resolution via register tracking

Published: 01 May 2000

Abstract

Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in Intel's IA32 architecture, where the lack of registers results in an increased number of memory accesses. This paper presents a novel, non-speculative technique that partially hides the increasing load-to-use latency by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate. Register tracking may be performed in any pipeline stage following instruction decode and prior to execution.
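The idea of tracking operations of the form reg±immediate can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the hardware design from the paper: the class `RegisterTracker`, its method names, and the 32-bit wraparound are hypothetical choices made for illustration. A register is either known at decode time or unknown, and a load is resolved early only when its base register is known, which keeps the scheme non-speculative.

```python
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers, as in IA32


class RegisterTracker:
    """Toy decode-time tracker: each register is known (int) or unknown (None)."""

    def __init__(self) -> None:
        self.known: Dict[str, Optional[int]] = {}

    def set_value(self, reg: str, value: int) -> None:
        # The value becomes available, e.g. from a move-immediate
        # or from the back end once the producer has executed.
        self.known[reg] = value & MASK32

    def add_immediate(self, reg: str, imm: int) -> None:
        # reg = reg + imm (imm may be negative); stays known only if it was known.
        cur = self.known.get(reg)
        self.known[reg] = None if cur is None else (cur + imm) & MASK32

    def clobber(self, reg: str) -> None:
        # Any write that is not reg±immediate makes the value unknown at decode.
        self.known[reg] = None

    def resolve_load(self, base: str, disp: int) -> Optional[int]:
        # Address of a load from [base + disp], or None if the load
        # has to wait for normal address generation at execution.
        cur = self.known.get(base)
        return None if cur is None else (cur + disp) & MASK32


tracker = RegisterTracker()
tracker.set_value("ebx", 0x1000)
tracker.add_immediate("ebx", 8)          # ebx is now 0x1008
early = tracker.resolve_load("ebx", -4)  # resolved early: 0x1004
tracker.clobber("ebx")                   # e.g. ebx overwritten by a load result
late = tracker.resolve_load("ebx", 0)    # None: cannot be resolved in the front end
```

Returning `None` rather than a guessed address reflects the non-speculative nature of the technique: an unresolved load simply follows the regular path.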
Several tracking schemes are proposed in this paper:
  • Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads and 95% of these loads may be resolved in the front-end.
  • Absolute address tracking allows the early resolution of constant-address loads.
  • Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
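Stack pointer tracking, the first scheme listed above, can be sketched in the same spirit. The function below is a hypothetical illustration, not the paper's mechanism: the operation names (`push`, `pop`, `add_esp`, `load_esp`, `clobber_esp`), the 4-byte operand size, and the assumption that any non-immediate write to ESP simply stops tracking are all simplifications introduced here.

```python
from typing import List, Optional, Tuple

MASK32 = 0xFFFFFFFF  # assumption: 32-bit stack pointer


def resolve_stack_loads(initial_esp: int,
                        ops: List[Tuple[str, int]]) -> List[Optional[int]]:
    """Return the front-end address of each stack load, or None if unresolved."""
    esp: Optional[int] = initial_esp & MASK32
    resolved: List[Optional[int]] = []
    for op, arg in ops:
        if op == "push":
            # push decrements ESP by the operand size (4 bytes assumed).
            esp = None if esp is None else (esp - 4) & MASK32
        elif op == "pop":
            # pop is itself a stack load from [ESP], then ESP += 4.
            resolved.append(esp)
            esp = None if esp is None else (esp + 4) & MASK32
        elif op == "add_esp":
            # add/sub esp, imm: arg is the signed immediate.
            esp = None if esp is None else (esp + arg) & MASK32
        elif op == "load_esp":
            # load from [ESP + arg].
            resolved.append(None if esp is None else (esp + arg) & MASK32)
        elif op == "clobber_esp":
            # A write to ESP that is not ESP±immediate: tracking is lost
            # in this toy model until the value is known again.
            esp = None
        else:
            raise ValueError(f"unknown operation: {op}")
    return resolved


ops = [("push", 0), ("push", 0), ("load_esp", 4),
       ("pop", 0), ("clobber_esp", 0), ("load_esp", 0)]
addresses = resolve_stack_loads(0x8000, ops)
# addresses == [0x7FFC, 0x7FF8, None]
```

In this example the first two stack loads are resolved in the front end, while the load after the untracked ESP write is left for the execution stage.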
The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor, and discusses the integration of tracking with memory disambiguation.


Published In

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000
327 pages
ISBN: 1581132328
DOI: 10.1145/339647

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000
325 pages
ISSN: 0163-5964
DOI: 10.1145/342001
Publisher

Association for Computing Machinery, New York, NY, United States
