Instruction cache prefetching using multilevel branch prediction

Alexander V. Veidenbaum¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1336)

Included in the following conference series:

International Symposium on High Performance Computing


Abstract

This paper presents an instruction cache prefetching mechanism capable of prefetching past branches in multiple-issue processors. At high clock rates, such processors often use small instruction caches, which have significant miss rates. Prefetching from the secondary cache can hide the instruction cache miss penalty, but only if it is initiated sufficiently far ahead of the current program counter (PC). Existing instruction cache prefetching methods are strictly sequential and cannot do so because they are unable to prefetch past branches. By keeping branch history and branch target addresses, we predict a future PC several branches past the current branch. We describe a possible prefetching architecture and evaluate its accuracy, the impact of instruction prefetching on performance, and its interaction with sequential prefetching. For a 4-issue processor and a cache architecture patterned after the DEC Alpha 21164, we show that our prefetching unit can be more effective than sequential prefetching. The two types of prefetching eliminate different types of misses and can therefore be effectively combined to achieve better performance.
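
The idea of predicting a PC several branches ahead can be illustrated with a small software model. The sketch below is not the architecture evaluated in the paper; it is a toy under stated assumptions: a table indexed by branch PC, a 2-bit saturating counter per branch as the history, and a learned link from each (branch, direction) pair to the next fetch address and the next branch. The names `Entry`, `LookaheadPredictor`, `update`, and `predict_lines`, the 32-byte line size, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # 2-bit saturating counter; a value >= 2 means "predict taken".
    counter: int = 1
    # direction (bool) -> (next fetch address, PC of the next branch)
    successor: dict = field(default_factory=dict)


class LookaheadPredictor:
    """Toy model: chain per-branch predictions to look several branches ahead."""

    def __init__(self, line_bytes=32):
        self.table = {}
        self.line_bytes = line_bytes  # assumed cache line size
        self._last = None             # (branch PC, direction, next fetch address)

    def update(self, branch_pc, taken, next_fetch_addr):
        """Train on one resolved branch from an execution trace."""
        # Link the previous branch outcome to the branch that followed it.
        if self._last is not None:
            prev_pc, prev_taken, prev_addr = self._last
            self.table[prev_pc].successor[prev_taken] = (prev_addr, branch_pc)
        entry = self.table.setdefault(branch_pc, Entry())
        if taken:
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
        self._last = (branch_pc, taken, next_fetch_addr)

    def predict_lines(self, branch_pc, depth):
        """Return line addresses predicted up to `depth` branches ahead."""
        lines = []
        pc = branch_pc
        for _ in range(depth):
            entry = self.table.get(pc)
            if entry is None:
                break  # branch never seen: stop looking ahead
            taken = entry.counter >= 2
            link = entry.successor.get(taken)
            if link is None:
                break  # predicted direction never observed
            addr, pc = link
            line = addr - addr % self.line_bytes
            if line not in lines:
                lines.append(line)
        return lines


# Hypothetical loop with four branches: (branch PC, taken?, next fetch address).
TRACE = [
    (0x100, True, 0x400),
    (0x420, False, 0x424),
    (0x480, True, 0x800),
    (0x810, True, 0x0F0),
]

predictor = LookaheadPredictor()
for _ in range(3):
    for pc, taken, nxt in TRACE:
        predictor.update(pc, taken, nxt)

# Candidate prefetch lines three branches past the branch at 0x100.
print([hex(a) for a in predictor.predict_lines(0x100, 3)])
# prints ['0x400', '0x420', '0x800']
```

A sequential prefetcher starting at 0x100 would never reach the lines at 0x400 or 0x800, because both lie beyond taken branches; following the predicted chain is what reaches them in this model.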



Author information

Authors and Affiliations

Dept. of Electrical Engineering and Computer Science, University of Illinois, Chicago, USA
Alexander V. Veidenbaum


Editor information

Constantine Polychronopoulos, Kazuki Joe, Keijiro Araki, Makoto Amamiya

Cite this paper

Veidenbaum, A.V. (1997). Instruction cache prefetching using multilevel branch prediction. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024203

DOI: https://doi.org/10.1007/BFb0024203
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive
