Abstract
This paper presents an instruction cache prefetching mechanism capable of prefetching past branches in multiple-issue processors. At high clock rates, such processors often use small instruction caches, which have significant miss rates. Prefetching from the secondary cache can hide instruction cache miss penalties, but only if it is initiated sufficiently far ahead of the current program counter (PC). Existing instruction cache prefetching methods are strictly sequential and cannot run that far ahead because they cannot prefetch past branches. By keeping branch history and branch target addresses, we predict a future PC several branches past the current branch. We describe a possible prefetching architecture and evaluate its accuracy, the impact of instruction prefetching on performance, and its interaction with sequential prefetching. For a 4-issue processor and a cache architecture patterned after the DEC Alpha-21164, we show that our prefetching unit can be more effective than sequential prefetching. The two types of prefetching eliminate different types of misses and can therefore be combined effectively to achieve better performance.
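The idea of predicting a PC several branches ahead from branch history and target addresses can be illustrated with a small sketch. This is not the architecture described in the paper; it is a toy software model under assumed parameters. The class name `MultiLevelPredictor`, the helper `prefetch_line`, the table organization, the 4-bit history, the lookahead depth of 2, the 32-byte line size, and the example addresses are all hypothetical choices made for illustration.

```python
from collections import deque


class MultiLevelPredictor:
    """Toy model (assumed design): maps (branch PC, recent outcomes) to the
    address reached after `depth` branch outcomes, starting at that branch."""

    def __init__(self, depth=2, history_bits=4):
        self.depth = depth                          # how many branches to look past
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                            # global taken/not-taken history
        self.table = {}                             # (pc, history) -> future PC
        self.pending = deque()                      # keys still waiting for their future PC

    def predict(self, branch_pc):
        """Return the predicted future PC, or None if this context is untrained."""
        return self.table.get((branch_pc, self.history))

    def update(self, branch_pc, taken, next_pc):
        """Train with a resolved branch; next_pc is where execution continued."""
        # Key the branch by the history seen *before* its own outcome is known.
        self.pending.append((branch_pc, self.history))
        if len(self.pending) == self.depth:
            # The oldest pending branch is now `depth` outcomes in the past.
            self.table[self.pending.popleft()] = next_pc
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def prefetch_line(target_pc, line_bytes=32):
    """Align a predicted PC to the start of its cache line (assumed 32-byte lines)."""
    return target_pc & ~(line_bytes - 1)


# Hypothetical loop with three branches: (branch PC, taken?, next PC).
LOOP = [(0x100, True, 0x200), (0x220, False, 0x224), (0x240, True, 0x100)]

predictor = MultiLevelPredictor(depth=2)
for _ in range(5):
    for pc, taken, next_pc in LOOP:
        predictor.update(pc, taken, next_pc)

# At the branch at 0x100, look two branches ahead and pick the line to prefetch.
future_pc = predictor.predict(0x100)                # 0x224 in this trace
line_to_prefetch = prefetch_line(future_pc)         # 0x220
```

A sequential prefetcher at 0x100 would request the lines that follow 0x100, while this sketch requests the line holding the code reached two branches later, which matches the abstract's observation that the two schemes cover different misses.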
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Veidenbaum, A.V. (1997). Instruction cache prefetching using multilevel branch prediction. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024203
DOI: https://doi.org/10.1007/BFb0024203
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive