Reducing Branch Delay to Zero in Pipelined Processors

Published: 01 March 1993

Abstract

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
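The abstract does not reproduce the analytical model itself. As a rough illustration of why driving the branch delay toward zero matters, the sketch below uses the standard textbook approximation in which each branch adds a fixed number of stall cycles to a base cycles-per-instruction figure. This is not the paper's model: the function names, the parameters (`base_cpi`, `branch_fraction`, `branch_delay`), and the sample values are all assumptions chosen for illustration.

```python
# Textbook branch-cost approximation, NOT the analytical model from the paper.
# All names and sample values here are hypothetical.

def effective_cpi(base_cpi: float, branch_fraction: float, branch_delay: float) -> float:
    """Average cycles per instruction when every branch costs `branch_delay` extra cycles."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be between 0 and 1")
    if branch_delay < 0:
        raise ValueError("branch_delay must be non-negative")
    return base_cpi + branch_fraction * branch_delay


def speedup_with_zero_delay(base_cpi: float, branch_fraction: float, branch_delay: float) -> float:
    """Ratio of CPI with the given branch delay to CPI with the delay reduced to zero."""
    with_delay = effective_cpi(base_cpi, branch_fraction, branch_delay)
    without_delay = effective_cpi(base_cpi, branch_fraction, 0.0)
    return with_delay / without_delay


if __name__ == "__main__":
    # Assumed workload: one instruction in five is a branch, ideal CPI of 1.
    for delay in (0, 1, 2, 3):
        cpi = effective_cpi(1.0, 0.2, delay)
        gain = speedup_with_zero_delay(1.0, 0.2, delay)
        print(f"delay={delay} cycles  CPI={cpi:.2f}  speedup if delay were zero={gain:.2f}")
```

Under these assumed numbers, the potential gain grows linearly with the per-branch delay. A fuller model, such as the one the abstract describes, would also have to account for effects like cache line size and instruction memory organization.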

Cited By

  • (1993) MIDEE, Proceedings of the 26th Annual International Symposium on Microarchitecture, pp. 193-201, doi:10.5555/255235.255286. Online publication date: 1-Dec-1993

Published In

IEEE Transactions on Computers, Volume 42, Issue 3
March 1993
130 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. branch delay
  2. branch target instruction memory
  3. buffer storage
  4. cache lines
  5. computational cost
  6. delayed branch
  7. early computation
  8. multiple prefetch
  9. parallel execution
  10. performance
  11. performance evaluation
  12. pipeline processing
  13. pipelined processors
  14. target address

Qualifiers

  • Research-article
