research-article

Reducing Branch Delay to Zero in Pipelined Processors

Authors:

A. M. Gonzalez,

J. M. Llaberia

IEEE Transactions on Computers, Volume 42, Issue 3

Pages 363 - 371

https://doi.org/10.1109/12.210179

Published: 01 March 1993

Abstract

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
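To give a sense of what "cost of branches" means numerically, the sketch below uses the standard textbook estimate in which every branch adds a fixed number of lost cycles to an otherwise ideal pipeline. It is not the analytical model from the paper, which the abstract does not spell out. The function names, parameters, and sample values are assumptions chosen for illustration only.

```python
# Illustrative sketch only: a generic textbook estimate, NOT the paper's model.
# All names and sample values here are hypothetical.

def effective_cpi(branch_fraction, branch_delay_cycles, base_cpi=1.0):
    """Cycles per instruction when each branch loses a fixed number of cycles.

    branch_fraction: fraction of executed instructions that are branches.
    branch_delay_cycles: average cycles lost per branch.
    base_cpi: cycles per instruction of the pipeline with no branch delay.
    """
    return base_cpi + branch_fraction * branch_delay_cycles


def speedup_from_zero_delay(branch_fraction, branch_delay_cycles, base_cpi=1.0):
    """Speedup obtained if the branch delay were reduced to zero."""
    with_delay = effective_cpi(branch_fraction, branch_delay_cycles, base_cpi)
    return with_delay / base_cpi


if __name__ == "__main__":
    # Assumed workload: 20% branches, 2 cycles lost per branch.
    print(effective_cpi(0.2, 2))
    print(speedup_from_zero_delay(0.2, 2))
```

Under these assumed numbers, the estimate shows why removing the delay matters: the lost cycles scale with both the branch frequency and the per-branch delay, so a zero-delay mechanism recovers the whole second term.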

Cited By

Drach N, Seznec A, Wolfe A, Mangione-Smith W (1993). MIDEE. Proceedings of the 26th annual international symposium on Microarchitecture, 10.5555/255235.255286, pp. 193-201. Online publication date: 1-Dec-1993
https://dl.acm.org/doi/10.5555/255235.255286


Published In

IEEE Transactions on Computers, Volume 42, Issue 3

March 1993

130 pages

ISSN: 0018-9340

Copyright © 1993 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States
