Branch folding in the CRISP microprocessor: reducing branch delay to zero

Authors:

H. R. McLellan

ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture

Pages 2 - 8

https://doi.org/10.1145/30350.30351

Published: 01 June 1987

Abstract

A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques the execution time cost for many branches can be effectively reduced to zero. Branches are folded into other instructions, making their execution as separate instructions unnecessary. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, as well as reducing or eliminating pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP Microprocessor.
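
The folding idea in the abstract can be illustrated with a toy model. The sketch below is not CRISP's actual decoded-instruction format or pipeline; it assumes a hypothetical decoded entry that carries an explicit `next_pc` field, handles only unconditional jumps, and uses invented names (`Instr`, `Decoded`, `decode_with_folding`). It shows how a jump that follows an ordinary instruction can be absorbed into that instruction's next-address field, so the jump never occupies an execution slot of its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    op: str                        # "jmp", "halt", or any other opcode name
    target: Optional[int] = None   # branch target (program index), "jmp" only


@dataclass
class Decoded:
    op: str
    next_pc: Optional[int]         # where to fetch next; None means stop


def decode_with_folding(program: List[Instr]) -> Dict[int, Decoded]:
    """Build decoded entries, folding a following 'jmp' into its predecessor."""
    decoded: Dict[int, Decoded] = {}
    for pc, ins in enumerate(program):
        if ins.op == "halt":
            decoded[pc] = Decoded("halt", None)
        elif ins.op == "jmp":
            # A jump that is entered directly (for example as a branch target)
            # stays a stand-alone entry in this toy model.
            decoded[pc] = Decoded("jmp", ins.target)
        else:
            nxt = pc + 1
            # Fold: if the next instruction is an unconditional jump, point
            # this entry straight at the jump's target.
            if nxt < len(program) and program[nxt].op == "jmp":
                nxt = program[nxt].target
            decoded[pc] = Decoded(ins.op, nxt)
    return decoded


def count_unfolded(program: List[Instr]) -> int:
    """Count executed instructions when every jump runs as its own instruction."""
    pc, executed = 0, 0
    while 0 <= pc < len(program):
        ins = program[pc]
        executed += 1
        if ins.op == "halt":
            break
        pc = ins.target if ins.op == "jmp" else pc + 1
    return executed


def count_folded(decoded: Dict[int, Decoded]) -> int:
    """Count executed entries when following the folded next_pc links."""
    pc: Optional[int] = 0
    executed = 0
    while pc is not None and pc in decoded:
        executed += 1
        pc = decoded[pc].next_pc
    return executed


# Hypothetical straight-line program with two unconditional jumps.
PROGRAM = [
    Instr("add"),            # 0
    Instr("jmp", target=3),  # 1: folded into instruction 0
    Instr("sub"),            # 2: never reached
    Instr("mul"),            # 3
    Instr("jmp", target=6),  # 4: folded into instruction 3
    Instr("nop"),            # 5: never reached
    Instr("halt"),           # 6
]

if __name__ == "__main__":
    print(count_unfolded(PROGRAM), count_folded(decode_with_folding(PROGRAM)))
```

In this example the folded run executes two fewer entries than the unfolded run, one for each jump on the executed path, which mirrors the abstract's claim that the apparent instruction count can drop by the number of branches. Conditional branches, which need a second (alternate) next address and a prediction, are deliberately left out of the sketch.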

Index Terms

Branch folding in the CRISP microprocessor: reducing branch delay to zero
1. Applied computing
  1. Computers in other domains
    1. Personal computers and PC applications
      1. Microcomputers
2. General and reference
  1. Cross-computing tools and techniques
    1. Design

Published In

ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture

June 1987

321 pages

ISBN: 0818607769

DOI: 10.1145/30350

Editor:
D. St. Clair

Copyright © 1987 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA87

Sponsor:

SIGARCH

ISCA87: The 14th Annual International Symposium on Computer Architecture

June 2 - 5, 1987

Pittsburgh, Pennsylvania, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
