DOI: 10.1145/30350.30351

Branch folding in the CRISP microprocessor: reducing branch delay to zero

Published: 01 June 1987

Abstract

A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques, the execution-time cost of many branches can be effectively reduced to zero. Branches are folded into other instructions, making their execution as separate instructions unnecessary. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, and can reduce or eliminate pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP Microprocessor.
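
The idea of folding a branch into a neighboring instruction can be illustrated with a small toy model. The sketch below is not the CRISP hardware design and is not taken from the paper: the `Instr` and `Decoded` types, the `decode` and `run` functions, and the sample program are all hypothetical names chosen for illustration. It assumes that each decoded instruction carries a next-address field, and that an unconditional jump directly following a non-branch instruction can be absorbed by writing the jump target into that field. Conditional branches, which would also need some form of prediction, are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Toy source instruction (hypothetical format, not the CRISP encoding)."""
    op: str                       # e.g. "add", "jmp", "halt"
    target: Optional[int] = None  # jump target index, used only by "jmp"


@dataclass
class Decoded:
    """Toy decoded instruction that carries its own next-address field."""
    op: str
    next_pc: Optional[int]        # None means execution stops here


def decode(program: List[Instr], fold: bool) -> List[Decoded]:
    """Decode every instruction; optionally fold a following jump into it."""
    decoded = []
    for pc, ins in enumerate(program):
        if ins.op == "jmp":
            # A jump reached directly still occupies an issue slot.
            decoded.append(Decoded("jmp", ins.target))
        elif ins.op == "halt":
            decoded.append(Decoded("halt", None))
        else:
            nxt = pc + 1
            # Folding (assumed model): if the next instruction is an
            # unconditional jump, point this instruction at the jump target.
            if fold and nxt < len(program) and program[nxt].op == "jmp":
                nxt = program[nxt].target
            decoded.append(Decoded(ins.op, nxt))
    return decoded


def run(decoded: List[Decoded], limit: int = 1000) -> int:
    """Follow next_pc links from address 0 and count issued instructions."""
    pc: Optional[int] = 0
    issued = 0
    while pc is not None and issued < limit:
        issued += 1
        pc = decoded[pc].next_pc
    return issued


PROGRAM = [
    Instr("add"),      # 0
    Instr("jmp", 3),   # 1: folded into instruction 0 when fold=True
    Instr("sub"),      # 2: never reached
    Instr("mul"),      # 3
    Instr("jmp", 6),   # 4: folded into instruction 3 when fold=True
    Instr("nop"),      # 5: never reached
    Instr("halt"),     # 6
]

print(run(decode(PROGRAM, fold=False)))  # 5 issued instructions
print(run(decode(PROGRAM, fold=True)))   # 3 issued instructions
```

In this toy run the issued-instruction count drops by two, which is the number of jumps executed, mirroring the abstract's claim that the apparent instruction count can fall by the number of branches.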



Published In

ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture
June 1987
321 pages
ISBN: 0818607769
DOI: 10.1145/30350
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
