Article

Control flow optimization for supercomputer scalar processing

Authors:

Pohua P. Chang,

Wen-mei W. Hwu

ICS '89: Proceedings of the 3rd international conference on Supercomputing

Pages 145 - 153

https://doi.org/10.1145/318789.318806

Published: 01 June 1989

Abstract

Control-intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications do. Function call/return and branch instructions disrupt the flow of instructions through the pipeline, degrading the utilization of the pipelined datapaths. This paper describes control flow optimization for scalar processing using an optimizing compiler. To obtain program control flow information, a system-independent profiler has been integrated into the IMPACT-I C compiler. The control flow information obtained is converted into a weighted control graph. Based on the weighted control graph, function inline expansion, multi-way branch layout, and software branch prediction can be implemented. Better compiler technology makes possible a very low-cost hardware control unit (architecture) for high-performance scalar processors.
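The abstract names the pipeline (profile, weighted control graph, then optimizations) but not its data structures. The sketch below is a minimal, hypothetical illustration, assuming each profiled run is recorded as the ordered list of basic-block names it executed; the class `WeightedControlGraph` and its methods are invented for illustration and are not the IMPACT-I interface. Node weights count block executions, arc weights count control transfers, a software branch prediction is taken to be the heaviest outgoing arc, and ordering a multi-way branch's targets hottest first is one plausible use of the same weights.

```python
from collections import defaultdict


class WeightedControlGraph:
    """Hypothetical weighted control graph: basic blocks are nodes,
    and node/arc weights are profiled execution counts."""

    def __init__(self):
        self.node_weight = defaultdict(int)  # block -> times executed
        self.arc_weight = defaultdict(int)   # (src, dst) -> times control went src -> dst

    def record(self, trace):
        """Add one profiled run, given as the ordered list of blocks executed."""
        for block in trace:
            self.node_weight[block] += 1
        for src, dst in zip(trace, trace[1:]):
            self.arc_weight[(src, dst)] += 1

    def successors(self, block):
        """Map each observed successor of `block` to its arc weight."""
        return {dst: w for (src, dst), w in self.arc_weight.items() if src == block}

    def predict(self, block):
        """Static (software) prediction: the most frequent successor, or None."""
        succ = self.successors(block)
        return max(succ, key=succ.get) if succ else None

    def prediction_accuracy(self, block):
        """Fraction of profiled exits from `block` that the prediction gets right."""
        succ = self.successors(block)
        total = sum(succ.values())
        return max(succ.values()) / total if total else None

    def case_order(self, block):
        """Successors sorted hottest first, e.g. to order a multi-way branch's tests."""
        succ = self.successors(block)
        return sorted(succ, key=succ.get, reverse=True)


# Two hypothetical profiled runs of a loop: entry -> test -> (body -> test)* -> exit
g = WeightedControlGraph()
g.record(["entry", "test", "body", "test", "body", "test", "body", "test", "exit"])
g.record(["entry", "test", "body", "test", "exit"])

print(g.predict("test"))              # body
print(g.prediction_accuracy("test"))  # 0.6666666666666666
```

Here the loop test is predicted to branch back into the body, which is correct on 4 of its 6 profiled executions. A production compiler would more likely gather such counts from an instrumented build and attach them to its own intermediate representation rather than to string block names.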

Information

Published In

ICS '89: Proceedings of the 3rd international conference on Supercomputing

June 1989

484 pages

ISBN: 0897913094

DOI: 10.1145/318789

Chairmen: George Paul (IBM), T. Papatheodorou (CTI, Greece), D. Gannon, E. N. Pudue

Copyright © 1989 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

AICA: Assoc Italiana de Calcolo Automatico
Computer Tech Inst.: Computer Technology Institute
SIGARCH: ACM Special Interest Group on Computer Architecture
SIAM: Society for Industrial and Applied Mathematics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS89: International Conference on Supercomputing 89

June 5 - 9, 1989

Crete, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 451
Downloads (last 12 months): 60
Downloads (last 6 weeks): 13

Reflects downloads up to 12 Nov 2024

Cited By

Chang P, Mahlke S, Chen W, Warter N, Hwu W (1998). IMPACT. 25 years of the international symposia on Computer architecture (selected papers), pp. 408-417. DOI: 10.1145/285930.286000. Online publication date: 1-Aug-1998.
https://dl.acm.org/doi/10.1145/285930.286000

Hwu W, Chang P (1992). Efficient Instruction Sequencing with Inline Target Insertion. IEEE Transactions on Computers 41:12, pp. 1537-1551. DOI: 10.1109/12.214662. Online publication date: 1-Dec-1992.
https://dl.acm.org/doi/10.1109/12.214662

Chang P, Mahlke S, Chen W, Warter N, Hwu W (1991). IMPACT. ACM SIGARCH Computer Architecture News 19:3, pp. 266-275. DOI: 10.1145/115953.115979. Online publication date: 1-Apr-1991.
https://dl.acm.org/doi/10.1145/115953.115979

Chang P, Mahlke S, Chen W, Warter N, Hwu W, Vranesic Z (1991). IMPACT. Proceedings of the 18th annual international symposium on Computer architecture, pp. 266-275. DOI: 10.1145/115952.115979. Online publication date: 1-Apr-1991.
https://dl.acm.org/doi/10.1145/115952.115979

Chang P, Mahlke S, Hwu W (1991). Using profile information to assist classic code optimizations. Software—Practice & Experience 21:12, pp. 1301-1321. DOI: 10.1002/spe.4380211204. Online publication date: 1-Dec-1991.
https://dl.acm.org/doi/10.1002/spe.4380211204
