Overlapped loop support in the Cydra 5

Authors:

James C. Dehnert,

Peter Y.-T. Hsu,

Joseph P. Bratt

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems

Pages 26 - 38

https://doi.org/10.1145/70082.68185

Published: 01 April 1989

Abstract

The Cydra™ 5 architecture adds unique support for overlapping successive iterations of a loop to a very long instruction word (VLIW) base. This architecture allows highly parallel loop execution for a much larger class of loops than can be vectorized, without requiring the unrolling of loops usually used by compilers for VLIW machines. This paper discusses the Cydra 5 loop scheduling model, the special architectural features that support it, and the loop compilation techniques used to take full advantage of the architecture.
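As a rough illustration of what overlapping successive iterations means, the toy model below starts a new iteration every `initiation_interval` cycles while earlier iterations are still in flight. It is only a sketch and not the Cydra 5 scheduling algorithm: the function names, the assumption that every stage takes exactly one cycle, and the absence of resource or dependence constraints are all simplifications introduced here.

```python
from collections import defaultdict


def overlapped_schedule(num_iterations, num_stages, initiation_interval=1):
    """Map each cycle to the (iteration, stage) pairs active in that cycle.

    Hypothetical toy model: each stage takes one cycle, and iteration i
    starts at cycle i * initiation_interval.
    """
    schedule = defaultdict(list)
    for iteration in range(num_iterations):
        start = iteration * initiation_interval
        for stage in range(num_stages):
            schedule[start + stage].append((iteration, stage))
    return dict(schedule)


def total_cycles(schedule):
    """Number of cycles from the first stage issued to the last one."""
    return max(schedule) + 1


# Four iterations of a three-stage loop body, one new iteration per cycle.
overlapped = overlapped_schedule(4, 3, initiation_interval=1)
# Starting an iteration only after the previous one finishes (interval = 3)
# models non-overlapped execution of the same loop.
sequential = overlapped_schedule(4, 3, initiation_interval=3)

for cycle in sorted(overlapped):
    print(cycle, overlapped[cycle])
print("overlapped cycles:", total_cycles(overlapped))
print("sequential cycles:", total_cycles(sequential))
```

In this model the overlapped loop finishes in (iterations - 1) x interval + stages cycles, so a smaller interval between iteration starts gives a shorter run without unrolling the loop body.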

Index Terms

Overlapped loop support in the Cydra 5
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis
      1. Scheduling algorithms
    2. Online algorithms
      1. Online learning algorithms
        1. Scheduling algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Reinforcement learning
        1. Sequential decision making

Information & Contributors

Information

Published In

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems

April 1989

303 pages

ISBN:0897913000

DOI:10.1145/70082

Chairman: Joel Emer
General Chair: John Hennessy (Stanford University)

ACM SIGARCH Computer Architecture News Volume 17, Issue 2
Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems
April 1989
291 pages
ISSN:0163-5964
DOI:10.1145/68182
Editor: Joel Emer

Copyright © 1989 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989

Qualifiers

Article

Conference

ASPLOS89

Sponsor:

ASPLOS89: Int'l Conference on Architecture Support for Programming Lang & Operating Systems

April 3 - 6, 1989

Boston, Massachusetts, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 160
Total Downloads: 702
Downloads (Last 12 months): 81
Downloads (Last 6 weeks): 11

Reflects downloads up to 21 Nov 2024

Citations

Cited By

Bodík R, Gupta R (2016). Array Data Flow Analysis for Load-Store Optimizations in Fine-Grain Architectures. International Journal of Parallel Programming 24:6 (481-512). Online publication date: 26-May-2016.
https://doi.org/10.1007/BF03356757
Tyson G, Farrens M (2016). Evaluating the Effects of Predicated Execution on Branch Prediction. International Journal of Parallel Programming 24:2 (159-186). Online publication date: 26-May-2016.
https://doi.org/10.1007/BF03356746
Aiken A, Banerjee U, Kejariwal A, Nicolau A (2016). Modulo Scheduling. Instruction Level Parallelism (133-165). Online publication date: 30-Nov-2016.
https://doi.org/10.1007/978-1-4899-7797-7_6
Aiken A, Banerjee U, Kejariwal A, Nicolau A (2016). Overview of ILP Architectures. Instruction Level Parallelism (9-42). Online publication date: 30-Nov-2016.
https://doi.org/10.1007/978-1-4899-7797-7_2
Touati S, Dinechin B (2014). Bibliography. Advanced Backend Code Optimization (327-343). Online publication date: 3-Jun-2014.
https://doi.org/10.1002/9781118625446.biblio
Warter N, Hwu W (2013). The code size advantage of predicted execution for software pipelining. 9th Computing in Aerospace Conference. Online publication date: 18-Feb-2013.
https://doi.org/10.2514/6.1993-4671
Bachir M, Cohen A, Touati S (2012). On the effectiveness of register moves to minimise post-pass unrolling in software pipelined loops. 2012 International Conference on High Performance Computing & Simulation (HPCS) (551-558). Online publication date: Jul-2012.
https://doi.org/10.1109/HPCSim.2012.6266972
Bachir M, Touati S, Brault F, Gregg D, Cohen A (2012). Minimal Unroll Factor for Code Generation of Software Pipelining. International Journal of Parallel Programming 41:1 (1-58). Online publication date: 17-Jul-2012.
https://doi.org/10.1007/s10766-012-0203-z
Allan V, Allan S (2010). Software Pipelining. The Compiler Design Handbook. Online publication date: 7-Mar-2010.
https://doi.org/10.1201/9781420040579.ch18
Govindarajan R (2010). Instruction Scheduling. The Compiler Design Handbook. Online publication date: 7-Mar-2010.
https://doi.org/10.1201/9781420040579.ch17
