Article

Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures

Authors:

Rodolfo Azevedo,

Paulo Centoducatte,

Guido Araujo

CASES '01: Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems

Pages 141 - 148

https://doi.org/10.1145/502217.502241

Published: 16 November 2001

Abstract

In this paper we describe a design exploration methodology for clustered VLIW architectures. The central idea of this work is a set of three techniques aimed at reducing the cost of expensive inter-cluster copy operations. First, instruction scheduling is performed using a list-scheduling algorithm that stores operand chains in the same register file. Second, functional units are assigned to clusters based on the application's inter-cluster communication pattern. Finally, careful insertion of pipeline bypasses increases the number of data dependencies that can be satisfied by pipeline register operands. Experimental results, using the SPEC95 benchmark and the IMPACT compiler, reveal a substantial reduction in the number of copies between clusters.
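To make the cost being targeted concrete, the following is a minimal sketch of how inter-cluster copies might be counted for a given assignment of operations to clusters. It is not the paper's algorithm: the function name `count_intercluster_copies`, the edge-list representation of the dependence graph, and the rule of one copy per producer and remote destination cluster are all assumptions made for illustration.

```python
def count_intercluster_copies(edges, cluster_of):
    """Count copies under an assumed cost model.

    edges: iterable of (producer, consumer) data dependencies.
    cluster_of: dict mapping each operation to its cluster id.
    Assumption: a value needs one copy per remote cluster that consumes it.
    """
    needed = set()
    for producer, consumer in edges:
        src, dst = cluster_of[producer], cluster_of[consumer]
        if src != dst:
            # The producer's result must be copied into cluster dst once.
            needed.add((producer, dst))
    return len(needed)


# Hypothetical dependence graph made of two independent operand chains.
edges = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")]

# Each chain kept inside one cluster (one register file per chain).
chained = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
# Operations alternating between clusters along each chain.
alternating = {"a": 0, "b": 1, "c": 0, "x": 1, "y": 0, "z": 1}

copies_chained = count_intercluster_copies(edges, chained)
copies_alternating = count_intercluster_copies(edges, alternating)
print(copies_chained, copies_alternating)
```

Under this assumed model, keeping each operand chain in a single cluster needs no copies, while alternating clusters along a chain forces a copy on every dependence edge. This is the intuition behind scheduling operand chains into the same register file.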

Published In

CASES '01: Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems

November 2001

258 pages

ISBN:1581133995

DOI:10.1145/502217

Conference Chairs: Guang R. Gao (University of Delaware), Trevor Mudge (University of Michigan)

General Chair: Krishna Palem (Georgia Institute of Technology)

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ARM: ARM
STARCORE: STARCORE
cadence: cadence
ACM: Association for Computing Machinery
NS: National Semiconductor
IBM: IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Bibliometrics

Total Citations: 7
Total Downloads: 328
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 1

Reflects downloads up to 26 Jan 2025

Citations

Cited By

Shrivastava A, Sanghyun P, Earlie E, Dutt N, Nicolau A, Yunheung P (2007). Automatic Design Space Exploration of Register Bypasses in Embedded Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26:12, pp. 2102-2115. DOI: 10.1109/TCAD.2007.907066. Online publication date: 1-Dec-2007.
https://dl.acm.org/doi/10.1109/TCAD.2007.907066

Shrivastava A, Earlie E, Dutt N, Nicolau A (2006). Retargetable pipeline hazard detection for partially bypassed processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14:8, pp. 791-801. DOI: 10.1109/TVLSI.2006.878468. Online publication date: 1-Aug-2006.
https://dl.acm.org/doi/10.1109/TVLSI.2006.878468

Shrivastava A, Dutt N, Nicolau A, Earlie E (2005). PBExplore. Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, pp. 1264-1269. DOI: 10.1109/DATE.2005.236. Online publication date: 7-Mar-2005.
https://dl.acm.org/doi/10.1109/DATE.2005.236

Kudlur M, Fan K, Chu M, Ravindran R, Clark N, Mahlke S (2004). FLASH. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. DOI: 10.5555/977395.977671. Online publication date: 20-Mar-2004.
https://dl.acm.org/doi/10.5555/977395.977671

Shrivastava A, Earlie E, Dutt N, Nicolau A, Orailoglu A, Chou P, Eles P, Jantsch A (2004). Operation tables for scheduling in the presence of incomplete bypassing. Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp. 194-199. DOI: 10.1145/1016720.1016768. Online publication date: 8-Sep-2004.
https://dl.acm.org/doi/10.1145/1016720.1016768

Kudlur M, Fan K, Chu M, Ravindran R, Clark N, Mahlke S (2004). FLASH: foresighted latency-aware scheduling heuristic for processors with customized datapaths. International Symposium on Code Generation and Optimization, 2004. CGO 2004, pp. 201-212. DOI: 10.1109/CGO.2004.1281675. Online publication date: 2004.
https://doi.org/10.1109/CGO.2004.1281675

Fan K, Clark N, Chu M, Manjunath K, Rajiv Ravindran, Smelyanskiy M, Mahlke S (2003). Systematic register bypass customization for application-specific processors. Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003, pp. 64-74. DOI: 10.1109/ASAP.2003.1212830. Online publication date: 2003.
https://doi.org/10.1109/ASAP.2003.1212830
