CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Authors:

Vasileios Porpodas,

Marcelo Cintra

CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Article No.: 9, Pages 1 - 10

Published: 29 September 2013

Abstract

Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Because such architectures are both statically scheduled and clustered, they require specialized code generation techniques: explicit Inter-Cluster Copy instructions (ICCs) must be scheduled in the code stream. In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures. It combines cluster assignment, instruction scheduling and inter-cluster communication reuse in a single unified algorithm. The proposed algorithm improves performance by avoiding any phase-ordering issues among these three code generation and optimization steps. We evaluate CAeSaR on the MediabenchII and SPEC CINT2000 benchmarks and compare it against the state-of-the-art instruction scheduling algorithm. Our results show an improvement in execution time of up to 20.3%, and 13.8% on average, over the current state-of-the-art across the benchmarks.
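To make the idea of deciding cluster and cycle together more concrete, here is a minimal toy sketch in Python. It is not the CAeSaR algorithm, whose details are not given in the abstract. It is a plain greedy scheduler under assumptions made up for illustration: unit instruction latency, a fixed ICC latency, ICCs that cost latency but no issue slot, and instructions visited in topological order. The function name `schedule` and all of its parameters are hypothetical.

```python
from collections import defaultdict

def schedule(deps, n_clusters=2, issue_width=1, latency=1, icc_latency=1, reuse=True):
    """Greedy unified cluster assignment + scheduling on a toy machine model.

    deps maps each instruction to the instructions it reads from; keys must be
    in topological order (producers before consumers). Illustrative only.
    Returns (placement, icc_count) with placement[name] = (cluster, cycle).
    """
    placement = {}            # name -> (cluster, issue cycle)
    used = defaultdict(int)   # (cluster, cycle) -> issue slots already taken
    copies = {}               # (value, cluster) -> cycle a copied value is ready
    icc_count = 0

    for insn, operands in deps.items():
        best = None
        # Evaluate every cluster; choose cluster and cycle in the same step.
        for c in range(n_clusters):
            ready = 0
            for op in operands:
                src_cluster, src_cycle = placement[op]
                produced = src_cycle + latency
                if src_cluster == c:
                    avail = produced                  # operand is local
                elif reuse and (op, c) in copies:
                    avail = copies[(op, c)]           # reuse an earlier copy
                else:
                    avail = produced + icc_latency    # a new ICC is needed
                ready = max(ready, avail)
            cycle = ready
            while used[(c, cycle)] >= issue_width:    # find a free issue slot
                cycle += 1
            if best is None or cycle < best[1]:
                best = (c, cycle)

        cluster, cycle = best
        placement[insn] = (cluster, cycle)
        used[(cluster, cycle)] += 1
        # Record the ICCs this placement needs, skipping values already copied.
        for op in operands:
            src_cluster, src_cycle = placement[op]
            if src_cluster != cluster and not (reuse and (op, cluster) in copies):
                icc_count += 1
                copies[(op, cluster)] = src_cycle + latency + icc_latency
    return placement, icc_count
```

On a fan-out graph where one value `a` feeds five consumers `b` to `f`, with two single-issue clusters, this sketch emits one ICC with `reuse=True` and two with `reuse=False`. Because ICCs are assumed free of issue-slot cost here, reuse lowers only the copy count in this toy model, not the schedule length.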

Cited By

Tong Y, Zhang W, Ma Y, Liu Y, Liang Y, Zhang T, Luo H (2017). Compiler-Guided Parallelism Adaption Based on Application Partition for Power-Gated ILP Processor. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:4 (1329-1341). DOI: 10.1109/TVLSI.2016.2636419. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1109/TVLSI.2016.2636419

Index Terms

CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
1. Computer systems organization
  1. Architectures
    1. Serial architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

Clustered VLIW architectures are statically scheduled wide-issue architectures that combine the advantages of wide-issue processors along with the power and frequency scalability of clustered designs. Being statically scheduled, they require that the ...

Information & Contributors

Information

Published In

CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

September 2013

247 pages

ISBN:9781479914005

Program Chairs: Rodric Rabbah (IBM Research), Anand Raghunathan (Purdue University)

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ESWEEK'13

Sponsor:

ESWEEK'13: Ninth Embedded System Week

September 29 - October 4, 2013

Montreal, Quebec, Canada

Acceptance Rates

CASES '13 Paper Acceptance Rate 21 of 68 submissions, 31%;

Overall Acceptance Rate 52 of 230 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 122
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 23 Nov 2024
