DOI: 10.5555/2555729.2555738

CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Published: 29 September 2013

Abstract

Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Because such architectures are both statically scheduled and clustered, they require specialized code generation techniques: explicit inter-cluster copy instructions (ICCs) must be scheduled in the code stream. In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures. It combines cluster assignment, instruction scheduling and inter-cluster communication reuse in a single unified algorithm. The proposed algorithm improves performance by avoiding the phase-ordering issues among these three code generation and optimization steps. We evaluate CAeSaR on the MediaBench II and SPEC CINT2000 benchmarks and compare it against the state-of-the-art instruction scheduling algorithm. Our results show an improvement in execution time of up to 20.3%, and 13.8% on average, over the current state-of-the-art across the benchmarks.
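To make the idea of communication reuse concrete, the following is a minimal sketch and not the CAeSaR algorithm itself, whose details the abstract does not give. It is a toy greedy scheduler that picks a cluster and a cycle for each instruction in one step and, when `reuse` is enabled, lets later consumers on a cluster read a value that an earlier ICC already brought there. The function name `schedule`, its parameters, the unit instruction latency, and the simplification that ICCs occupy no issue slot are all assumptions made for illustration.

```python
from collections import defaultdict

def schedule(dag, order, n_clusters=2, issue_width=1, icc_latency=1, reuse=True):
    """Toy unified cluster assignment + list scheduling (illustrative only).

    dag:   dict mapping each instruction to the list of its predecessors
    order: a topological order of the instructions
    Assumptions: unit latency for every instruction; an ICC adds icc_latency
    cycles to a value's arrival time but takes no issue slot.
    Returns (placement, icc_count, schedule_length).
    """
    placement = {}            # instruction -> (cluster, cycle)
    avail = {}                # (value, cluster) -> first cycle it is readable there
    used = defaultdict(int)   # (cluster, cycle) -> issue slots already taken
    icc_count = 0

    for ins in order:
        best = None
        for c in range(n_clusters):
            ready, new_copies = 0, []
            for p in dag[ins]:
                if (p, c) in avail:
                    t = avail[(p, c)]          # produced here, or copied earlier
                else:
                    t = placement[p][1] + 1 + icc_latency
                    new_copies.append(p)       # would need a fresh ICC
                ready = max(ready, t)
            cycle = ready
            while used[(c, cycle)] >= issue_width:
                cycle += 1
            # Prefer the earliest cycle, then fewer new ICCs, then lower cluster id.
            cand = (cycle, len(new_copies), c, new_copies)
            if best is None or cand[:3] < best[:3]:
                best = cand
        cycle, _, c, new_copies = best
        placement[ins] = (c, cycle)
        used[(c, cycle)] += 1
        avail[(ins, c)] = cycle + 1
        for p in new_copies:
            icc_count += 1
            if reuse:
                # Remember the copy so later consumers on cluster c can share it.
                avail[(p, c)] = placement[p][1] + 1 + icc_latency
    length = 1 + max(cycle for _, cycle in placement.values())
    return placement, icc_count, length

# One producer "a" feeding five consumers on a 2-cluster, 1-issue-per-cluster machine.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"], "f": ["a"]}
order = ["a", "b", "c", "d", "e", "f"]
_, iccs_with_reuse, len_with_reuse = schedule(dag, order, reuse=True)
_, iccs_without_reuse, len_without_reuse = schedule(dag, order, reuse=False)
print(iccs_with_reuse, iccs_without_reuse)
```

In this example the second consumer placed on the remote cluster shares the copy of `a` made for the first, so fewer ICCs are emitted when reuse is on. Because the toy model treats ICCs as free of issue slots, only the ICC count changes here, not the schedule length; a realistic model would charge ICCs against issue or interconnect resources.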


Cited By

  • (2017) Compiler-Guided Parallelism Adaption Based on Application Partition for Power-Gated ILP Processor. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:4 (1329-1341). DOI: 10.1109/TVLSI.2016.2636419. Online publication date: 1-Apr-2017


      Published In

      CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
      September 2013
      247 pages
      ISBN:9781479914005

      Publisher

      IEEE Press


      Author Tags

      1. cluster assignment
      2. clustered VLIW
      3. instruction scheduling

      Qualifiers

      • Research-article

      Conference

      ESWEEK'13
      ESWEEK'13: Ninth Embedded System Week
      September 29 - October 4, 2013
      Montreal, Quebec, Canada

      Acceptance Rates

      CASES '13 Paper Acceptance Rate 21 of 68 submissions, 31%;
      Overall Acceptance Rate 52 of 230 submissions, 23%
