DOI: 10.1145/514191.514222

An interleaved cache clustered VLIW processor

Published: 22 June 2002

Abstract

Clustered microarchitectures are becoming a common organization because of their potential to reduce the penalties caused by wire delays and power consumption. Fully-distributed architectures are particularly effective at dealing with these constraints, and they are also very scalable. However, distributing the data cache memory poses a significant challenge and may be critical for performance. In this work, a distributed data cache VLIW architecture based on an interleaved cache organization is proposed, along with cyclic scheduling techniques. Moreover, the use of Attraction Buffers for such an architecture is introduced. Attraction Buffers are a novel hardware mechanism to increase the percentage of local accesses. The idea is to allow some data to move towards the clusters that need it.

Performance results for 9 Mediabench benchmarks show that our scheduling techniques are able to hide the increased memory latency when accessing data mapped in a remote cluster. In addition, the local hit ratio is increased by 15% and stall time is reduced by 30% when the same scheduling techniques are used with an interleaved cache clustered processor with Attraction Buffers. Finally, the proposed architecture is compared with a state-of-the-art distributed architecture, the multiVLIW. Results show that the performance of an interleaved cache clustered VLIW processor with Attraction Buffers is similar to that of the multiVLIW architecture, while its hardware complexity is lower.
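To make the idea of an interleaved cache with Attraction Buffers more concrete, the following is a minimal sketch and not the authors' design. It assumes word-level interleaving, 4 clusters, 4-byte words, and a small LRU-managed buffer per cluster; none of these parameters is given in the abstract. The names `home_cluster`, `AttractionBuffer`, and `local_access_ratio` are hypothetical, and stores and coherence are ignored.

```python
from collections import OrderedDict

WORD_BYTES = 4    # assumed word size (not stated in the abstract)
NUM_CLUSTERS = 4  # assumed number of clusters (not stated in the abstract)


def home_cluster(addr, num_clusters=NUM_CLUSTERS, word_bytes=WORD_BYTES):
    """Return the cluster whose cache module owns the word at addr,
    assuming consecutive words are interleaved across clusters."""
    return (addr // word_bytes) % num_clusters


class AttractionBuffer:
    """Hypothetical per-cluster buffer holding copies of remotely mapped
    words. LRU replacement is an assumption made for illustration."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.words = OrderedDict()

    def lookup(self, word):
        if word in self.words:
            self.words.move_to_end(word)  # mark as most recently used
            return True
        return False

    def insert(self, word):
        self.words[word] = True
        self.words.move_to_end(word)
        if len(self.words) > self.capacity:
            self.words.popitem(last=False)  # evict least recently used


def local_access_ratio(trace, buffer_capacity=0):
    """trace is a list of (cluster, addr) loads. Returns the fraction of
    loads served without leaving the issuing cluster."""
    buffers = [AttractionBuffer(buffer_capacity) for _ in range(NUM_CLUSTERS)]
    local = 0
    for cluster, addr in trace:
        word = addr // WORD_BYTES
        if home_cluster(addr) == cluster:
            local += 1                    # word is mapped to this cluster
        elif buffer_capacity > 0 and buffers[cluster].lookup(word):
            local += 1                    # copy was attracted earlier
        elif buffer_capacity > 0:
            buffers[cluster].insert(word)  # remote access, keep a copy
    return local / len(trace) if trace else 0.0


# Cluster 0 walks four consecutive words twice.
trace = [(0, 0), (0, 4), (0, 8), (0, 12)] * 2
print(local_access_ratio(trace, buffer_capacity=0))
print(local_access_ratio(trace, buffer_capacity=4))
```

In this toy trace only one word in four is mapped to cluster 0, so without buffers most loads are remote. With a buffer, the second pass finds the previously fetched words locally, which is the effect the abstract attributes to Attraction Buffers.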

Published In

ICS '02: Proceedings of the 16th international conference on Supercomputing
June 2002
338 pages
ISBN:1581134835
DOI:10.1145/514191

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. VLIW processors
  2. attraction buffers
  3. clustered microarchitectures
  4. distributed cache
  5. modulo scheduling

Qualifiers

  • Article

Conference

ICS02: International Conference on Supercomputing
June 22 - 26, 2002
New York, New York, USA

Acceptance Rates

ICS '02 Paper Acceptance Rate 31 of 144 submissions, 22%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2012) XPoint cache. Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pp. 75-86. DOI: 10.1145/2370816.2370829. Online publication date: 19-Sep-2012
  • (2007) Design principles for a virtual multiprocessor. Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, pp. 76-82. DOI: 10.1145/1292491.1292500. Online publication date: 2-Oct-2007
  • (2006) Compiler-directed Data Partitioning for Multicluster Processors. Proceedings of the International Symposium on Code Generation and Optimization, pp. 208-220. DOI: 10.1109/CGO.2006.9. Online publication date: 26-Mar-2006
  • (2006) Instruction scheduling for a clustered VLIW processor with a word‐interleaved cache. Concurrency and Computation: Practice and Experience, 18:11, pp. 1391-1411. DOI: 10.1002/cpe.1013. Online publication date: 12-Jan-2006
  • (2005) Distributed Data Cache Designs for Clustered VLIW Processors. IEEE Transactions on Computers, 54:10, pp. 1227-1241. DOI: 10.1109/TC.2005.163. Online publication date: 1-Oct-2005
  • (2005) A Distributed Control Path Architecture for VLIW Processors. Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pp. 197-206. DOI: 10.1109/PACT.2005.5. Online publication date: 17-Sep-2005
  • (2004) Cache organizations for clustered microarchitectures. Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, pp. 46-55. DOI: 10.1145/1054943.1054950. Online publication date: 20-Jun-2004
  • (2004) Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors. IEEE Micro, 24:3, pp. 10-20. DOI: 10.1109/MM.2004.7. Online publication date: 1-May-2004
  • (2003) Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, pp. 193-203. DOI: 10.5555/776261.776283. Online publication date: 23-Mar-2003
  • (2003) Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache. International Symposium on Code Generation and Optimization, 2003. CGO 2003, pp. 193-203. DOI: 10.1109/CGO.2003.1191545. Online publication date: 2003
