DOI: 10.5555/776261.776283

Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Published: 23 March 2003

Abstract

Clustering is a common technique for dealing with wire delays. Fully-distributed architectures, in which the register file, the functional units and the cache memory are all partitioned, are particularly effective under these constraints and are also very scalable. However, distributing the data cache introduces a new problem: memory instructions may reach the cache in an order different from the sequential program order, possibly corrupting its contents. This paper proposes and evaluates two local scheduling mechanisms that guarantee the serialization of aliased memory instructions: the construction of memory dependent chains (the MDC solution), and two transformations, store replication and load-store synchronization, applied to the original Data Dependence Graph (the DDGT solution). Neither solution requires any extra hardware.

The proposed scheduling techniques are evaluated for a word-interleaved cache clustered VLIW processor, although they can also be used with any other distributed cache configuration. Results for the Mediabench benchmark suite demonstrate their effectiveness. In particular, the DDGT solution increases the proportion of local accesses by 16% compared to MDC, and stall time is reduced by 32%, since load instructions can be freely scheduled in any cluster. However, the MDC solution reduces compute time and often outperforms DDGT. Finally, the impact of both techniques on an architecture with Attraction Buffers is studied and evaluated.
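
To make the setting in the abstract more concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a word-interleaved cache in which consecutive words map to consecutive clusters, and it models memory dependent chains as connected components of a may-alias relation, with every operation of a chain placed in the same cluster. The word size, the cluster count, the round-robin placement, and all function names (`home_cluster`, `memory_dependent_chains`, `assign_chains`) are illustrative assumptions that do not come from the abstract.

```python
from collections import defaultdict

WORD_BYTES = 4      # assumed word size (not stated in the abstract)
NUM_CLUSTERS = 4    # assumed number of clusters (not stated in the abstract)


def home_cluster(address: int) -> int:
    """Cluster whose cache module would hold the word containing `address`,
    under the assumed word-interleaved mapping."""
    return (address // WORD_BYTES) % NUM_CLUSTERS


def memory_dependent_chains(num_ops, may_alias_pairs):
    """Group memory operations 0..num_ops-1 into chains, modeled here as the
    connected components of the may-alias relation (small union-find)."""
    parent = list(range(num_ops))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in may_alias_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    chains = defaultdict(list)
    for op in range(num_ops):
        chains[find(op)].append(op)
    return sorted(chains.values())


def assign_chains(chains):
    """Place every operation of a chain in one cluster. Round-robin is used
    purely for illustration; a real scheduler would weigh locality and load."""
    return {op: i % NUM_CLUSTERS
            for i, chain in enumerate(chains)
            for op in chain}


chains = memory_dependent_chains(5, [(0, 2), (2, 4)])
placement = assign_chains(chains)
```

In this sketch, operations 0, 2 and 4 may alias one another, so they form one chain and land in a single cluster, where they can be issued in program order. The cost is that a chained access may no longer run in the home cluster of its address, which fits the abstract's remark that the DDGT solution achieves more local accesses than MDC.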


Cited By

  • (2005) Distributed Data Cache Designs for Clustered VLIW Processors. IEEE Transactions on Computers 54:10 (1227-1241). DOI: 10.1109/TC.2005.163. Online publication date: 1-Oct-2005
  • (2003) Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.5555/956417.956547. Online publication date: 3-Dec-2003


Information

Published In

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
March 2003
349 pages
ISBN: 076951913X

Publisher

IEEE Computer Society, United States

Qualifiers

Article

Conference

CGO03

Acceptance Rates

Overall Acceptance Rate: 312 of 1,061 submissions, 29%
