DOI: 10.1145/859618.859642

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Published: 01 May 2003

Abstract

Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory multiprocessors. The destination set is the collection of processors that receive a particular coherence request. Snooping protocols send requests to the maximal destination set (i.e., all processors), reducing latency for cache-to-cache misses at the expense of increased traffic. Directory protocols send requests to the minimal destination set, reducing bandwidth at the expense of an indirection through the directory for cache-to-cache misses. Recently proposed hybrid protocols trade off latency and bandwidth by directly sending requests to a predicted destination set.

This paper explores the destination-set predictor design space, focusing on a collection of important commercial workloads. First, we analyze the sharing behavior of these workloads. Second, we propose predictors that exploit the observed sharing behavior to target different points in the latency/bandwidth tradeoff. Third, we illustrate the effectiveness of destination-set predictors in the context of a multicast snooping protocol. For example, one of our predictors obtains almost 90% of the performance of snooping while using only 15% more bandwidth than a directory protocol (and less than half the bandwidth of snooping).
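
As a rough illustration of the three kinds of destination set named in the abstract, the following Python sketch contrasts a broadcast (maximal) set, a simplified minimal set, and a predicted set. It is not one of the paper's predictors: the 16-processor system size, the simplified definition in `minimal_set`, and the `LastOwnerPredictor` class are all assumptions introduced here for illustration only.

```python
from typing import Dict, Set

NUM_PROCESSORS = 16  # assumed system size, not taken from the paper


def snooping_set(num_procs: int) -> Set[int]:
    """Maximal destination set: the request is broadcast to all processors."""
    return set(range(num_procs))


def minimal_set(requester: int, owner: int, sharers: Set[int],
                is_write: bool) -> Set[int]:
    """Simplified minimal destination set (an assumption of this sketch).

    A read only needs to reach the current owner; a write must also
    reach every sharer so their copies can be invalidated.
    """
    needed = {requester, owner}
    if is_write:
        needed |= sharers
    return needed


class LastOwnerPredictor:
    """Hypothetical toy predictor: guess that the last observed owner
    of a block still owns it."""

    def __init__(self) -> None:
        self._last_owner: Dict[int, int] = {}

    def predict(self, requester: int, block: int) -> Set[int]:
        guess = {requester}
        if block in self._last_owner:
            guess.add(self._last_owner[block])
        return guess

    def train(self, block: int, owner: int) -> None:
        self._last_owner[block] = owner


def is_sufficient(predicted: Set[int], needed: Set[int]) -> bool:
    """A prediction suffices when it covers every processor that must
    observe the request; otherwise the request has to be retried or
    resolved through the directory."""
    return needed <= predicted


if __name__ == "__main__":
    predictor = LastOwnerPredictor()
    predictor.train(block=0x40, owner=3)
    guess = predictor.predict(requester=0, block=0x40)

    read_needed = minimal_set(0, 3, {5, 7}, is_write=False)
    write_needed = minimal_set(0, 3, {5, 7}, is_write=True)

    print(len(snooping_set(NUM_PROCESSORS)), len(guess))
    print(is_sufficient(guess, read_needed))
    print(is_sufficient(guess, write_needed))
```

In this toy setting the predicted set has 2 members against 16 for broadcast. It suffices for the read, whose minimal set is just the requester and the owner, but not for the write, which must also reach the sharers. That gap between traffic saved and requests that must be retried is the tradeoff a real predictor is tuned against.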

Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture
June 2003
432 pages
ISBN: 0769519458
DOI: 10.1145/859618
  • Conference Chair: Allan Gottlieb
  • Program Chair: Kai Li
  • ACM SIGARCH Computer Architecture News, Volume 31, Issue 2
    ISCA 2003
    May 2003
    422 pages
    ISSN: 0163-5964
    DOI: 10.1145/871656

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA03: International Symposium on Computer Architecture
June 9 - 11, 2003
San Diego, California

Acceptance Rates

ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3 (1-26). DOI: 10.1145/3530819. Online publication date: 22-Aug-2022
  • (2020) Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (173-186). DOI: 10.1109/ISCA45697.2020.00025. Online publication date: May-2020
  • (2019) Analyzing and Leveraging Remote-Core Bandwidth for Enhanced Performance in GPUs. 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT) (258-271). DOI: 10.1109/PACT.2019.00028. Online publication date: Sep-2019
  • (2019) CONCORD: Improving Communication using Consumer-Count Detection. 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA) (1-11). DOI: 10.1109/AICCSA47632.2019.9035220. Online publication date: Nov-2019
  • (2018) Energy-efficient hybrid coherence protocol for multicore processors. Cluster Computing 21:3 (1521-1541). DOI: 10.5555/3287988.3288003. Online publication date: 1-Sep-2018
  • (2018) Energy-efficient hybrid coherence protocol for multicore processors. Cluster Computing 21:3 (1521-1541). DOI: 10.1007/s10586-018-1947-z. Online publication date: 16-Feb-2018
  • (2017) Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques. ACM Computing Surveys 50:4 (1-36). DOI: 10.1145/3092566. Online publication date: 25-Aug-2017
  • (2017) Bridging the Chasm. ACM Computing Surveys 50:4 (1-32). DOI: 10.1145/3084225. Online publication date: 25-Aug-2017
  • (2017) A survey of value prediction techniques for leveraging value locality. Concurrency and Computation: Practice and Experience 29:21. DOI: 10.1002/cpe.4250. Online publication date: 11-Sep-2017
  • (2016) TokenTLB. Proceedings of the 2016 International Conference on Supercomputing (1-13). DOI: 10.1145/2925426.2926280. Online publication date: 1-Jun-2016
