DOI: 10.5555/762761.762762

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Published: 16 November 2002

Abstract

Cache misses for which data must be obtained from a remote cache (cache-to-cache transfer misses) account for an important fraction of the total miss rate. Unfortunately, cc-NUMA designs put the access to the directory information on the critical path of 3-hop misses, which significantly penalizes them compared to SMP designs. This work studies the use of owner prediction as a means of providing cc-NUMA multiprocessors with more efficient support for cache-to-cache transfer misses. Our proposal comprises an effective prediction scheme as well as a coherence protocol designed to support the use of prediction. Results indicate that owner prediction can significantly reduce the latency of cache-to-cache transfer misses, which translates into application speed-ups of up to 12%. To also accelerate most of the 3-hop misses that are either not predicted or mispredicted, the inclusion of a small, fast directory cache in every node is evaluated, leading to improvements of up to 16% in final performance.
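
The idea behind owner prediction can be illustrated with a toy model. The sketch below is not the prediction scheme or coherence protocol evaluated in the paper; it assumes a hypothetical per-node "last owner" table and counts network hops only. A correct guess is charged 2 hops (requester to owner and back), while a missing or wrong guess is charged the ordinary 3-hop path through the home directory, with no extra misprediction penalty. All names (`LastOwnerPredictor`, `miss_hops`, `total_hops`) are illustrative assumptions.

```python
from dataclasses import dataclass, field

HOPS_PREDICTED = 2  # requester -> predicted owner -> requester
HOPS_DIRECTORY = 3  # requester -> home directory -> owner -> requester


@dataclass
class LastOwnerPredictor:
    """Hypothetical table mapping a block address to its last known owner."""
    table: dict = field(default_factory=dict)

    def predict(self, block):
        # Returns None when the block has never been seen (no prediction).
        return self.table.get(block)

    def update(self, block, owner):
        self.table[block] = owner


def miss_hops(predictor, block, actual_owner):
    """Hop count for one cache-to-cache transfer miss in this toy model."""
    guess = predictor.predict(block)
    predictor.update(block, actual_owner)  # learn the true owner from the reply
    # Simplifying assumption: a wrong or missing guess costs exactly 3 hops.
    return HOPS_PREDICTED if guess == actual_owner else HOPS_DIRECTORY


def total_hops(trace):
    """Sum hops over a trace of (block, actual_owner) misses."""
    predictor = LastOwnerPredictor()
    return sum(miss_hops(predictor, block, owner) for block, owner in trace)
```

For a trace such as `[(0x40, 1), (0x40, 1), (0x40, 2), (0x40, 2)]`, the second and fourth misses are predicted correctly, so the model charges fewer hops than the directory-only baseline of 3 hops per miss. A real protocol would also have to handle the traffic and latency caused by mispredicted requests, which this sketch ignores.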


Published In

SC '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing
November 2002
952 pages
ISBN:076951524X

Publisher

IEEE Computer Society Press

Washington, DC, United States

Acceptance Rates

SC '02 Paper Acceptance Rate 67 of 230 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
