DOI: 10.1145/300979.301004

Multicast snooping: a new coherence method using a multicast address network

Published: 01 May 1999

Abstract

This paper proposes a new coherence method called "multicast snooping" that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying a multicast "mask." Transactions are delivered with an ordered multicast network, such as an Isotach network, which eliminates the need for acknowledgment messages. Processors handle transactions as they would with a snooping protocol, while a simplified directory operates in parallel to check masks and gracefully handle incorrect ones (e.g., previous owner missing). Preliminary performance numbers with mostly SPLASH-2 benchmarks running on 32 processors show that we can limit multicasts to an average of 2-6 destinations (<< 32) and we can deliver 2-5 multicasts per network cycle (>> broadcast snooping's 1 per cycle). While these results do not include timing, they do provide encouragement that multicast snooping can obtain data directly (like broadcast snooping) but apply to larger systems (like directories).
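The mask check described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the names `DirectoryEntry`, `make_mask`, and `mask_is_sufficient` are hypothetical, and the sufficiency rule (a read must reach the current owner, a write must also reach every sharer) is an assumed reading of "previous owner missing", not a rule stated in the abstract.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set

NUM_PROCESSORS = 32  # system size used in the abstract's experiments


@dataclass
class DirectoryEntry:
    """Hypothetical per-block state kept by the simplified directory."""
    owner: Optional[int] = None                      # cache that owns the block, if any
    sharers: Set[int] = field(default_factory=set)   # caches holding read-only copies


def make_mask(destinations: Iterable[int]) -> int:
    """Encode processor ids as a bit mask (bit i set = processor i snoops)."""
    mask = 0
    for p in destinations:
        if not 0 <= p < NUM_PROCESSORS:
            raise ValueError(f"processor id out of range: {p}")
        mask |= 1 << p
    return mask


def mask_is_sufficient(entry: DirectoryEntry, requester: int,
                       mask: int, is_write: bool) -> bool:
    """Assumed rule: reads must reach the owner; writes must also reach all sharers."""
    required = set()
    if entry.owner is not None:
        required.add(entry.owner)
    if is_write:
        required |= entry.sharers
    required.discard(requester)  # the requester does not need to snoop itself
    required_mask = make_mask(required)
    # Sufficient when every required destination bit is present in the predicted mask.
    return (mask & required_mask) == required_mask
```

Under these assumptions, a processor whose predicted mask omits the current owner would fail the check, and the directory would then need to handle the transaction again with a corrected mask; how that recovery is done is not specified in the abstract.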



Published In

ISCA '99: Proceedings of the 26th annual international symposium on Computer architecture
May 1999
317 pages
ISBN: 0769501702
  • ACM SIGARCH Computer Architecture News, Volume 27, Issue 2
    Special Issue: Proceedings of the 26th annual international symposium on Computer architecture (ISCA '99)
    May 1999
    298 pages
    ISSN: 0163-5964
    DOI: 10.1145/307338

Publisher

IEEE Computer Society, United States

Conference

ISCA99

Acceptance Rates

ISCA '99 Paper Acceptance Rate 26 of 135 submissions, 19%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%


Cited By

  • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3 (1-26). DOI: 10.1145/3530819. Online publication date: 22-Aug-2022
  • (2017) An adaptive cache coherence protocol. Journal of Parallel and Distributed Computing 102:C (163-174). DOI: 10.1016/j.jpdc.2016.12.020. Online publication date: 1-Apr-2017
  • (2015) Automatic sharing classification and timely push for cache-coherent systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.1145/2807591.2807649. Online publication date: 15-Nov-2015
  • (2013) Using in-flight chains to build a scalable cache coherence protocol. ACM Transactions on Architecture and Code Optimization 10:4 (1-24). DOI: 10.1145/2541228.2541235. Online publication date: 1-Dec-2013
  • (2012) Predicting Coherence Communication by Tracking Synchronization Points at Run Time. Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (351-362). DOI: 10.1109/MICRO.2012.40. Online publication date: 1-Dec-2012
  • (2011) Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (71-82). DOI: 10.1145/2155620.2155630. Online publication date: 3-Dec-2011
  • (2010) An adaptive cache coherence protocol for chip multiprocessors. Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies (1-10). DOI: 10.1145/1882453.1882458. Online publication date: 19-Jun-2010
  • (2010) Subspace snooping. Proceedings of the 19th international conference on Parallel architectures and compilation techniques (111-122). DOI: 10.1145/1854273.1854292. Online publication date: 11-Sep-2010
  • (2010) Token tenure and PATCH. ACM Transactions on Architecture and Code Optimization 7:2 (1-31). DOI: 10.1145/1839667.1839668. Online publication date: 5-Oct-2010
  • (2009) In-network coherence filtering. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (232-243). DOI: 10.1145/1669112.1669143. Online publication date: 12-Dec-2009
