DOI:10.1145/223982.223995
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

Published: 01 May 1995

Abstract

This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages by having a processor automatically invalidate its local copy of a cache block before a conflicting access by another processor. Eliminating invalidation overhead is particularly important under sequential consistency, where the latency of invalidating outstanding copies can increase a program's critical path.

DSI is applicable to software, hardware, and hybrid coherence schemes. In this paper we evaluate DSI in the context of hardware directory-based write-invalidate coherence protocols. Our results show that DSI reduces execution time of a sequentially consistent full-map coherence protocol by as much as 41%. This is comparable to an implementation of weak consistency that uses a coalescing write-buffer to allow up to 16 outstanding requests for exclusive blocks. When used in conjunction with weak consistency, DSI can exploit tear-off blocks, which eliminate both invalidation and acknowledgment messages, for a total reduction in messages of up to 26%.
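The idea in the abstract can be illustrated with a toy message count. The sketch below is not the paper's protocol or simulator: it models a single cache block under a full-map, write-invalidate directory and compares the invalidation and acknowledgment traffic a writer triggers with and without readers dropping their own copies first. The names `Directory`, `self_invalidate`, and `run` are hypothetical, and the model assumes that a self-invalidation costs no extra message and always happens before the conflicting write, which a real implementation would have to predict.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Toy full-map directory for one cache block (illustrative only)."""
    sharers: set = field(default_factory=set)
    invalidations: int = 0  # invalidation messages sent by the directory
    acks: int = 0           # acknowledgments returned by the caches

    def read(self, cpu: int) -> None:
        # A read miss adds the requester to the sharer set.
        self.sharers.add(cpu)

    def self_invalidate(self, cpu: int) -> None:
        # The cache drops its own copy before a conflicting access arrives.
        # Assumption: the directory learns of this at no message cost.
        self.sharers.discard(cpu)

    def write(self, cpu: int) -> None:
        # A write invalidates every other outstanding copy and collects
        # one acknowledgment per invalidation.
        victims = self.sharers - {cpu}
        self.invalidations += len(victims)
        self.acks += len(victims)
        self.sharers = {cpu}


def run(self_invalidating: bool, readers=(1, 2, 3), writer=0, rounds=4) -> int:
    """Return invalidations + acks for a repeated read-then-write pattern."""
    d = Directory()
    for _ in range(rounds):
        for r in readers:
            d.read(r)
        if self_invalidating:
            for r in readers:
                d.self_invalidate(r)
        d.write(writer)
    return d.invalidations + d.acks


if __name__ == "__main__":
    print("baseline messages:", run(False))
    print("self-invalidating messages:", run(True))
```

With three readers and four rounds, the baseline run in this toy model counts 24 messages (12 invalidations plus 12 acknowledgments) and the self-invalidating run counts 0. These figures are properties of the sketch only and say nothing about the 41% and 26% results reported in the abstract.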

Information & Contributors

Information

Published In

ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture
July 1995
426 pages
ISBN:0897916980
DOI:10.1145/223982
  • ACM SIGARCH Computer Architecture News, Volume 23, Issue 2
    Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
    May 1995
    412 pages
    ISSN:0163-5964
    DOI:10.1145/225830
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA95
ISCA95: International Conference on Computer Architecture
June 22 - 24, 1995
S. Margherita Ligure, Italy

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 130
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 21 Sep 2024

Citations

Cited By

  • (2023) PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 74-85. DOI:10.1109/PACT58117.2023.00015. Online publication date: 21-Oct-2023
  • (2022) Cost-aware Programming on Page-based Distributed Shared Memory. Journal of Information Processing 30, pages 464-475. DOI:10.2197/ipsjjip.30.464. Online publication date: 2022
  • (2019) A Survey on Power Management Techniques for Oversubscription of Multi-Tenant Data Centers. ACM Computing Surveys 52:1, pages 1-31. DOI:10.1145/3291049. Online publication date: 13-Feb-2019
  • (2018) Runtime-assisted cache coherence deactivation in task parallel programs. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1-12. DOI:10.5555/3291656.3291703. Online publication date: 11-Nov-2018
  • (2018) Runtime-assisted cache coherence deactivation in task parallel programs. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1-12. DOI:10.1109/SC.2018.00038. Online publication date: 11-Nov-2018
  • (2018) Spandex. Proceedings of the 45th Annual International Symposium on Computer Architecture, pages 261-274. DOI:10.1109/ISCA.2018.00031. Online publication date: 2-Jun-2018
  • (2017) Non-Speculative Load-Load Reordering in TSO. ACM SIGARCH Computer Architecture News 45:2, pages 187-200. DOI:10.1145/3140659.3080220. Online publication date: 24-Jun-2017
  • (2017) Non-Speculative Load-Load Reordering in TSO. Proceedings of the 44th Annual International Symposium on Computer Architecture, pages 187-200. DOI:10.1145/3079856.3080220. Online publication date: 24-Jun-2017
  • (2016) Lazy release consistency for GPUs. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1-13. DOI:10.5555/3195638.3195669. Online publication date: 15-Oct-2016
  • (2016) Cooperative Caching for GPUs. ACM Transactions on Architecture and Code Optimization 13:4, pages 1-25. DOI:10.1145/3001589. Online publication date: 12-Dec-2016