DOI: 10.1145/223587.223606

An analytic study of dynamic hardware and software cache coherence strategies

Published: 01 May 1995

Abstract

Dynamic software cache coherence strategies use information about program sharing behaviour to manage caches at run-time and at a granularity defined by the application. The program-level information is obtained through annotations placed into the application by the user or the compiler. The coherence protocols may range from simple static algorithms to dynamic algorithms that use run-time data structures similar to the directories used in hardware strategies. In this paper, we present an analytic study of five dynamic software cache coherence algorithms and compare these to a representative hardware coherence strategy. The analytic model is constructed using four input parameters (write probability, locality, granularity, and system size) and is solved by analysis of a Markov chain. We show that the fundamental tradeoffs between the different hardware and software strategies are captured in this model. The results of the study show that hardware schemes perform better for fine-grained data structures for much of the parameter space that we study. However, for coarse-grained data structures, various software algorithms are dominant over most of the parameter space. Further, hardware strategies are found to be more susceptible to the effects of contention, and also perform worse for the asymmetric workload that we study.
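
To make "solved by analysis of a Markov chain" concrete, the sketch below computes the stationary distribution of a small chain by power iteration. It is only an illustration and not the model from the paper: the two-state chain, the function names `stationary` and `toy_chain`, and the way the write probability `w` and locality `loc` enter the transition probabilities are all assumptions made for this example.

```python
def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix P, by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of pi <- pi * P
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def toy_chain(w, loc):
    """Hypothetical 2-state chain for one cached block, seen from one processor.

    State 0: the local copy is valid. State 1: the local copy is invalid.
    w   : probability that an access is a write (assumed parameter)
    loc : probability that the next access comes from this processor (assumed)
    """
    lose = (1.0 - loc) * w   # another processor writes, invalidating our copy
    regain = loc             # this processor accesses the block and refetches it
    return [
        [1.0 - lose, lose],
        [regain, 1.0 - regain],
    ]


if __name__ == "__main__":
    pi = stationary(toy_chain(w=0.2, loc=0.5))
    print("P(valid), P(invalid) =", pi)
```

In a model of this kind, the long-run fraction of time spent in each state can be weighted by a per-state cost to estimate average coherence overhead; the parameters studied in the paper (granularity and system size as well as write probability and locality) would shape the actual chain, which this toy example does not attempt to reproduce.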

Published In

SIGMETRICS '95/PERFORMANCE '95: Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
May 1995
340 pages
ISBN:0897916956
DOI:10.1145/223587
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%
