The shared regions approach to software cache coherence on multiprocessors

Published: 01 July 1993

Abstract

The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives, and ensure that shared regions are always cacheable by the processors accessing them. Measurements and experiments of the Shared Regions approach on a shared-memory multiprocessor are shown. Comparisons with other software-based coherence strategies, including a user-controlled strategy and an operating system-based strategy, show that this approach is able to deliver better performance, with relatively low corresponding overhead and only a small increase in the programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.

Reviews

Richard Sites

The authors describe a software cache coherence scheme that involves bracketing accesses to multiprocessor shared-memory regions with read/write-access and read/write-done pairs. The underlying software implementation invalidates, flushes, or copies back variable-length cached data to maintain a consistent main-memory copy. Software caching is an alternative to hardware cache coherency, which does not scale well to large numbers of processors. On a small set of problems (LU decomposition, mean value analysis, and matrix multiply), the scheme is shown to scale more nearly linearly (up to 16 processors) than hand-tuned per-page cache disable/reenable, and to scale significantly better than operating system-provided per-page cache disable/reenable. The alternative of leaving shared data in permanently uncached pages is dismissed because the performance on 16 processors is worse than on one. The shared region scheme is also shown to compare well with static compiler analysis and insertion of cache invalidate/flush/copy-back calls, for large regions or for small regions reused often. The paper is a good overview of software caching, with good references and a well-argued but unsurprising result. It is relatively easy to read for anyone in the computer field.
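The bracketing described in the review can be illustrated with a toy, single-threaded simulation. This is only a sketch under stated assumptions, not the paper's implementation: the class and method names (`MainMemory`, `Processor`, `SharedRegion`, `read_access`, `write_done`, and so on) are hypothetical, the cache is modeled as a per-processor dictionary, and the synchronization that the real primitives tie to each region is omitted. It shows one plausible policy: invalidate the region's cached bytes when a bracket opens, and copy written bytes back to main memory when a write bracket closes.

```python
from dataclasses import dataclass, field


@dataclass
class MainMemory:
    """Shared backing store, modeled as a flat bytearray."""
    data: bytearray


@dataclass
class Processor:
    """A processor with a private, software-managed cache (hypothetical model)."""
    memory: MainMemory
    cache: dict = field(default_factory=dict)  # address -> cached byte value
    log: list = field(default_factory=list)    # coherence actions, for inspection

    def load(self, addr):
        # On a miss, fill from main memory; afterwards serve from the cache.
        if addr not in self.cache:
            self.cache[addr] = self.memory.data[addr]
        return self.cache[addr]

    def store(self, addr, value):
        # Write-back behavior: the store stays local until copied back.
        self.cache[addr] = value

    def invalidate(self, start, size):
        for addr in range(start, start + size):
            self.cache.pop(addr, None)
        self.log.append(("invalidate", start, size))

    def copy_back(self, start, size):
        for addr in range(start, start + size):
            if addr in self.cache:
                self.memory.data[addr] = self.cache[addr]
        self.log.append(("copy_back", start, size))


@dataclass
class SharedRegion:
    """A variable-length range of shared memory (illustrative names only)."""
    start: int
    size: int

    def read_access(self, cpu):
        # Drop possibly stale copies so reads in the bracket come from memory.
        cpu.invalidate(self.start, self.size)

    def read_done(self, cpu):
        pass  # a read-only bracket has nothing to write back in this model

    def write_access(self, cpu):
        cpu.invalidate(self.start, self.size)

    def write_done(self, cpu):
        # Publish the writer's updates to main memory.
        cpu.copy_back(self.start, self.size)


def demo():
    memory = MainMemory(bytearray(16))
    p0, p1 = Processor(memory), Processor(memory)
    region = SharedRegion(start=4, size=4)

    # p1 reads the region once, so it now holds a cached copy of address 4.
    region.read_access(p1)
    p1.load(4)
    region.read_done(p1)

    # p0 updates the region inside a write bracket.
    region.write_access(p0)
    p0.store(4, 42)
    region.write_done(p0)

    stale = p1.load(4)       # unbracketed read: served from p1's old copy
    region.read_access(p1)   # bracketed read: the old copy is invalidated first
    fresh = p1.load(4)
    region.read_done(p1)
    return stale, fresh, memory.data[4]
```

In this model the unbracketed read by `p1` still sees its old cached value, while the bracketed read sees the value `p0` copied back. That is the property the access/done pairs are meant to provide; a real system would also have to enforce mutual exclusion between a writer and other accessors of the same region.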

Published In

ACM SIGPLAN Notices, Volume 28, Issue 7
July 1993
259 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173284
  • PPOPP '93: Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
    August 1993
    259 pages
    ISBN:0897915895
    DOI:10.1145/155332
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Cited By

  • (2016) Software Coherence Management on Non-coherent Cache Multi-cores. Proceedings of the 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID), pp. 397-402. DOI: 10.1109/VLSID.2016.70. Online publication date: 4-Jan-2016
  • (2009) A tuneable software cache coherence protocol for heterogeneous MPSoCs. Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, pp. 383-392. DOI: 10.1145/1629435.1629488. Online publication date: 11-Oct-2009
  • (2011) A programming model for deterministic task parallelism. Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pp. 7-12. DOI: 10.1145/1988915.1988918. Online publication date: 5-Jun-2011
  • (2011) Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters. IEEE Transactions on Computers 60:4, pp. 472-483. DOI: 10.1109/TC.2010.155. Online publication date: 1-Apr-2011
  • (2005) A programming interface for NUMA shared-memory clusters. High-Performance Computing and Networking, pp. 698-707. DOI: 10.1007/BFb0031641. Online publication date: 25-Jun-2005
  • (2003) Exploiting high-level coherence information to optimize distributed shared state. ACM SIGPLAN Notices 38:10, pp. 131-142. DOI: 10.1145/966049.781518. Online publication date: 11-Jun-2003
  • (2003) Exploiting high-level coherence information to optimize distributed shared state. Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 131-142. DOI: 10.1145/781498.781518. Online publication date: 11-Jun-2003
  • (2001) Multiple-writer entry consistency. Cluster computing, pp. 97-108. DOI: 10.5555/770406.770416. Online publication date: 1-Jan-2001
  • (2001) Integrating Bulk-Data Transfer into the Aurora Distributed Shared Data System. Journal of Parallel and Distributed Computing 61:11, pp. 1609-1632. DOI: 10.1006/jpdc.2001.1758. Online publication date: 1-Nov-2001
  • (2000) Implementing Scoped Behavior for Flexible Distributed Data Sharing. IEEE Concurrency 8:3, pp. 63-73. DOI: 10.1109/4434.865895. Online publication date: 1-Jul-2000
