The shared regions approach to software cache coherence on multiprocessors

Authors:

Harjinder S. Sandhu,

Benjamin Gamsa,

Songnian Zhou

ACM SIGPLAN Notices, Volume 28, Issue 7

Pages 229 - 238

https://doi.org/10.1145/173284.155356

Published: 01 July 1993

Abstract

The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives and ensure that shared regions are always cacheable by the processors accessing them. Measurements and experiments of the Shared Regions approach on a shared-memory multiprocessor are shown. Comparisons with other software-based coherence strategies, including a user-controlled strategy and an operating-system-based strategy, show that this approach is able to deliver better performance, with relatively low overhead and only a small increase in programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
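To make the abstraction concrete, here is a minimal Python sketch, assuming a lock-per-region design: a hypothetical `SharedRegion` class binds one lock to the data it protects, and code touches that data only between an access call and the matching done call. The method names (`read_access`, `read_done`, `write_access`, `write_done`) are illustrative, modelled on the access/done pairs the review describes; they are not the paper's actual API. The sketch uses one exclusive lock rather than a readers-writer lock, and the cache-management steps appear only as comments because Python exposes no cache control.

```python
import threading


class SharedRegion:
    """Hypothetical sketch: one lock bound to the data it protects.
    Code reads or writes the data only between *_access and *_done."""

    def __init__(self, data):
        self._data = data
        # A single exclusive lock keeps the sketch short; a readers-writer lock
        # would let concurrent readers proceed.
        self._lock = threading.Lock()

    def read_access(self):
        self._lock.acquire()
        # A software coherence layer would discard stale cached copies of this
        # region here; Python exposes no cache control, so this is a comment only.
        return self._data

    def read_done(self):
        self._lock.release()

    def write_access(self):
        self._lock.acquire()
        # As in read_access, stale cached copies would be discarded here.
        return self._data

    def write_done(self):
        # A software coherence layer would copy modified data back to memory here.
        self._lock.release()


def increment(region, times):
    for _ in range(times):
        data = region.write_access()
        data[0] += 1
        region.write_done()


counter = SharedRegion([0])
threads = [threading.Thread(target=increment, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

snapshot = counter.read_access()
total = snapshot[0]
counter.read_done()
print(total)  # 4000
```

Because every access names its region, a runtime built along these lines knows exactly which data each lock protects, which is the kind of information the abstract says the coherence algorithms exploit.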

Reviews

Reviewer: Richard Sites

The authors describe a software cache coherence scheme that involves bracketing accesses to multiprocessor shared-memory regions with read/write-access and read/write-done pairs. The underlying software implementation invalidates, flushes, or copies back variable-length cached data to maintain a consistent main-memory copy. Software caching is an alternative to hardware cache coherency, which does not scale well to large numbers of processors. On a small set of problems (LU decomposition, mean value analysis, and matrix multiply), the scheme is shown to scale more nearly linearly (up to 16 processors) than hand-tuned per-page cache disable/re-enable, and to scale significantly better than operating-system-provided per-page cache disable/re-enable. The alternative of leaving shared data in permanently uncached pages is dismissed because the performance on 16 processors is worse than on one. The shared region scheme is also shown to compare well with static compiler analysis and insertion of cache invalidate/flush/copy-back calls, for large regions or for small regions reused often. The paper is a good overview of software caching, with good references and a well-argued but unsurprising result. It is relatively easy to read for anyone in the computer field.
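The invalidate and copy-back behaviour can be illustrated with a small simulation. This sketch assumes word-granular, write-back caches with no hardware coherence and one simple policy: invalidate the region's cached words on access, and copy dirty words back to memory on write-done. `Processor`, `region_access`, and `region_write_done` are hypothetical names, and the paper's actual algorithms may differ. Locking is omitted because the interleaving of the two simulated processors is written out by hand.

```python
class Processor:
    """Simulated processor with a private write-back cache and no hardware coherence.
    Caching is per word, an assumption made to keep the sketch short."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a list of words)
        self.cache = {}        # address -> [value, dirty]

    def load(self, addr):
        if addr not in self.cache:                      # miss: fetch from memory
            self.cache[addr] = [self.memory[addr], False]
        return self.cache[addr][0]                      # hit: may be stale

    def store(self, addr, value):
        self.cache[addr] = [value, True]                # memory is not updated yet

    def write_back(self, start, length):
        """Copy dirty words in [start, start + length) back to main memory."""
        for addr in range(start, start + length):
            entry = self.cache.get(addr)
            if entry is not None and entry[1]:
                self.memory[addr] = entry[0]
                entry[1] = False

    def invalidate(self, start, length):
        """Drop cached words in the range, writing dirty ones back first."""
        self.write_back(start, length)
        for addr in range(start, start + length):
            self.cache.pop(addr, None)


# Hypothetical region primitives: the lock acquire/release that would bracket
# these calls is omitted because the interleaving below is sequential.
def region_access(cpu, start, length):
    cpu.invalidate(start, length)      # discard possibly stale copies before use


def region_write_done(cpu, start, length):
    cpu.write_back(start, length)      # publish updates before releasing the lock


memory = [0] * 8
p0, p1 = Processor(memory), Processor(memory)
START, LENGTH = 0, 4                   # hypothetical region: words 0..3

# Without the primitives, p1 keeps reading its own old copy.
p1.load(0)                             # p1 caches the value 0
p0.store(0, 42)
p0.write_back(0, 1)                    # main memory now holds 42
stale = p1.load(0)                     # cache hit on the old copy: 0

# With the primitives, p1 observes p0's update.
region_access(p0, START, LENGTH)
p0.store(0, 99)
region_write_done(p0, START, LENGTH)   # main memory now holds 99
region_access(p1, START, LENGTH)
fresh = p1.load(0)                     # miss after invalidation: 99

print(stale, fresh)                    # 0 99
```

A real implementation would operate on cache lines rather than words and could skip the invalidation when a processor's copy is known to be current; this sketch does neither.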

Published In

ACM SIGPLAN Notices Volume 28, Issue 7

July 1993

259 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/173284

Editor:
Richard Wexelblat
IDA/CSED, Alexandria, VA

PPOPP '93: Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
August 1993
259 pages
ISBN:0897915895
DOI:10.1145/155332
Chairmen:
Marina Chen (Yale Univ., New Haven, CT)
Robert Halstead (DEC Cambridge Research Lab.)

Copyright © 1993 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 1993

Published in ACM SIGPLAN Notices, Volume 28, Issue 7

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 32

Total Downloads: 536

Downloads (Last 12 months): 59

Downloads (Last 6 weeks): 14

Reflects downloads up to 18 Nov 2024

Cited By

Cai J, Shrivastava A (2016). Software Coherence Management on Non-coherent Cache Multi-cores. Proceedings of the 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID), pp. 397-402. DOI: 10.1109/VLSID.2016.70. Online publication date: 4-Jan-2016.
https://dl.acm.org/doi/10.1109/VLSID.2016.70

Ophelders F, Bekooij M, Corporaal H, Rosenstiel W, Wakabayashi K (2009). A tuneable software cache coherence protocol for heterogeneous MPSoCs. Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, pp. 383-392. DOI: 10.1145/1629435.1629488. Online publication date: 11-Oct-2009.
https://dl.acm.org/doi/10.1145/1629435.1629488

Pratikakis P, Vandierendonck H, Lyberis S, Nikolopoulos D, Vetter J, Musuvathi M, Shen X (2011). A programming model for deterministic task parallelism. Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pp. 7-12. DOI: 10.1145/1988915.1988918. Online publication date: 5-Jun-2011.
https://dl.acm.org/doi/10.1145/1988915.1988918

Ashby T, Diaz P, Cintra M (2011). Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters. IEEE Transactions on Computers 60:4, pp. 472-483. DOI: 10.1109/TC.2010.155. Online publication date: 1-Apr-2011.
https://dl.acm.org/doi/10.1109/TC.2010.155

Dormanns M, Sprangers W, Ertl H, Bemmerl T (2005). A programming interface for NUMA shared-memory clusters. High-Performance Computing and Networking, pp. 698-707. DOI: 10.1007/BFb0031641. Online publication date: 25-Jun-2005.
https://doi.org/10.1007/BFb0031641

Chen D, Tang C, Sanders B, Dwarkadas S, Scott M (2003). Exploiting high-level coherence information to optimize distributed shared state. ACM SIGPLAN Notices 38:10, pp. 131-142. DOI: 10.1145/966049.781518. Online publication date: 11-Jun-2003.
https://dl.acm.org/doi/10.1145/966049.781518

Chen D, Tang C, Sanders B, Dwarkadas S, Scott M, Eigenmann R, Rinard M (2003). Exploiting high-level coherence information to optimize distributed shared state. Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 131-142. DOI: 10.1145/781498.781518. Online publication date: 11-Jun-2003.
https://dl.acm.org/doi/10.1145/781498.781518

Sandhu H, Brecht T, Moscoso D (2001). Multiple-writer entry consistency. Cluster computing, pp. 97-108. DOI: 10.5555/770406.770416. Online publication date: 1-Jan-2001.
https://dl.acm.org/doi/10.5555/770406.770416

Lu P (2001). Integrating Bulk-Data Transfer into the Aurora Distributed Shared Data System. Journal of Parallel and Distributed Computing 61:11, pp. 1609-1632. DOI: 10.1006/jpdc.2001.1758. Online publication date: 1-Nov-2001.
https://dl.acm.org/doi/10.1006/jpdc.2001.1758

Lu P (2000). Implementing Scoped Behavior for Flexible Distributed Data Sharing. IEEE Concurrency 8:3, pp. 63-73. DOI: 10.1109/4434.865895. Online publication date: 1-Jul-2000.
https://dl.acm.org/doi/10.1109/4434.865895
