DOI: 10.1145/237090.237179
Article

Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

Published: 01 September 1996

Abstract

This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient cache coherence protocol that incorporates a number of optimizations, including support for multiple communication granularities and use of relaxed memory models. This system is fully functional and runs on a cluster of Alpha workstations.

The primary focus of this paper is to describe the techniques used in Shasta to reduce the checking overhead for supporting fine granularity sharing in software. These techniques include careful layout of the shared address space, scheduling the checking code for efficient execution on modern processors, using a simple method that checks loads using only the value loaded, reducing the extra cache misses caused by the checking code, and combining the checks for multiple loads and stores. To characterize the effect of these techniques, we present detailed performance results for the SPLASH-2 applications running on an Alpha processor. Without our optimizations, the checking overheads are excessively high, exceeding 100% for several applications. However, our techniques are effective in reducing these overheads to a range of 5% to 35% for almost all of the applications. We also describe our coherence protocol and present some preliminary results on the parallel performance of several applications running on our workstation cluster. Our experience so far indicates that once the cost of checking memory accesses is reduced using our techniques, the Shasta approach is an attractive software solution for supporting a shared address space with fine-grain access to data.
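The abstract describes the inline checks only in outline. The following Python sketch is a single-process model, not Shasta's implementation (which inserts Alpha instructions into the executable). It assumes a 64-byte line, a per-line state table indexed by shifting the address, a reserved FLAG value written into every word of an invalid line so that a load need only compare the value it loaded, and a miss handler that fetches a whole block whose size can be chosen per data structure. All names (SharedRegion, FLAG, set_granularity, store_batch, and so on) are hypothetical, and the model omits the coherence protocol itself: there are no messages, remote invalidation requests, write-backs, or relaxed-consistency behaviour.

```python
"""Single-process model of Shasta-style inline access checks.

Hypothetical sketch: real Shasta rewrites Alpha executables. The names,
the FLAG value and the 64-byte line size are illustrative assumptions.
Addresses are 8-byte-aligned byte offsets into one shared region.
"""

INVALID, SHARED, EXCLUSIVE = 0, 1, 2
FLAG = 0xFFFF_FFFF_DEAD_BEEF   # assumed reserved value stored in invalid words
LINE_SHIFT = 6                 # assumed 64-byte lines: line index = addr >> 6
WORDS_PER_LINE = (1 << LINE_SHIFT) // 8   # 8-byte words per line


class SharedRegion:
    def __init__(self, home_words):
        n_lines = len(home_words) // WORDS_PER_LINE
        self.home = list(home_words)             # stand-in for remote memory
        self.words = [FLAG] * len(home_words)    # local copy, all invalid
        self.state = [INVALID] * n_lines         # per-line state table
        self.block_start = list(range(n_lines))  # line -> first line of its block
        self.block_lines = [1] * n_lines         # block length, stored at first line
        self.misses = 0

    def set_granularity(self, addr, size, block_bytes):
        """Give one data structure its own coherence block size."""
        per_block = block_bytes >> LINE_SHIFT
        first, last = addr >> LINE_SHIFT, (addr + size - 1) >> LINE_SHIFT
        for line in range(first, last + 1):
            start = first + (line - first) // per_block * per_block
            self.block_start[line] = start
            self.block_lines[start] = min(per_block, last + 1 - start)

    def _lines_of_block(self, line):
        start = self.block_start[line]
        return range(start, start + self.block_lines[start])

    def _fetch_block(self, line, new_state):
        """Miss-handler stand-in: brings in the whole block, not one line."""
        self.misses += 1
        for ln in self._lines_of_block(line):
            if self.state[ln] == INVALID:
                w0 = ln * WORDS_PER_LINE
                self.words[w0:w0 + WORDS_PER_LINE] = self.home[w0:w0 + WORDS_PER_LINE]
            self.state[ln] = max(self.state[ln], new_state)

    def invalidate_block(self, line):
        """On invalidation, mark the block INVALID and fill it with FLAG."""
        for ln in self._lines_of_block(line):
            w0 = ln * WORDS_PER_LINE
            self.words[w0:w0 + WORDS_PER_LINE] = [FLAG] * WORDS_PER_LINE
            self.state[ln] = INVALID

    def load(self, addr):
        """Load check that uses only the loaded value on the fast path."""
        value = self.words[addr >> 3]
        if value == FLAG:
            # Slow path: consult the state table. A valid line may really
            # hold FLAG (a "false miss"); then nothing needs fetching.
            if self.state[addr >> LINE_SHIFT] == INVALID:
                self._fetch_block(addr >> LINE_SHIFT, SHARED)
                value = self.words[addr >> 3]
        return value

    def store(self, addr, value):
        """Store check: a store needs exclusive ownership of its line."""
        if self.state[addr >> LINE_SHIFT] != EXCLUSIVE:
            self._fetch_block(addr >> LINE_SHIFT, EXCLUSIVE)
        self.words[addr >> 3] = value

    def store_batch(self, addr, values):
        """Combined check: one state test per line touched, then plain stores."""
        if not values:
            return
        end = addr + 8 * len(values) - 1
        for line in range(addr >> LINE_SHIFT, (end >> LINE_SHIFT) + 1):
            if self.state[line] != EXCLUSIVE:
                self._fetch_block(line, EXCLUSIVE)
        for i, v in enumerate(values):
            self.words[(addr >> 3) + i] = v
```

In this model a load that hits costs a single comparison against FLAG, whereas every store consults the state table. The combined check in store_batch pays one table test per line rather than one per store, and set_granularity lets one data structure be fetched in larger blocks without changing the line size used by the table.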


Cited By

  • (2024) TrackFM: Far-out Compiler Support for a Far Memory World. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 401-419. DOI: 10.1145/3617232.3624856. Online publication date: 27-Apr-2024.
  • (2023) Using Local Cache Coherence for Disaggregated Memory Systems. ACM SIGOPS Operating Systems Review 57:1, pp. 21-28. DOI: 10.1145/3606557.3606561. Online publication date: 28-Jun-2023.
  • (2023) Revisiting Swapping in User-Space With Lightweight Threading. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:11, pp. 4205-4218. DOI: 10.1109/TCAD.2023.3274953. Online publication date: Nov-2023.


Information

Published In

ACM Conferences
ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN:0897917677
DOI:10.1145/237090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

ASPLOS96

Acceptance Rates

ASPLOS VII Paper Acceptance Rate 25 of 109 submissions, 23%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 149
  • Downloads (last 6 weeks): 32
Reflects downloads up to 21 Sep 2024


Citations

  • (2023) HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1168-1181. DOI: 10.1109/HPCA56546.2023.10070986. Online publication date: Feb-2023.
  • (2021) Rethinking software runtimes for disaggregated memory. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 79-92. DOI: 10.1145/3445814.3446713. Online publication date: 19-Apr-2021.
  • (2020) AIFM. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp. 315-332. DOI: 10.5555/3488766.3488784. Online publication date: 4-Nov-2020.
  • (2019) Project PBerry. Proceedings of the Workshop on Hot Topics in Operating Systems, pp. 127-135. DOI: 10.1145/3317550.3321424. Online publication date: 13-May-2019.
  • (2019) Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory. Journal of Computer Science and Technology 34:1, pp. 94-112. DOI: 10.1007/s11390-019-1901-4. Online publication date: 18-Jan-2019.
  • (2018) Passing Messages while Sharing Memory. Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, pp. 51-60. DOI: 10.1145/3212734.3212741. Online publication date: 23-Jul-2018.
  • (2017) Remote memory in the age of fast networks. Proceedings of the 2017 Symposium on Cloud Computing, pp. 121-127. DOI: 10.1145/3127479.3131612. Online publication date: 24-Sep-2017.