DOI: 10.1145/232973.232980
MGS: a multigrain shared memory system

Published: 01 May 1996

Abstract

Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).

This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.

Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.
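
The split between fine-grain and coarse-grain sharing described in the abstract can be illustrated with a small toy model. The sketch below is not the MGS protocol: it handles reads only, ignores writes, invalidation, and timing, and every name in it (`ToyDSSMP`, `read`, `count_reads`) is hypothetical. It assumes that a read is served by hardware at cache-line grain when the reader's node already holds the page, and that it otherwise triggers a software page-level transfer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDSSMP:
    """Toy read-only model of a DSSMP (hypothetical; not the MGS protocol)."""

    total_processors: int
    node_size: int  # processors per hardware shared memory node (assumed uniform)
    page_holders: dict = field(default_factory=dict)  # page id -> set of node ids with a copy
    hardware_accesses: int = 0  # reads served inside a node at cache-line grain
    software_page_faults: int = 0  # reads that needed a page-level transfer between nodes

    def node_of(self, processor: int) -> int:
        """Map a processor id to the node that contains it."""
        if not 0 <= processor < self.total_processors:
            raise ValueError("processor id out of range")
        return processor // self.node_size

    def read(self, processor: int, page: int) -> str:
        """Classify one read as a hardware access or a software page fault."""
        node = self.node_of(processor)
        holders = self.page_holders.setdefault(page, set())
        if node in holders:
            # Assumption: the page is already on this node, so hardware handles the read.
            self.hardware_accesses += 1
            return "hardware"
        # Assumption: the first touch from a node fetches the whole page in software.
        holders.add(node)
        self.software_page_faults += 1
        return "software"


def count_reads(node_size: int, total_processors: int = 8) -> tuple:
    """Have every processor read page 0 once; return (hardware, software) counts."""
    machine = ToyDSSMP(total_processors, node_size)
    for processor in range(total_processors):
        machine.read(processor, page=0)
    return machine.hardware_accesses, machine.software_page_faults


if __name__ == "__main__":
    for size in (1, 4, 8):
        print(size, count_reads(size))
```

With the processor count fixed at eight, uniprocessor nodes pay a software page fault for every reader in this toy trace, while larger nodes absorb most reads in hardware. The model only shows why node size changes the mix of fine-grain and coarse-grain events; it says nothing about the measured speedups reported in the paper.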




Published In

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture
May 1996
318 pages
ISBN: 0897917863
DOI: 10.1145/232973
  • ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
    Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
    May 1996
    303 pages
    ISSN: 0163-5964
    DOI: 10.1145/232974

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA96: International Conference on Computer Architecture
May 22 - 24, 1996
Philadelphia, Pennsylvania, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 139
  • Downloads (Last 6 weeks): 41
Reflects downloads up to 18 Nov 2024

Cited By

  • (2019) Runtime Monitoring and Resolution of Probabilistic Obstacles to System Goals. ACM Transactions on Autonomous and Adaptive Systems 14:1 (1-40). DOI: 10.1145/3337800. Online publication date: 31-Aug-2019
  • (2019) Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory. Journal of Computer Science and Technology 34:1 (94-112). DOI: 10.1007/s11390-019-1901-4. Online publication date: 18-Jan-2019
  • (2017) Feature Construction for Controlling Swarms by Visual Demonstration. ACM Transactions on Autonomous and Adaptive Systems 12:2 (1-22). DOI: 10.1145/3084541. Online publication date: 25-May-2017
  • (2017) Protecting interoperable clinical environment with authentication. ACM SIGBED Review 14:2 (34-43). DOI: 10.1145/3076125.3076129. Online publication date: 31-Mar-2017
  • (2017) Model-based falsification of an artificial pancreas control system. ACM SIGBED Review 14:2 (24-33). DOI: 10.1145/3076125.3076128. Online publication date: 31-Mar-2017
  • (2017) Measuring performance of middleware technologies for medical systems. ACM SIGBED Review 14:2 (8-14). DOI: 10.1145/3076125.3076126. Online publication date: 31-Mar-2017
  • (2017) Integrating Reinforcement Learning with Multi-Agent Techniques for Adaptive Service Composition. ACM Transactions on Autonomous and Adaptive Systems 12:2 (1-42). DOI: 10.1145/3058592. Online publication date: 25-May-2017
  • (2010) Obstacle discovery in distributed actuator and sensor networks. ACM Transactions on Sensor Networks 7:3 (1-24). DOI: 10.1145/1807048.1807051. Online publication date: 4-Oct-2010
  • (2008) SoCDAL. ACM Transactions on Design Automation of Electronic Systems 13:1 (1-38). DOI: 10.1145/1297666.1297683. Online publication date: 6-Feb-2008
  • (2006) Circulating shared-registers for multiprocessor systems. Journal of Systems Architecture: the EUROMICRO Journal 52:3 (152-168). DOI: 10.1016/j.sysarc.2005.04.002. Online publication date: 1-Mar-2006
