Operating system support for improving data locality on CC-NUMA compute servers

Published: 01 September 1996

Abstract

The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency. Given the large remote access latencies of these architectures, data locality is potentially the most important performance issue. Using realistic workloads, we study the performance improvements provided by OS-supported dynamic page migration and replication. Analyzing our kernel-based implementation, we provide a detailed breakdown of the costs. We show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses may not be a consistent approximation for cache misses. Finally, our experiments show that dynamic page migration and replication can substantially increase application performance, by as much as 30%, and reduce contention for resources in the NUMA memory system.
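To make the ideas in the abstract concrete, here is a minimal Python sketch of sampled miss counting driving a migrate-or-replicate decision. It is an illustration under stated assumptions, not the paper's kernel implementation: the names `PageStats`, `record_miss`, and `decide`, the sampling period, the trigger threshold, and the rule that write-shared pages are left in place are all hypothetical. The only figure taken from the abstract is the 3x to 5x remote-to-local latency ratio, for which the sketch assumes 4x.

```python
from collections import defaultdict

REMOTE_FACTOR = 4.0  # assumed value; the abstract gives a 3x to 5x range


def average_latency(local_fraction, remote_factor=REMOTE_FACTOR):
    """Mean miss latency, in units of one local-memory access."""
    return local_fraction + (1.0 - local_fraction) * remote_factor


class PageStats:
    """Sampled per-page miss counters (hypothetical structure)."""

    def __init__(self, home_node):
        self.home_node = home_node
        self.seen = 0                    # all misses observed on this page
        self.sampled = defaultdict(int)  # node id -> sampled miss count
        self.sampled_writes = 0          # sampled misses that were writes


def record_miss(stats, node, is_write=False, sample_period=10):
    """Count only every `sample_period`-th miss to keep bookkeeping cheap."""
    take = stats.seen % sample_period == 0
    stats.seen += 1
    if take:
        stats.sampled[node] += 1
        if is_write:
            stats.sampled_writes += 1


def decide(stats, trigger=4):
    """Return ('migrate', node), ('replicate', nodes), or ('none', None)."""
    # Nodes whose sampled miss count reached the (assumed) trigger threshold.
    hot = sorted(n for n, c in stats.sampled.items() if c >= trigger)
    remote_hot = [n for n in hot if n != stats.home_node]
    if not remote_hot:
        return ("none", None)           # no remote node misses often enough
    if len(hot) == 1:
        return ("migrate", hot[0])      # a single remote node dominates
    if stats.sampled_writes == 0:
        return ("replicate", remote_hot)  # read-shared by several nodes
    return ("none", None)               # write-shared: assumed left in place
```

With these assumed parameters, a page whose misses are half local and half remote averages 2.5 local-access times per miss, which is the kind of gap that moving or copying the page to the missing node is meant to close. Sampling one miss in ten keeps the counters cheap while still separating hot pages from cold ones in this toy setting.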



Published In

ACM SIGPLAN Notices, Volume 31, Issue 9
Sept. 1996
273 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/248209

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN: 0897917677
DOI: 10.1145/237090

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2021) Dvé. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 526-539. DOI: 10.1109/ISCA52012.2021.00048. Online publication date: 14-Jun-2021
  • (2019) Getting more performance with polymorphism from emerging memory technologies. Proceedings of the 12th ACM International Conference on Systems and Storage, pp. 8-20. DOI: 10.1145/3319647.3325826. Online publication date: 22-May-2019
  • (2019) Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data. 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8. DOI: 10.1109/HPEC.2019.8916398. Online publication date: Sep-2019
  • (2019) Efficient automatic parallelization of a single GPU program for a multiple GPU system. Integration. DOI: 10.1016/j.vlsi.2018.12.006. Online publication date: Jan-2019
  • (2018) Cooperative NV-NUMA. Proceedings of the International Symposium on Memory Systems, pp. 67-78. DOI: 10.1145/3240302.3240308. Online publication date: 1-Oct-2018
  • (2016) Investigating the Performance of Hardware Transactions on a Multi-Socket Machine. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 121-132. DOI: 10.1145/2935764.2935796. Online publication date: 11-Jul-2016
  • (2015) Hare. Proceedings of the Tenth European Conference on Computer Systems, pp. 1-16. DOI: 10.1145/2741948.2741959. Online publication date: 17-Apr-2015
  • (2014) Thread Migration Prediction for Distributed Shared Caches. IEEE Computer Architecture Letters 13:1, pp. 53-56. DOI: 10.1109/L-CA.2012.30. Online publication date: 1-Jan-2014
  • (2013) Bibliography. Multicore Technology, pp. 409-450. DOI: 10.1201/b15268-20. Online publication date: 18-Jul-2013
  • (2013) A lightweight VMM on many core for high performance computing. ACM SIGPLAN Notices 48:7, pp. 111-120. DOI: 10.1145/2517326.2451535. Online publication date: 16-Mar-2013
