Cache-Only Memory Architectures

Published: 01 June 1999

Abstract

The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it.

The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state.

The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
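The replication and migration behavior described in the abstract can be illustrated with a toy model. The sketch below is not taken from the article: the class names `Node` and `ToyComa`, the two states `SHARED` and `EXCLUSIVE`, and the invalidate-on-write policy are all simplifying assumptions chosen for illustration, and capacity limits and replacement of the last copy of a block are ignored entirely.

```python
from dataclasses import dataclass, field

# Assumed, simplified coherence states (not the article's protocol).
SHARED, EXCLUSIVE = "S", "E"


@dataclass
class Node:
    node_id: int
    # Memory module used as a cache: block address (the tag) -> state.
    memory: dict = field(default_factory=dict)


class ToyComa:
    """Hypothetical model: blocks have no fixed home and follow their users."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def holders(self, addr):
        return [n for n in self.nodes if addr in n.memory]

    def read(self, node_id, addr):
        node = self.nodes[node_id]
        if addr in node.memory:
            return "hit"
        # Miss: replicate the block locally; existing copies become shared.
        others = self.holders(addr)
        for other in others:
            other.memory[addr] = SHARED
        node.memory[addr] = SHARED if others else EXCLUSIVE
        return "miss"

    def write(self, node_id, addr):
        node = self.nodes[node_id]
        hit = node.memory.get(addr) == EXCLUSIVE
        # Migrate: drop every other copy so the writer holds the only one.
        for other in self.holders(addr):
            if other is not node:
                del other.memory[addr]
        node.memory[addr] = EXCLUSIVE
        return "hit" if hit else "miss"


if __name__ == "__main__":
    coma = ToyComa(num_nodes=2)
    print(coma.write(0, 0x40))  # first touch allocates the block on node 0
    print(coma.read(1, 0x40))   # node 1 gets its own replica
    print(coma.read(1, 0x40))   # now served from node 1's local memory
    print(coma.write(1, 0x40))  # block migrates to node 1
```

In this sketch a second read from the same node is served locally because the block was replicated on the first miss, and a write leaves the only copy in the writer's memory module, which is the locality effect the abstract attributes to COMA.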


Published In

Computer, Volume 32, Issue 6
June 1999
96 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

  • Research-article

Cited By

  • (2019) MGPUSim. Proceedings of the 46th International Symposium on Computer Architecture, pp. 197-209. DOI: 10.1145/3307650.3322230. Online publication date: 22-Jun-2019
  • (2018) Cooperative NV-NUMA. Proceedings of the International Symposium on Memory Systems, pp. 67-78. DOI: 10.1145/3240302.3240308. Online publication date: 1-Oct-2018
  • (2016) C3D. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195681. Online publication date: 15-Oct-2016
  • (2016) CANDY. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-13. DOI: 10.5555/3195638.3195680. Online publication date: 15-Oct-2016
  • (2014) CAMEO. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.1109/MICRO.2014.63. Online publication date: 13-Dec-2014
  • (2008) YAARC. The Journal of Supercomputing 44:1, pp. 24-40. DOI: 10.1007/s11227-007-0147-z. Online publication date: 1-Apr-2008
  • (2007) On-chip COMA cache-coherence protocol for microgrids of microthreaded cores. Proceedings of the 2007 conference on Parallel processing, pp. 38-48. DOI: 10.5555/1793434.1793442. Online publication date: 28-Aug-2007
  • (2007) Proximity-aware directory-based coherence for multi-core processor architectures. Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pp. 126-134. DOI: 10.1145/1248377.1248398. Online publication date: 9-Jun-2007
  • (2006) ASR. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 443-454. DOI: 10.1109/MICRO.2006.10. Online publication date: 9-Dec-2006
  • (2005) Victim Replication. ACM SIGARCH Computer Architecture News 33:2, pp. 336-345. DOI: 10.1145/1080695.1069998. Online publication date: 1-May-2005
