Cache-Only Memory Architectures

Authors:

Fredrik Dahlgren,

Josep Torrellas

Computer, Volume 32, Issue 6

Pages 72 - 79

https://doi.org/10.1109/2.769448

Published: 01 June 1999

Abstract

The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it. The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state. The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
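The replication and migration behavior described in the abstract can be illustrated with a toy model. The sketch below is not taken from the article: the names `ToyComa`, `Node`, `Block`, and the two-state `State` enum are hypothetical, and it assumes unlimited memory per node, no replacement policy, and a simple invalidate-on-write rule. Real COMA protocols use more states and must handle eviction of the last copy of a block.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Hypothetical two-state scheme; actual COMA designs use richer protocols.
    SHARED = "shared"        # one of possibly several read-only copies
    EXCLUSIVE = "exclusive"  # the only valid copy in the machine


@dataclass
class Block:
    tag: int      # block address kept alongside the data, as in a cache
    state: State
    data: int


@dataclass
class Node:
    # The node's memory module, modeled as a tag -> Block lookup.
    memory: dict = field(default_factory=dict)


class ToyComa:
    """Toy model only: no capacity limit, no replacement, no directory."""

    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]

    def holders(self, tag):
        """Return the ids of all nodes that currently hold the block."""
        return [i for i, n in enumerate(self.nodes) if tag in n.memory]

    def install(self, node_id, tag, data):
        """Place a brand-new block in one node's memory."""
        self.nodes[node_id].memory[tag] = Block(tag, State.EXCLUSIVE, data)

    def read(self, node_id, tag):
        local = self.nodes[node_id].memory
        if tag not in local:
            owners = self.holders(tag)
            if not owners:
                raise KeyError(f"block {tag:#x} is not present in any node")
            # Local miss: replicate the block from a remote holder.
            source = self.nodes[owners[0]].memory[tag]
            source.state = State.SHARED
            local[tag] = Block(tag, State.SHARED, source.data)
        return local[tag].data

    def write(self, node_id, tag, data):
        owners = self.holders(tag)
        if not owners:
            raise KeyError(f"block {tag:#x} is not present in any node")
        # Invalidate every other copy, so the block migrates to the writer.
        for other in owners:
            if other != node_id:
                del self.nodes[other].memory[tag]
        self.nodes[node_id].memory[tag] = Block(tag, State.EXCLUSIVE, data)
```

In this model a read by node 1 of a block installed on node 0 leaves copies on both nodes, and a later write by node 2 leaves the only copy on node 2, which mirrors the replicate-then-migrate behavior the abstract attributes to the hardware.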


Published In

Computer Volume 32, Issue 6

June 1999

96 pages

ISSN:0018-9162

Copyright © 1999 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society Press

Washington, DC, United States
