A fully associative, tagless DRAM cache

Authors:

Hyunggyun Yang,

Jae W. Lee

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S

Pages 211 - 222

https://doi.org/10.1145/2872887.2750383

Published: 13 June 2015

Abstract

This paper introduces a tagless cache architecture for large in-package DRAM caches. A conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the caching granularity with the OS page size and to take a unified approach to address translation and cache tag management. To this end, we introduce the cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. On a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not yet cached, and updates both the page table and the cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to a memory region within the TLB reach always hits in the cache with low hit latency: a TLB access immediately returns the exact location of the requested block in the cache, which saves a tag-checking operation. The remaining cache space is used as a victim cache for memory pages recently evicted from the cTLB. By completely eliminating the data structures for cache tag management from both on-die SRAM and in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency while maintaining the high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, owing to the low hit latency and zero energy wasted on cache tags.
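The translation flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: the class name `TaglessCacheModel`, the `fills` counter, the 4 KiB page size, LRU replacement in the cTLB, and FIFO replacement in the victim region are all hypothetical choices made for illustration, and the abstract specifies none of them.

```python
from collections import OrderedDict, deque

PAGE_SIZE = 4096  # assumed OS page size in bytes (not stated in the abstract)


class TaglessCacheModel:
    """Toy model of a cTLB that maps virtual pages straight to cache pages."""

    def __init__(self, tlb_entries, cache_pages):
        # The abstract assumes the cache is larger than the TLB reach.
        if cache_pages <= tlb_entries:
            raise ValueError("cache must hold more pages than the cTLB maps")
        self.tlb_entries = tlb_entries
        self.tlb = OrderedDict()      # cTLB: vpn -> cache page, in LRU order (assumed)
        self.page_table = {}          # vpn -> cache page, for every cached page
        self.victims = OrderedDict()  # cached pages evicted from the cTLB, FIFO (assumed)
        self.free = deque(range(cache_pages))
        self.fills = 0                # pages brought in from off-package memory

    def translate(self, vaddr):
        """Return the cache address for a virtual address; no tag check is modeled."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)  # refresh LRU position on a cTLB hit
            cpn = self.tlb[vpn]
        else:
            cpn = self._handle_tlb_miss(vpn)
        return cpn * PAGE_SIZE + offset

    def _handle_tlb_miss(self, vpn):
        if vpn in self.page_table:
            # Page is still cached in the victim region: reuse its location.
            cpn = self.page_table[vpn]
            self.victims.pop(vpn, None)
        else:
            # Page is not cached: take a free cache page or reclaim a victim.
            if self.free:
                cpn = self.free.popleft()
            else:
                old_vpn, cpn = self.victims.popitem(last=False)
                del self.page_table[old_vpn]
            self.page_table[vpn] = cpn
            self.fills += 1
        # Update the cTLB; a displaced entry stays cached as a victim page.
        self.tlb[vpn] = cpn
        if len(self.tlb) > self.tlb_entries:
            old_vpn, old_cpn = self.tlb.popitem(last=False)
            self.victims[old_vpn] = old_cpn
        return cpn
```

In this model, any address whose page is mapped by the cTLB resolves to a cache location with a single lookup, and a page that falls out of the cTLB can be re-mapped later without a new fill as long as it is still in the victim region.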

Index Terms

A fully associative, tagless DRAM cache
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News Volume 43, Issue 3S

ISCA'15

June 2015

745 pages

ISSN:0163-5964

DOI:10.1145/2872887

Editor:
Doug DeGroot
acm dot org

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Research Foundation of Korea
