A fully associative, tagless DRAM cache

Published: 13 June 2015
DOI: 10.1145/2749469.2750383

Abstract

This paper introduces a tagless cache architecture for large in-package DRAM caches. A conventional die-stacked DRAM cache maintains both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with the OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce the cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. On a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet and updates both the page table and the cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to a memory region within the TLB reach always hits in the cache with low hit latency, since a TLB access immediately returns the exact location of the requested block in the cache, thus saving a tag-checking operation. The remaining cache space is used as a victim cache for memory pages recently evicted from the cTLB. By completely eliminating the data structures for cache tag management, whether in on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency while maintaining the high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to a baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, owing to low hit latency and zero energy wasted on cache tags.
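To make the cTLB flow above concrete, the following Python sketch models a page-granularity tagless cache under simplifying assumptions. The class and field names (CTLB, TaglessDramCache, PageTableEntry) are illustrative only, not the paper's implementation, and details such as dirty-victim write-back and restricting eviction to pages outside the cTLB reach are omitted.

```python
# Minimal sketch of the cTLB miss-handling flow described in the abstract.
# Names and structure are illustrative assumptions, not the authors' design.

PAGE_SIZE = 4096  # caching granularity aligned with the OS page size


class PageTableEntry:
    """Page-table entry extended with a virtual-to-cache mapping."""
    def __init__(self):
        self.cache_block = None   # cache block index if the page is cached


class TaglessDramCache:
    """Fully associative, page-granularity in-package DRAM cache (no tags)."""
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.resident = {}        # cache block -> virtual page (for eviction)

    def allocate(self, vpage, page_table):
        """Allocate a page-sized block; evict a victim if the cache is full.
        A real design would evict only pages outside the cTLB reach (the
        remaining space serves as a victim cache for recently evicted pages)."""
        if not self.free_blocks:
            victim_block, victim_vpage = self.resident.popitem()
            page_table[victim_vpage].cache_block = None   # write-back omitted
            self.free_blocks.append(victim_block)
        block = self.free_blocks.pop()
        self.resident[block] = vpage
        return block


class CTLB:
    """cTLB: stores virtual-to-cache (not virtual-to-physical) mappings."""
    def __init__(self, cache, page_table):
        self.entries = {}         # virtual page -> cache block
        self.cache = cache
        self.page_table = page_table

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.entries:                 # cTLB hit: guaranteed cache hit,
            block = self.entries[vpage]           # no tag check needed
        else:                                     # cTLB miss: run the miss handler
            block = self.miss_handler(vpage)
        return block * PAGE_SIZE + offset         # exact location in the DRAM cache

    def miss_handler(self, vpage):
        pte = self.page_table.setdefault(vpage, PageTableEntry())
        if pte.cache_block is None:               # not cached yet: allocate it
            pte.cache_block = self.cache.allocate(vpage, self.page_table)
        self.entries[vpage] = pte.cache_block     # install virtual-to-cache mapping
        return pte.cache_block


# Example: the first access to a page misses in the cTLB and allocates the page;
# later accesses within the TLB reach return the cache location directly.
page_table = {}
ctlb = CTLB(TaglessDramCache(num_blocks=2), page_table)
print(ctlb.translate(0x12345))   # cTLB miss -> allocate -> cache address
print(ctlb.translate(0x12FFF))   # cTLB hit -> cache address, no tag check
```

The point the sketch illustrates is that a cTLB hit already yields a cache-internal address, which is how the design removes tag checks from the hit path.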

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, 768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469

Publisher

Association for Computing Machinery, New York, NY, United States