A fully associative, tagless DRAM cache

Published: 13 June 2015
DOI: 10.1145/2749469.2750383

Abstract

This paper introduces a tagless cache architecture for large in-package DRAM caches. A conventional die-stacked DRAM cache maintains both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with the OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce the cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. On a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet and updates both the page table and the cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to a memory region within the TLB reach always hits in the cache with low hit latency, since a TLB access immediately returns the exact location of the requested block in the cache, thus saving a tag-checking operation. The remaining cache space is used as a victim cache for memory pages recently evicted from the cTLB. By completely eliminating the data structures for cache tag management, whether in on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency while maintaining the high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to a baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, owing to low hit latency and zero energy wasted on cache tags.
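To make the cTLB flow above concrete, the following Python sketch models a page-granularity tagless cache under simplifying assumptions. The class and field names (CTLB, TaglessDramCache, PageTableEntry) are illustrative only, not the paper's implementation, and details such as dirty-victim write-back and restricting eviction to pages outside the cTLB reach are omitted.

```python
# Minimal sketch of the cTLB miss-handling flow described in the abstract.
# Names and structure are illustrative assumptions, not the authors' design.

PAGE_SIZE = 4096  # caching granularity aligned with the OS page size


class PageTableEntry:
    """Page-table entry extended with a virtual-to-cache mapping."""
    def __init__(self):
        self.cache_block = None   # cache block index if the page is cached


class TaglessDramCache:
    """Fully associative, page-granularity in-package DRAM cache (no tags)."""
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.resident = {}        # cache block -> virtual page (for eviction)

    def allocate(self, vpage, page_table):
        """Allocate a page-sized block; evict a victim if the cache is full.
        A real design would evict only pages outside the cTLB reach (the
        remaining space serves as a victim cache for recently evicted pages)."""
        if not self.free_blocks:
            victim_block, victim_vpage = self.resident.popitem()
            page_table[victim_vpage].cache_block = None   # write-back omitted
            self.free_blocks.append(victim_block)
        block = self.free_blocks.pop()
        self.resident[block] = vpage
        return block


class CTLB:
    """cTLB: stores virtual-to-cache (not virtual-to-physical) mappings."""
    def __init__(self, cache, page_table):
        self.entries = {}         # virtual page -> cache block
        self.cache = cache
        self.page_table = page_table

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.entries:                 # cTLB hit: guaranteed cache hit,
            block = self.entries[vpage]           # no tag check needed
        else:                                     # cTLB miss: run the miss handler
            block = self.miss_handler(vpage)
        return block * PAGE_SIZE + offset         # exact location in the DRAM cache

    def miss_handler(self, vpage):
        pte = self.page_table.setdefault(vpage, PageTableEntry())
        if pte.cache_block is None:               # not cached yet: allocate it
            pte.cache_block = self.cache.allocate(vpage, self.page_table)
        self.entries[vpage] = pte.cache_block     # install virtual-to-cache mapping
        return pte.cache_block


# Example: the first access to a page misses in the cTLB and allocates the page;
# later accesses within the TLB reach return the cache location directly.
page_table = {}
ctlb = CTLB(TaglessDramCache(num_blocks=2), page_table)
print(ctlb.translate(0x12345))   # cTLB miss -> allocate -> cache address
print(ctlb.translate(0x12FFF))   # cTLB hit -> cache address, no tag check
```

The point the sketch illustrates is that a cTLB hit already yields a cache-internal address, which is how the design removes tag checks from the hit path.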

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, 768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469

Publisher

Association for Computing Machinery, New York, NY, United States