research-article

A fully associative, tagless DRAM cache

Published: 13 June 2015

Abstract

This paper introduces a tagless cache architecture for large in-package DRAM caches. A conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the caching granularity with the OS page size and to take a unified approach to address translation and cache tag management. To this end, we introduce the cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. On a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not yet cached, and updates both the page table and the cTLB with the virtual-to-cache address mapping. Assuming that large in-package DRAM caches are available, this ensures that an access to a memory region within the TLB reach always hits in the cache with low hit latency: a TLB access immediately returns the exact location of the requested block in the cache, which saves a tag-checking operation. The remaining cache space is used as a victim cache for memory pages recently evicted from the cTLB. By completely eliminating the data structures for cache tag management from both on-die SRAM and in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency while maintaining the high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves IPC and energy efficiency by 30.9% and 39.5%, respectively, over the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache that requires megabytes of on-die SRAM storage, owing to low hit latency and zero energy spent on cache tags.
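The miss-handling flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the class name `TaglessCacheModel`, the 4 KiB page size, LRU replacement in the cTLB, and oldest-first replacement in the victim space are all hypothetical choices made for illustration. Write-back of dirty pages and the off-package memory itself are omitted.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed OS page size in bytes (illustrative)


class TaglessCacheModel:
    """Toy model: the cTLB maps virtual pages directly to cache frames."""

    def __init__(self, ctlb_entries, cache_frames):
        # The cache must be at least as large as the TLB reach.
        assert cache_frames >= ctlb_entries
        self.ctlb_entries = ctlb_entries
        self.ctlb = OrderedDict()      # vpn -> cache frame, in LRU order
        self.page_table = {}           # vpn -> cache frame, for every cached page
        self.free_frames = list(range(cache_frames))
        self.victims = OrderedDict()   # cached pages evicted from the cTLB
        self.fills = 0                 # pages brought into the cache

    def translate(self, vaddr):
        """Return the cache address for vaddr; no tag check is modeled."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.ctlb:
            self.ctlb.move_to_end(vpn)  # refresh LRU position on a cTLB hit
        else:
            self._handle_miss(vpn)
        return self.ctlb[vpn] * PAGE_SIZE + offset

    def _handle_miss(self, vpn):
        # Make room in the cTLB first; the evicted page stays cached as a victim.
        if len(self.ctlb) >= self.ctlb_entries:
            old_vpn, _ = self.ctlb.popitem(last=False)
            self.victims[old_vpn] = None
        if vpn in self.page_table:
            # Page is still in the victim space: reuse its frame, no fill needed.
            frame = self.page_table[vpn]
            self.victims.pop(vpn, None)
        else:
            # Page is not cached yet: allocate a frame and record the mapping.
            frame = self._allocate_frame()
            self.page_table[vpn] = frame
            self.fills += 1
        self.ctlb[vpn] = frame

    def _allocate_frame(self):
        if self.free_frames:
            return self.free_frames.pop()
        # Cache is full: reclaim the frame of the oldest victim page.
        old_vpn, _ = self.victims.popitem(last=False)
        return self.page_table.pop(old_vpn)
```

In this model every address inside the cTLB reach resolves to a cache location in `translate` without any tag lookup, and a page that falls out of the cTLB can be re-mapped without a fill as long as it remains in the victim space.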

Published In

ACM SIGARCH Computer Architecture News  Volume 43, Issue 3S
ISCA'15
June 2015
745 pages
ISSN:0163-5964
DOI:10.1145/2872887
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States
