3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions

Authors: Mohammad Alian, Nam Sung Kim

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

Article No.: 22, Pages 1 - 12

https://doi.org/10.1145/3243176.3243191

Published: 01 November 2018

Abstract

Advances in DRAM manufacturing technology are slowing down, while demand for DRAM density and performance continues to grow. This has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase density at the cost of longer latency, lower bandwidth, or both, it is essential to pair them with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers to fast memory often block memory channels from servicing application memory requests for long periods. This in turn significantly increases the high-percentile response time of latency-sensitive applications. In this paper, we propose a high-density managed DRAM architecture, dubbed 3D-XPath, for applications that demand both low memory latency and high memory capacity. 3D-XPath DRAM stacks conventional DRAM dies with the high-density DRAM dies explored in this paper and connects these dies with 3D-XPath. In particular, 3D-XPath allows unused memory channels to service application memory requests when the primary channels that would normally handle those requests are blocked by page transfers, considerably reducing the high-percentile response time. This can also improve the throughput of applications that frequently copy memory blocks between kernel and user memory spaces. Our evaluation shows that 3D-XPath DRAM decreases the high-percentile response time of latency-sensitive applications by ~30% while improving the throughput of I/O-intensive applications by ~39%, compared with DRAM without 3D-XPath.
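The effect described in the abstract, where a request stalls behind a page transfer unless an idle channel can service it, can be illustrated with a toy model. The sketch below is not the paper's simulator or hardware design: the function names, the two-channel setup, the 50-cycle base latency, the 20-cycle alternative-path penalty, and the 1000-cycle page transfer are all assumptions chosen for illustration, and the model ignores queueing, bank conflicts, and scheduling.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # [start, end) in cycles, hypothetical page-transfer window


def blocked_until(channel: int, t: int, transfers: Dict[int, List[Interval]]) -> int:
    """Return the first cycle >= t at which `channel` is not busy with a page transfer."""
    moved = True
    while moved:
        moved = False
        for start, end in transfers.get(channel, []):
            if start <= t < end:
                t = end  # skip past this transfer and re-check the others
                moved = True
    return t


def request_latency(arrival: int, primary: int, transfers: Dict[int, List[Interval]],
                    num_channels: int, use_alt_path: bool,
                    base_latency: int = 50, alt_penalty: int = 20) -> int:
    """Latency of one request under the toy model (all timing values are assumed)."""
    free_at = blocked_until(primary, arrival, transfers)
    if free_at == arrival:
        return base_latency  # primary channel is free
    if use_alt_path:
        for ch in range(num_channels):
            if ch != primary and blocked_until(ch, arrival, transfers) == arrival:
                return base_latency + alt_penalty  # rerouted over an idle channel
    return (free_at - arrival) + base_latency  # wait for the page transfer to finish


def percentile(values: List[int], p: int) -> int:
    """Nearest-rank percentile using integer arithmetic."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100)
    return ordered[rank - 1]


# Channel 0 is blocked by one page transfer; requests arrive every 100 cycles.
transfers = {0: [(100, 1100)]}
arrivals = range(0, 2000, 100)


def p95(use_alt_path: bool) -> int:
    latencies = [request_latency(a, 0, transfers, 2, use_alt_path) for a in arrivals]
    return percentile(latencies, 95)


print("p95 without alternative path:", p95(False))
print("p95 with alternative path:   ", p95(True))
```

In this model the median latency is barely affected by the page transfer, while the 95th percentile is dominated by the requests that arrive during it. That is why rerouting even a minority of requests can matter for tail latency.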

Index Terms

Hardware

Published In

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
November 2018, 494 pages
ISBN: 9781450359863
DOI: 10.1145/3243176

General Chair: Skevos Evripidou (University of Cyprus, Cyprus)
Program Chairs: Per Stenström (Chalmers University of Technology, Sweden), Michael O'Boyle (University of Edinburgh, UK)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IFIP WG 10.3
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

PACT '18: International Conference on Parallel Architectures and Compilation Techniques
November 1 - 4, 2018
Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Cited By

J. Hong, S. Cho, G. Park, W. Yang, Y. Gong, and G. Kim. 2024. Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory. In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 139-155. https://doi.org/10.1109/HPCA57654.2024.00021

Y. Sun, Y. Yuan, Z. Yu, R. Kuper, C. Song, J. Huang, H. Ji, S. Agarwal, J. Lou, I. Jeong, R. Wang, J. Ahn, T. Xu, and N. Kim. 2023. Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 105-121. https://dl.acm.org/doi/10.1145/3613424.3614256

H. Lee, S. Lee, Y. Jung, and D. Kim. 2023. T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving. IEEE Computer Architecture Letters 22, 2 (Jul 2023), 73-76. https://doi.org/10.1109/LCA.2023.3290197

M. Wi, J. Park, S. Ko, M. Kim, N. S. Kim, E. Lee, and J. Ahn. 2023. SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 333-346. https://doi.org/10.1109/HPCA56546.2023.10070966

C. Lee, W. Shin, D. Kim, Y. Yu, S. Kim, T. Ko, D. Seo, J. Park, K. Lee, S. Choi, N. Kim, V. G, A. George, V. V, D. Lee, K. Choi, C. Song, D. Kim, I. Choi, I. Jung, Y. Song, and J. Han. 2020. NVDIMM-C: A Byte-Addressable Non-Volatile Memory Module for Compatibility with Standard DDR Memory Interfaces. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 502-514. https://doi.org/10.1109/HPCA47549.2020.00048