DOI: 10.1145/3243176.3243191

3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions

Published: 01 November 2018

Abstract

Advances in DRAM manufacturing technology are slowing down, whereas the demand for DRAM density and performance continues to increase. This demand has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase density at the cost of longer latency, lower bandwidth, or both, it is essential to pair them with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers to fast memory often block memory channels from servicing application memory requests for long periods. This in turn significantly increases the high-percentile response time of latency-sensitive applications. In this paper, we propose a high-density managed DRAM architecture, dubbed 3D-XPath, for applications demanding both low latency and high memory capacity. 3D-XPath DRAM stacks conventional DRAM dies with the high-density DRAM dies explored in this paper and connects these DRAM dies with 3D-XPath. In particular, 3D-XPath allows unused memory channels to service memory requests from applications when the primary channels that would normally handle those requests are blocked by page transfers, considerably decreasing the high-percentile response time. It can also improve the throughput of applications that frequently copy memory blocks between kernel and user memory spaces. Our evaluation shows that 3D-XPath DRAM decreases the high-percentile response time of latency-sensitive applications by ~30% while improving the throughput of an I/O-intensive application by ~39%, compared with DRAM without 3D-XPath.
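
The idea of servicing a request over an unused channel while its primary channel is blocked by a page transfer can be illustrated with a toy model. The sketch below is not the paper's design or simulator; the `Channel` class, the `select_channel` function, and the cycle counts are hypothetical names and values chosen only for illustration, and the model ignores queuing, bank conflicts, and the cost of the alternative path itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Channel:
    """Toy memory channel; assumes `ident` equals its index in the channel list."""
    ident: int
    busy_until: int = 0  # cycle at which an ongoing page transfer ends

    def wait(self, now: int) -> int:
        """Cycles a new request must wait before this channel is free."""
        return max(0, self.busy_until - now)


def select_channel(channels: List[Channel], primary: int, now: int,
                   alternative_paths: bool) -> Tuple[Channel, int]:
    """Pick the channel that services a request and return (channel, wait).

    Without alternative paths only the primary channel is a candidate.
    With them, any channel may be used; ties go to the primary channel.
    """
    candidates = channels if alternative_paths else [channels[primary]]
    best = min(candidates, key=lambda ch: (ch.wait(now), ch.ident != primary))
    return best, best.wait(now)


# Channel 0 is blocked by a page transfer until cycle 100; channel 1 is idle.
channels = [Channel(0, busy_until=100), Channel(1)]
baseline = select_channel(channels, primary=0, now=10, alternative_paths=False)
rerouted = select_channel(channels, primary=0, now=10, alternative_paths=True)
```

In this toy setting the baseline request stalls until the page transfer finishes, whereas the rerouted request is issued immediately on the idle channel, which is the kind of tail-latency effect the abstract describes.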

        Published In

        PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
        November 2018
        494 pages
        ISBN:9781450359863
        DOI:10.1145/3243176
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        In-Cooperation

        • IFIP WG 10.3
        • IEEE CS

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. 3D stacked memory
        2. asymmetric latency memory
        3. hardware managed migration

        Qualifiers

        • Research-article

        Conference

        PACT '18

        Acceptance Rates

        Overall Acceptance Rate 121 of 471 submissions, 26%

        Article Metrics

        • Downloads (last 12 months): 183
        • Downloads (last 6 weeks): 46
        Reflects downloads up to 18 Nov 2024

        Cited By

        • (2024) Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 139-155. DOI: 10.1109/HPCA57654.2024.00021. Online publication date: 2-Mar-2024
        • (2023) Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 105-121. DOI: 10.1145/3613424.3614256. Online publication date: 28-Oct-2023
        • (2023) T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving. IEEE Computer Architecture Letters 22:2, pp. 73-76. DOI: 10.1109/LCA.2023.3290197. Online publication date: Jul-2023
        • (2023) SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 333-346. DOI: 10.1109/HPCA56546.2023.10070966. Online publication date: Feb-2023
        • (2020) NVDIMM-C: A Byte-Addressable Non-Volatile Memory Module for Compatibility with Standard DDR Memory Interfaces. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 502-514. DOI: 10.1109/HPCA47549.2020.00048. Online publication date: Feb-2020
