DOI: 10.1145/2896377.2901453
research-article
Public Access

Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization

Published: 14 June 2016

Abstract

Long DRAM latency is a critical performance bottleneck in current systems. DRAM access latency is defined by three fundamental operations that take place within the DRAM cell array: (i) activation of a memory row, which opens the row to perform accesses; (ii) precharge, which prepares the cell array for the next memory access; and (iii) restoration of the row, which restores the values of cells in the row that were destroyed due to activation. There is significant latency variation for each of these operations across the cells of a single DRAM chip due to irregularity in the manufacturing process. As a result, some cells are inherently faster to access, while others are inherently slower. Unfortunately, existing systems do not exploit this variation.
The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for these three fundamental DRAM operations, and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make several new observations about latency variation within DRAM. We find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced.
Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system. We conclude that the experimental characterization and analysis of latency variation within modern DRAM, provided by this work, can lead to new techniques that improve DRAM and system performance.
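The key idea above (slower cells cluster in certain regions, so the remaining regions can be accessed with reduced latencies) can be sketched as a per-region timing lookup. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Timing`, `classify_regions`, and `timing_for_row`, the region granularity, and all timing values are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Latencies (in ns) for the three fundamental DRAM operations."""
    activation_ns: float
    precharge_ns: float
    restoration_ns: float


# Placeholder values for illustration only; they are not results from the paper.
STANDARD = Timing(activation_ns=13.75, precharge_ns=13.75, restoration_ns=35.0)
REDUCED = Timing(activation_ns=10.0, precharge_ns=10.0, restoration_ns=27.5)


def classify_regions(slow_rows, rows_per_region, num_rows):
    """Mark a region as fast (True) only if profiling found no slow rows in it."""
    num_regions = (num_rows + rows_per_region - 1) // rows_per_region
    fast = [True] * num_regions
    for row in slow_rows:
        fast[row // rows_per_region] = False
    return fast


def timing_for_row(row, fast_regions, rows_per_region):
    """Pick reduced timings for rows in fast regions, standard timings otherwise."""
    return REDUCED if fast_regions[row // rows_per_region] else STANDARD


# Hypothetical profiling result: the slow rows all fall in region 0.
fast_regions = classify_regions(slow_rows={3, 5, 6}, rows_per_region=8, num_rows=32)
```

Because the slow rows are clustered, a single conservative region absorbs all of them and the other regions can use the reduced settings. If the same number of slow rows were scattered uniformly, most regions would have to fall back to the standard timings.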


Information & Contributors

Information

Published In

SIGMETRICS '16: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science
June 2016
434 pages
ISBN:9781450342667
DOI:10.1145/2896377
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dram
  2. dram errors
  3. memory latency
  4. process variation

Qualifiers

  • Research-article

Conference

SIGMETRICS '16
Acceptance Rates

SIGMETRICS '16 Paper Acceptance Rate 28 of 208 submissions, 13%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 466
  • Downloads (last 6 weeks): 67
Reflects downloads up to 24 Nov 2024.

Cited By

  • (2025) A 3D-stack DRAM-based PNM architecture design. Integration, 100 (102266). DOI: 10.1016/j.vlsi.2024.102266. Online publication date: Jan-2025
  • (2024) The influence of job satisfaction on retention of primary healthcare professionals in Tamil Nadu. International Journal of Advanced and Applied Sciences, 11:2 (238-247). DOI: 10.21833/ijaas.2024.02.025. Online publication date: Feb-2024
  • (2024) Fast & Safe IO Memory Protection. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (95-109). DOI: 10.1145/3694715.3695943. Online publication date: 4-Nov-2024
  • (2024) Exploring the Correlation Between DRAM Latencies and Rowhammer Attacks. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (445-450). DOI: 10.1109/ISVLSI61997.2024.00086. Online publication date: 1-Jul-2024
  • (2024) (MC)^2: Lazy MemCopy at the Memory Controller. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (1112-1128). DOI: 10.1109/ISCA59077.2024.00084. Online publication date: 29-Jun-2024
  • (2024) HiFi-DRAM: Enabling High-fidelity DRAM Research by Uncovering Sense Amplifiers with IC Imaging. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (133-149). DOI: 10.1109/ISCA59077.2024.00020. Online publication date: 29-Jun-2024
  • (2024) SpyHammer: Understanding and Exploiting RowHammer Under Fine-Grained Temperature Variations. IEEE Access, 12 (80986-81003). DOI: 10.1109/ACCESS.2024.3409389. Online publication date: 2024
  • (2023) Lightning: A Reconfigurable Photonic-Electronic SmartNIC for Fast and Energy-Efficient Inference. Proceedings of the ACM SIGCOMM 2023 Conference (452-472). DOI: 10.1145/3603269.3604821. Online publication date: 10-Sep-2023
  • (2023) Unity ECC: Unified Memory Protection Against Bit and Chip Errors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-16). DOI: 10.1145/3581784.3607081. Online publication date: 12-Nov-2023
  • (2023) HHVM Performance Optimization for Large Scale Web Services. Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering (137-148). DOI: 10.1145/3578244.3583720. Online publication date: 15-Apr-2023
