DOI: 10.1145/3307650.3322231
research-article

CROW: a low-cost substrate for improving DRAM performance, energy efficiency, and reliability

Published: 22 June 2019

Abstract

DRAM has been the dominant technology for architecting main memory for decades. Recent trends in multi-core system design and large-dataset applications have amplified the role of DRAM as a critical system bottleneck. We propose Copy-Row DRAM (CROW), a flexible substrate that enables new mechanisms for improving DRAM performance, energy efficiency, and reliability. We use the CROW substrate to implement 1) a low-cost in-DRAM caching mechanism that lowers DRAM activation latency to frequently-accessed rows by 38% and 2) a mechanism that avoids the use of short-retention-time rows to mitigate the performance and energy overhead of DRAM refresh operations. CROW's flexibility allows the implementation of both mechanisms at the same time. Our evaluations show that the two mechanisms synergistically improve system performance by 20.0% and reduce DRAM energy by 22.3% for memory-intensive four-core workloads, while incurring 0.48% extra area overhead in the DRAM chip and 11.3 KiB storage overhead in the memory controller, and consuming 1.6% of DRAM storage capacity, for one particular implementation.
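
As a rough illustration of the in-DRAM caching idea summarized above, the sketch below models a controller-side table that tracks which regular rows currently have a duplicate in a small set of copy rows, and charges a reduced activation latency on a hit. Only the 38% reduction comes from the abstract. The names (`CopyRowTable`, `activate`, `capacity_overhead`), the LRU replacement policy, the 18 ns baseline latency, and the choice of 8 copy rows per 512-row subarray are assumptions made for illustration; the last was picked only because 8/512 is roughly the 1.6% capacity figure quoted above. The model also ignores any extra cost of creating a copy on a miss. It is a sketch under these assumptions, not the authors' implementation.

```python
from collections import OrderedDict

BASE_ACT_NS = 18.0           # assumed baseline activation latency (hypothetical value)
HIT_REDUCTION = 0.38         # 38% activation-latency reduction quoted in the abstract
COPY_ROWS_PER_SUBARRAY = 8   # assumed number of copy rows per subarray
ROWS_PER_SUBARRAY = 512      # assumed number of regular rows per subarray


class CopyRowTable:
    """Tracks, per subarray, which regular rows have a copy row (LRU policy assumed)."""

    def __init__(self, copy_rows=COPY_ROWS_PER_SUBARRAY):
        self.copy_rows = copy_rows
        self.tables = {}  # subarray id -> OrderedDict of cached regular rows

    def activate(self, subarray, row):
        """Return the modeled activation latency in ns for this access."""
        table = self.tables.setdefault(subarray, OrderedDict())
        if row in table:
            table.move_to_end(row)       # hit: mark row as most recently used
            return BASE_ACT_NS * (1.0 - HIT_REDUCTION)
        if len(table) >= self.copy_rows:
            table.popitem(last=False)    # miss with full table: evict LRU row
        table[row] = True                # miss: row now has a copy
        return BASE_ACT_NS


def capacity_overhead():
    """Fraction of rows set aside as copy rows under the assumed geometry."""
    return COPY_ROWS_PER_SUBARRAY / ROWS_PER_SUBARRAY
```

In this model, a first access to a row pays the full baseline latency, and repeated accesses to the same row pay the reduced latency until the row is evicted from its subarray's table.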

Information

Published In

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

  • IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. energy
  3. memory systems
  4. performance
  5. power
  6. reliability

Qualifiers

  • Research-article

Conference

ISCA '19

Acceptance Rates

ISCA '19 paper acceptance rate: 62 of 365 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)

Cited By

  • (2024) The influence of job satisfaction on retention of primary healthcare professionals in Tamil Nadu. International Journal of ADVANCED AND APPLIED SCIENCES 11:2 (238-247). DOI: 10.21833/ijaas.2024.02.025. Online publication date: Feb-2024
  • (2024) CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture. ACM Transactions on Reconfigurable Technology and Systems 17:3 (1-31). DOI: 10.1145/3686163. Online publication date: 5-Aug-2024
  • (2024) FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration. ACM Transactions on Architecture and Code Optimization 21:2 (1-27). DOI: 10.1145/3649455. Online publication date: 21-May-2024
  • (2024) The Environmental Cost of High Performance Computing System Simulation. 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (289-292). DOI: 10.1109/PDP62718.2024.00048. Online publication date: 20-Mar-2024
  • (2024) CoMeT: Count-Min-Sketch-based Row Tracking to Mitigate RowHammer at Low Cost. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (593-612). DOI: 10.1109/HPCA57654.2024.00050. Online publication date: 2-Mar-2024
  • (2024) SpyHammer: Understanding and Exploiting RowHammer Under Fine-Grained Temperature Variations. IEEE Access 12 (80986-81003). DOI: 10.1109/ACCESS.2024.3409389. Online publication date: 2024
  • (2024) Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management. Journal of Computer Science and Technology 39:4 (871-894). DOI: 10.1007/s11390-023-2939-x. Online publication date: 1-Jul-2024
  • (2023) Unity ECC: Unified Memory Protection Against Bit and Chip Errors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-16). DOI: 10.1145/3581784.3607081. Online publication date: 12-Nov-2023
  • (2023) Toward Sustainable HPC: Carbon Footprint Estimation and Environmental Implications of HPC Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). DOI: 10.1145/3581784.3607035. Online publication date: 12-Nov-2023
  • (2023) CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (153-164). DOI: 10.1145/3543622.3573210. Online publication date: 12-Feb-2023
