- research-article, June 2017
Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters
FTXS '17: Proceedings of the 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Pages 17–22
https://doi.org/10.1145/3086157.3086164
Understanding DRAM errors in high-performance computing (HPC) clusters is paramount to address future HPC resilience challenges. While there have been studies on this topic, previous work has focused on on-node and single-rack characteristics of errors; ...
- research-article, October 2015
E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems
MEMSYS '15: Proceedings of the 2015 International Symposium on Memory Systems, Pages 60–70
https://doi.org/10.1145/2818950.2818961
Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper we present low overhead schemes for improving the reliability of commodity DRAM systems with better power and ...
- research-article, March 2012
Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design
ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, Pages 111–122
https://doi.org/10.1145/2150976.2150989
Main memory is one of the leading hardware causes for machine crashes in today's datacenters. Designing, evaluating and modeling systems that are resilient against memory errors requires a good understanding of the underlying characteristics of errors ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 40 Issue 1
ACM SIGPLAN Notices: Volume 47 Issue 4