DOI: 10.1145/2830772.2830799
research-article

CLEAN-ECC: high reliability ECC for adaptive granularity memory system

Published: 05 December 2015

Abstract

Adaptive-granularity memory architectures have been considered mainly to address the main-memory bottleneck and to improve power efficiency. Meanwhile, highly reliable protection schemes are becoming popular, especially in large computing systems. Unfortunately, conventional ECC mechanisms, including Chipkill, require a large number of symbols per codeword to guarantee strong protection with acceptable overhead. We propose a novel memory protection scheme called CLEAN (Chipkill-LEvel reliable and Access granularity Negotiable), which balances the contradicting demands of fine-grained (FG) access and strong, efficient ECC. Because CLEAN's detection mechanism, combined with permanent faults, could leave a potentially significant gap in detection coverage, we design a simple mechanism called access granularity enforcement to close it. By enforcing coarse-grained (CG) access, CLEAN retains protection comparable to Chipkill at the cost of giving up adaptive access granularity. On mixes of SPEC CPU2006 benchmarks, CLEAN showed Chipkill-level reliability while improving performance, system power efficiency, and memory power efficiency by up to 11.8%, 10.8%, and 64.9%, respectively.
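The tension between strong symbol-based ECC and narrow accesses can be illustrated with a back-of-the-envelope calculation; this is our own sketch, not the paper's analysis. For a maximum-distance-separable code such as Reed-Solomon, correcting t symbol errors while also detecting s further ones needs minimum distance 2t + s + 1, and therefore at least 2t + s check symbols, regardless of how many data symbols are protected. Assuming, hypothetically, that each data symbol comes from a different DRAM chip, a fine-grained access that reads fewer chips spreads that fixed check cost over fewer data symbols. The helper names `min_check_symbols` and `check_overhead` are illustrative only.

```python
from fractions import Fraction


def min_check_symbols(correct: int, detect_extra: int = 0) -> int:
    """Lower bound on check symbols per codeword (met by MDS codes such as Reed-Solomon).

    Correcting `correct` symbol errors while detecting up to
    `correct + detect_extra` errors needs minimum distance
    d = 2*correct + detect_extra + 1. By the Singleton bound every code
    needs at least d - 1 check symbols; MDS codes use exactly that many.
    """
    return 2 * correct + detect_extra


def check_overhead(data_symbols: int, check_symbols: int) -> Fraction:
    """Check-symbol storage as a fraction of the protected data symbols."""
    return Fraction(check_symbols, data_symbols)


if __name__ == "__main__":
    # Chipkill-style single-symbol-correct, double-symbol-detect (SSC-DSD):
    # d = 4, so at least 3 check symbols per codeword.
    r = min_check_symbols(correct=1, detect_extra=1)
    # Hypothetical codeword widths, assuming one data symbol per chip read.
    for k in (32, 16, 8):
        print(f"{k:2d} data symbols + {r} check symbols: "
              f"{float(check_overhead(k, r)):.1%} overhead")
```

Under these assumptions the same three check symbols cost 3/32 of the data for a 32-symbol codeword but 3/8 for an 8-symbol one. Real Chipkill implementations may use more check symbols than this lower bound.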

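Cited By

  • (2024) LWECC: A Lightweight ECC Technology for HPC Accelerators Supporting Multi-granularity Memory Access. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5. DOI: 10.1109/ISCAS58744.2024.10558537. Online publication date: 19-May-2024.
  • (2024) ZEC ECC: A Zero-Byte Eliminating Compression-Based ECC Scheme for DRAM Reliability. IEEE Access, vol. 12, pp. 100366-100376. DOI: 10.1109/ACCESS.2024.3431209. Online publication date: 2024.
  • (2022) Stealth ECC. Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, pp. 382-387. DOI: 10.5555/3539845.3539938. Online publication date: 14-Mar-2022.
  • (2021) Characterizing and Mitigating Soft Errors in GPU DRAM. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 641-653. DOI: 10.1145/3466752.3480111. Online publication date: 18-Oct-2021.
  • (2021) Automatic Sublining for Efficient Sparse Memory Accesses. ACM Transactions on Architecture and Code Optimization, vol. 18, no. 3, pp. 1-23. DOI: 10.1145/3452141. Online publication date: 10-May-2021.
  • (2021) CARE: Coordinated Augmentation for Elastic Resilience on DRAM Errors in Data Centers. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 533-544. DOI: 10.1109/HPCA51647.2021.00052. Online publication date: Feb-2021.
  • (2021) Shortest Distance Lattice Cryptographic Algorithm for Data Points Using Quantum Processors. Data Management, Analytics and Innovation, pp. 275-293. DOI: 10.1007/978-981-16-2934-1_18. Online publication date: 5-Aug-2021.
  • (2019) Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems. IEEE Transactions on Computers, vol. 68, no. 5, pp. 646-659. DOI: 10.1109/TC.2018.2886884. Online publication date: 1-May-2019.
  • (2018) Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability. Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3225058.3225093. Online publication date: 13-Aug-2018.
  • (2018) ReveNAND. ACM Transactions on Architecture and Code Optimization, vol. 15, no. 2, pp. 1-26. DOI: 10.1145/3184744. Online publication date: 1-May-2018.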

Information

Published In

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN:9781450340342
DOI:10.1145/2830772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM memory
  2. adaptive granularity memory system
  3. chipkill
  4. reliability

Conference

MICRO-48
Acceptance Rates

MICRO-48 paper acceptance rate: 61 of 283 submissions (22%)
Overall acceptance rate: 484 of 2,242 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 96
  • Downloads (last 6 weeks): 29

Reflects downloads up to 08 Feb 2025.