Abstract
This paper presents a group-based dynamic stuck-at fault diagnosis scheme for resistive random-access memory (ReRAM). Traditional static random-access memory, dynamic random-access memory, and NAND and NOR flash memory are limited in scalability, power, package density, and so forth. Next-generation memory types such as ReRAM are considered to offer several advantages, including high package density, non-volatility, scalability, and low power consumption, but cell reliability remains a problem. Memory operation becomes unreliable because of permanent stuck-at faults caused by write-intensive or memory-intensive workloads. A growing number of stuck-at faults also limits chip lifetime prematurely. Therefore, a cellular automaton (CA) based dynamic stuck-at fault-tolerant design is proposed here to address unreliable cell behavior and variable cell lifetime. A scalable, block-level fault diagnosis and recovery scheme is introduced to keep data readable despite multi-bit stuck-at faults. The scheme is novel in that its goal is to remove all restrictions on the number and nature of stuck-at faults under general fault conditions. The proposed scheme is based on Wolfram's null-boundary and periodic-boundary CA theory. Several special classes of CA are introduced to achieve 100% fault tolerance: single-length-cycle single-attractor cellular automata (SACAs), single-length-cycle two-attractor cellular automata (TACAs), and single-length-cycle multiple-attractor cellular automata (MACAs). The target micro-architectural unit is designed with optimal space overhead.
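The SACA, TACA, and MACA classes named above are distinguished by the attractors of a CA's state-transition graph. The Python sketch below is a hedged illustration of that idea, not the authors' fault-diagnosis design. It simulates a small hybrid three-neighborhood CA under either null or periodic boundary, enumerates every cycle by brute force, and labels the CA. The function names (`next_state`, `cycles`, `classify`), the per-cell Wolfram rule vector, and the reading of "multiple" as three or more attractors are assumptions made for this example. Exhaustive enumeration over 2^n states is practical only for small n.

```python
from itertools import product


def next_state(state, rules, boundary="null"):
    """One synchronous update of a hybrid 3-neighborhood CA (illustrative).

    state:    tuple of 0/1 cell values
    rules:    Wolfram rule number (0-255) for each cell
    boundary: "null" means missing neighbors read as 0;
              "periodic" means the two end cells are neighbors of each other
    """
    n = len(state)
    out = []
    for i in range(n):
        if boundary == "periodic":
            left, right = state[(i - 1) % n], state[(i + 1) % n]
        else:
            left = state[i - 1] if i > 0 else 0
            right = state[i + 1] if i < n - 1 else 0
        # Wolfram numbering: neighborhood (l, c, r) selects bit 4l + 2c + r
        idx = (left << 2) | (state[i] << 1) | right
        out.append((rules[i] >> idx) & 1)
    return tuple(out)


def cycles(rules, boundary="null"):
    """Return every cycle of the state-transition graph (exhaustive, small n only)."""
    found = set()
    for start in product((0, 1), repeat=len(rules)):
        seen = set()
        s = start
        while s not in seen:  # walk forward until some state repeats
            seen.add(s)
            s = next_state(s, rules, boundary)
        # s now lies on a cycle; collect all states of that cycle
        cyc = [s]
        t = next_state(s, rules, boundary)
        while t != s:
            cyc.append(t)
            t = next_state(t, rules, boundary)
        found.add(tuple(sorted(cyc)))
    return found


def classify(rules, boundary="null"):
    """Label the CA by its attractors (hypothetical helper, see lead-in)."""
    cs = cycles(rules, boundary)
    if any(len(c) > 1 for c in cs):
        return "not single-length-cycle"
    return {1: "SACA", 2: "TACA"}.get(len(cs), "MACA")


print(classify([170, 170, 170], "null"))      # SACA
print(classify([170, 170, 170], "periodic"))  # not single-length-cycle
```

Rule 170 copies each cell's right neighbor. Under null boundary, zeros flow in from the right end and every state settles at 000, giving a single fixed-point attractor. Under periodic boundary the same rule merely rotates the register, which produces cycles of length 3. This contrast shows why the boundary condition matters when selecting CA for a given attractor structure.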
Additional information
Contributors
Sutapa SARKAR designed the research, collected and processed the data, performed the formal analysis, and drafted the paper. Biplab Kumar SIKDAR helped organize the paper. Sutapa SARKAR and Mousumi SAHA revised and finalized the paper.
Compliance with ethics guidelines
Sutapa SARKAR, Biplab Kumar SIKDAR, and Mousumi SAHA declare that they have no conflict of interest.
About this article
Cite this article
Sarkar, S., Sikdar, B.K. & Saha, M. Cellular automata based multi-bit stuck-at fault diagnosis for resistive memory. Front Inform Technol Electron Eng 23, 1110–1126 (2022). https://doi.org/10.1631/FITEE.2100255
DOI: https://doi.org/10.1631/FITEE.2100255
Key words
- Resistive memory
- Cell reliability
- Stuck-at fault diagnosis
- Single-length-cycle single-attractor cellular automata
- Single-length-cycle two-attractor cellular automata
- Single-length-cycle multiple-attractor cellular automata