Abstract
Networked distributed data storage systems are essential to deal with the needs of storing massive volumes of data. Dependability of such a system relies on its fault tolerance (data should be available in case of node failures) as well as its maintainability (its ability to repair lost data to ensure redundancy replenishment over time). Erasure codes provide a storage efficient alternative to replication based redundancy in storage systems, ensuring the same fault tolerance at a lower storage overhead cost. Traditional erasure codes however have the drawback of entailing high communication overhead for maintenance, when encoded fragments are lost due to storage device failures, and need to be replenished in new nodes. We propose a new family of erasure codes called self-repairing codes (SRC) taking into account the peculiarities of distributed storage systems, specifically to improve its maintainability by ‘localizing’ the repairs. SRC have the property that encoded fragments can be repaired directly from other small subsets of (typically 2 or 3) encoded fragments. These code properties allow bandwidth efficient and fast recovery even in the presence of multiple failures, in turn translating into better system robustness. A concrete family of such locally repairable codes, namely, homomorphic SRC are proposed and various aspects and properties of the same are studied in detail and compared—quantitatively or qualitatively (as may be suitable) with respect to other codes including traditional erasure codes as well as some recent representative codes designed specifically for storage applications.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
In this paper, we use the terms ‘fragment’ and ‘block’ interchangeably. Depending on the context, the term ‘data’ is used to mean either fragment(s) or object(s).
Linearized polynomials are also called additive polynomials.
\(\rho _x\) is zero for \(x<k\) for \(\mathrm{HSRC}\) also.
Note than in fact, often fewer than \(k\) fragments will be adequate to reconstruct a specific fragment.
A storage system’s vulnerability to further failures, as well as spiky bandwidth usage are known problems of lazy repair strategies [11].
References
Buttyán L, Czap L, Vajda I (2011) Detection and recovery from pollution attacks in coding-based distributed storage schemes. In: IEEE transactions on dependable and secure computing, vol 8, issue no 6
Bhagwan R, Tati K, Cheng Y, Savage S, Voelker G (2004) Total recall: system support for automated availability management. In: Networked systems design and implementation (NSDI), 2004
http://www.cleversafe.org/dispersed-storage/configurations. Accessed 11 Aug 2013
Dimakis AG, Brighten Godfrey P, Wu Y, Wainwright MO, Ramchandran K (2010) Network coding for distributed storage systems. IEEE Trans Inf Theory 56(9):4539–4551
Datta A, Aberer K (2006) Internet-scale storage systems under churn—a study of the steady-state using Markov models. In: Peer-to-peer computing (P2P)
Duminuco A, Biersack E (2008) Hierarchical codes: how to make erasure codes attractive for peer-to-peer storage systems. In: Peer-to-peer computing (P2P)
Duminuco A, Biersack EW (2009) A practical study of regenerating codes for peer-to-peer backup systems. In: International conference on distributed computing systems (ICDCS)
El Rouayheb S, Ramchandran K (2010) Fractional repetition codes for repair in distributed storage systems. In: Allerton conference on communication control and computing
Gaonkar S, Keeton K, Merchant A, Sanders WH (2010) Designing dependable storage solutions for shared application environments. IEEE Trans Dependable Secur Comput 7(4)
Kermarrec A-M, Le Scouarnec N, Straub G (2010) Beyong regenerating codes. In: Technical report, 10 Sept 2010
Liu X, Datta A (2009) Redundancy maintenance and garbage collection strategies in peer-to-peer storage systems. In: International symposium on stabilization, safety, and security of distributed systems (SSS)
Gopalan P, Huang C, Simitchi H, Yekhanin S (2012) On the locality of codeword symbols. IEEE Trans Inf Theory 58(11):6925–6934
Grolimund D (2007) Wuala—a distributed file system. Google Tech Talk http://www.youtube.com/watch?v=3xKZ4KGkQY8. Accessed 11 Aug 2013
Haytaoglu E, Dalkilic ME (2013) Homomorphic minimum bandwidth repairing codes. In: Information sciences and systems 2013. Lecture notes in electrical engineering, vol 264, pp 339–348
Hollmann H (2013) Storage codes—coding rate and repair locality. In: ICNC 2013
Huang C, Chen M, Li J (2007) Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. In: NCA 2007
Kamath GM, Prakash N, Lalitha V, Vijay Kumar P (2013) Codes with Local Regeneration. In: The proceedings of the workshop on information theory and applications (ITA), 2013. Available at arXiv:1211.1932. Accessed Nov 2012
Oggier F, Datta A (2011) Self-repairing homomorphic codes for distributed storage systems. In: INFOCOM 2011
Oggier F, Datta A (2011) Self-repairing codes for distributed storage—a projective geometric construction. In: ITW 2011
Papailiopoulos DS, Dimakis AG (2012) Locally repairable codes. In: ISIT 2012
Rashmi KV, Shah NB, Kumar PV, Ramchandran K (2009) Explicit construction of optimal exact regenerating codes for distributed storage. In: Allerton conference on control, computing and communication 2009
Rashmi KV, Shah NB, Kumar PV (2011) Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Trans Inf Theory 57(8):5227–5239
Rashmi KV, Shah NB, Kumar PV (2011) Enabling node repair in any erasure code for distributed storage. In: IEEE International Symposium on Information Theory, 2011. IEEE, New York
Reed IS, Solomon G (1960) Polynomial Codes Over Certain Finite Fields. J Soc Ind Appl Math 8(2):300–304
Rodrigues R, Liskov B (2005) High availability in DHTs: erasure coding vs. replication. In: Workshop on peer-to-peer systems (IPTPS)
Shum KW (2011) Cooperative regenerating codes for distributed storage systems. ICC 2011
Shum KW, Hu Y (2013) Cooperative regenerating codes. arXiv:1207.6762
Silberstein N, Rawat AS, Koyluoglu O, Vishwanath S (2012) Optimal locally repairable codes via rank-metric codes. arXiv:1301.6331
Tamo I, Wang Z, Bruck J (2011) MDS array codes with optimal rebuilding. In: IEEE ISIT 2011
Acknowledgments
This work has been supported by the MoE Tier-2 grant MOE2013-T2-1-068 “eCODE: Erasure Codes for Datacenter Environments”.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Oggier, F., Datta, A. Self-repairing codes. Computing 97, 171–201 (2015). https://doi.org/10.1007/s00607-014-0426-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-014-0426-5