On the Effectiveness of Recoding-based Repair in Network Coded Distributed Storage
Abstract
High-capacity storage systems distribute files across several storage devices (nodes) and apply an erasure code to meet availability and reliability requirements. Since devices can lose network connectivity or fail permanently, a dynamic repair mechanism must be put in place. In such cases, a new recovery node connects to a given subset of the operating nodes and receives part of the stored data.
The objective of this paper is to investigate data survival for Random Linear Network Coding (RLNC) as a function of topology and communication overhead, defined by the number of connections to the recovery node and the number of packets transmitted to it, respectively. The paper makes two main contributions. First, a set of sufficient conditions for quasi-infinite longevity of the stored data is derived. Second, an experimental comparison shows that RLNC can be up to 50% more effective than traditional erasure codes such as Reed-Solomon.