KR101621752B1

KR101621752B1 - Distributed Storage Apparatus using Locally Repairable Fractional Repetition Codes and Method thereof

Info

Publication number: KR101621752B1
Application number: KR1020150128429A
Authority: KR
Inventors: 송홍엽; 남미영
Original assignee: 연세대학교 산학협력단
Priority date: 2015-09-10
Filing date: 2015-09-10
Publication date: 2016-05-17

Abstract

The present invention relates to a distributed storage apparatus using locally repairable fractional repetition codes and a method thereof. An embodiment of the present invention proposes a partially repairable fractional repetition code which reduces a locality by improving an existing fractional repetition (FR) code method, and at the same time achieve a maximum storage capacity of the FR code method as an MBR code. According to the present invention, an upper limit of a capacity of locally repairable FR codes with locality 2 can be achieved. To this end, the locally repairable fractional repetition coding method comprises steps of: coding data to be stored including one or more original symbols using maximum distance separable (MDS) codes method to generate MDS code symbols; repeating coding the MDS code symbols a predetermined number of times to generate multiple copies of MDS code symbols of each of the MDS code symbols; and storing at least one of the MDS code symbols in a storage node of a distributed storage system, wherein the MDS code symbols to be stored in each storage node are selected according to predetermined rules that cause at least two identical symbols to be stored in different storage nodes so that a locality is smaller than the number of symbols stored in the storage node.

Description

Technical Field [0001] The present invention relates to a distributed storage apparatus using a repetition division code capable of partial access recovery and a distributed storage apparatus using the same,

The present invention relates to a distributed storage apparatus and a method thereof for storing a file using an iterative division code in a distributed storage system.

Distributed storage systems have been introduced and used for safe storage and efficient management and recovery of data in large data environments where large amounts of data must be stored and maintained. For example, companies like Google, Facebook, Amazon, and Microsoft, which provide services such as search and data delivery around the world, use cloud storage systems, or distributed storage systems, to store vast amounts of data.

A distributed storage system distributes and stores data to a plurality of nodes connected through a network. However, at this time, data stored in the node may be lost due to physical imperfections of the nodes or unstable connection state of the network. Therefore, there is a need for a stable storage method to prevent permanent loss of data due to such node failure.

In the conventional distributed storage system, a secure storage method includes a replication code method for storing data redundantly, and a more efficient method of erasure code for introducing a more complicated encoding process. All of these methods employ traditional error correction codes.

However, the distributed storage system differs from the communication system to which the conventional error correction code is applied. In a distributed storage system, it is necessary to recover a failed node from time to time in order to maintain the stability constantly. This is because only the process of decoding the original data from the entire symbols in which the error correction code applied to the existing communication system is encoded is considered It is different from the one. In such a distributed storage system, since the process of recovering some lost symbols must be performed from time to time, low recovery complexity and low recovery bandwidth are indispensable. Here, the recovery bandwidth refers to the total amount of symbols to be received from other nodes in order to recover a corresponding symbol if some coded symbols are lost due to one node failure.

A regenerating code has been proposed to optimize the recovery bandwidth. In this playback code, there is a tradeoff relationship between the recovery bandwidth and the storage capacity of one node. The MSR code, which minimizes the storage capacity, and the minimum bandwidth reproduction code (MBR Code) have been researched in both aspects. In addition, a fractional repetition code (FR code) with low complexity is proposed since it can perform an exact repair without any additional encoding process in the recovery process as one of the minimum bandwidth replay code (MBR code) .

However, the conventional iterative partitioning code has a size equal to?, Which is the number of symbols stored in one node, and thus the number of partial accesses increases as the number of symbols stored in one node increases. .

(Patent Document 001) Korean Patent Publication No. 10-1341386 (Dec. 19, 2013)

(Paper No. 001) Gopalan, Parikshit, et al. "On the locality of codeword symbols." Information Theory, IEEE Transactions on 58.11 (2012): 6925-6934.

SUMMARY OF THE INVENTION The present invention has been made in view of the above problems, and it is an object of the present invention to provide a method and apparatus for improving the efficiency of a conventional iterative partitioning coding method by reducing the number of partial accesses and simultaneously achieving a maximum storage capacity as an MBR code of an iterative partitioning coding method. And to provide an apparatus therefor.

According to an aspect of the present invention, there is provided a distributed storage method using a partial division reconfigurable code according to one aspect of the present invention, wherein a plurality of first symbols obtained by dividing data to be stored by an encoding unit are coded to generate second symbols A repetition encoding step in which the encoding unit repeatedly encodes the second symbols by a predetermined repetition number and encodes the second symbols so that the second symbols exist for the repetition times; Storing at least one of the second symbols so that the number of partial accesses that is the minimum number of access nodes required to recover the stored storage node is less than the number of second symbols stored in the storage node, A node stores at least two identical second symbols in common Lock according to pre-established rules, each said storage nodes store may include a storage node per symbol selection step of selecting the second symbol. Wherein the second symbol may be selected to store preferably at most two identical second symbols in common.

Here, the distributed storage method using the partial division reconnectable code may further include a distributed storage step of distributing and storing the second symbols selected to be stored for each storage node in the storage node in the symbol selection step for each storage node .

The pre-encoding step may generate the second symbol by encoding the first symbols using an MDS (Maximum Distance Separable) code.

Here, the symbol selection step for each storage node may include storing, for all of the storage nodes, at least one storage node and at least one second identical second symbol in common, The second symbol to be stored for each storage node may be selected so that the second symbol does not have the second symbol commonly stored or commonly stored.

Wherein the step of selecting symbols for each storage node includes the steps of: storing, for all of the storage nodes, two identical second symbols in common with only one other storage node; The second symbol to be stored for each storage node can be selected so that there is no second symbol for storing the same second symbol in common or for storing in common.

Wherein the repetition encoding step repeatedly encodes the repetition number so that the repetition number is 2 and encodes the second number of repetition times so that the second symbol exists by the repetition number. The symbol selection step for each storage node divides the storage nodes into two groups, The storage nodes belonging to the same storage group do not commonly store the same second symbols and the storage nodes belonging to different groups store the same second symbols in common for only two or fewer. The second symbol can be selected.

Wherein the step of selecting symbols for each storage node includes the steps of: storing two identical second symbols in common if the node indexes allocated to the groups are the same among the storage nodes belonging to different groups; The second symbol to be stored for each of the storage nodes may be selected so that the second symbol other than the same second symbol among the second symbols is stored in common with only the other storage node.

Wherein the repetition coding step repeatedly codes the repetition number so that the repetition number is 3, and encodes the second symbol so that the repetition number of repetition times exists, and the storage node stores the three second symbols, Wherein the storage node is configured to store only one other storage node and two identical second symbols in common, and to store the second symbol among the second symbols stored in the storage node, The second symbol to be stored for each storage node may be selected so that the second symbol is shared by only one of the other storage nodes.

Wherein the step of selecting symbols for each storage node includes the steps of: for each storage node, each storage node commonly stores two identical second symbols only with two different storage nodes; The second symbol to be stored for each storage node can be selected so that there is no second symbol for storing the same second symbol in common or for storing in common.

Wherein the repetition encoding step repeatedly encodes the repetition number so that the repetition number is 3 and encodes the second number of repetition times so that the second symbol exists for the repetition number, The second symbol to be stored for each storage node may be selected so as to commonly store two identical second symbols between the storage nodes having the node index.

Here, another embodiment of the present invention may be a computer program stored in a storage medium for executing the distributed storage method using the partial access recoverable repeat partition code.

According to an aspect of the present invention, there is provided a distributed storage apparatus using partial division reconfigurable code according to one aspect of the present invention includes a plurality of first symbols obtained by dividing data to be stored, A repetition encoding unit for repeatedly encoding the second symbols by a predetermined number of repetition times and encoding each of the second symbols so that the number of repetition times is equal to the number of repetition times of the second symbol, Wherein the number of partial accesses that is the minimum number of access nodes required to recover the storage node that is abnormal is less than the number of the second symbols stored in the storage node, According to a predetermined rule for commonly storing the second symbols, each storage node Store may include a node management section for selecting the second symbol.

Here, the pre-encoding unit may generate the second symbol by encoding the first symbols using an MDS (Maximum Distance Separable) code.

Wherein the node management unit stores, for all of the storage nodes, two identical identical second symbols only with only one other storage node, and the same second The second symbol to be stored for each of the storage nodes is selected so that there is no second symbol for storing the symbols in common or for storing in common.

Wherein the node management unit divides the storage nodes into two groups when the iterative coding is performed so that the repetition number is 2 in the iterative coding unit and the storage nodes belonging to the same group store the same second symbols in common The storage nodes belonging to different groups different from each other select the second symbol to be stored for each storage node so that only the same second symbol is stored in common to two or less of them, Wherein the storage node is configured to store only one other storage node and two identical second symbols in common if the storage node is repeatedly encoded so that the number of times is equal to 3, And the second symbol other than the second symbol to be stored is transmitted only to the other storage node , To save them in a store individually for each storage node may be characterized by selecting said second symbol.

Wherein the node management unit stores, for all of the storage nodes, two identical identical second symbols only with only two different storage nodes, and the same second The second symbol to be stored for each of the storage nodes is selected so that there is no second symbol for storing the symbols in common or for storing in common.

The distributed storage system using the partial division reconfigurable code and the distributed storage system using the partial division reconfigurable code according to the present invention improves the conventional iterative partitioned coding method to reduce the number of partial accesses, The MBR code can achieve the maximum storage capacity.

1 is a reference diagram showing a repeated division code.
2 is a reference diagram showing an example of repeated division.
FIG. 3 is a block diagram of a distributed storage apparatus using partial division reconfigurable code according to an embodiment of the present invention. Referring to FIG.
FIG. 4 is a flowchart illustrating a distributed storage method using a partial access repairable repeated partitioning code according to an embodiment of the present invention.
FIG. 5 is a graph illustrating the operation of the symbol selection step for each storage node according to the first partial access recoverable repeated division code according to the present invention.
FIG. 6 and FIG. 7 are reference views showing a result of selecting symbols stored by the storage node according to the operation of the symbol selection step for each storage node according to the second partial connection recoverable repeatable partitioning code of the present invention.
FIG. 8 is a graphical representation of an operation of selecting symbols stored by a storage node according to a second partial access recoverable repetition partitioning code of the present invention.
FIG. 9 is a reference diagram showing an operation of selecting a symbol to be stored in a storage node according to a third partial access reconfigurable cyclic redundancy code according to the present invention. FIG.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components throughout the drawings. In the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear. In addition, the preferred embodiments of the present invention will be described below, but it is needless to say that the technical idea of the present invention is not limited thereto and can be variously modified by those skilled in the art.

In the conventional distributed storage system, a secure storage method includes a replication code method for storing data redundantly, and a more efficient method of erasure code for introducing a more complicated encoding process. All of these methods employ traditional error correction codes. However, the distributed storage system differs from the communication system to which the conventional error correction code is applied. In a distributed storage system, it is necessary to recover a failed node from time to time in order to maintain the stability constantly. This is because only the process of decoding the original data from the entire symbols in which the error correction code applied to the existing communication system is encoded is considered It is different from the one. In the distributed storage system, since a process of recovering some lost symbols must be performed from time to time, low recovery complexity and low recovery bandwidth are essential. Where the recovery bandwidth occupies a significant portion of the overall traffic of the distributed storage system, with the total amount of symbols that must be received from other nodes to recover the symbol if some coded symbols are lost due to one node failure. Therefore, reducing the recovery bandwidth is an important issue to be solved in a distributed storage system.

A regenerating code has been proposed to optimize the recovery bandwidth. In this playback code, there is a trade-off relationship between the recovery bandwidth and the storage capacity of one node. The minimum storage regenerating code (MSR code), which minimizes the storage capacity, and the code (MBR Code), which is a rewrite code, has been studied.

First, there are n storage nodes for storing a file or data to be stored in size M, and the storage capacity of each storage node is?, And one storage node that is in an abnormal state due to a failure or loss is connected, A distributed storage system (n, n) in which the number of storage nodes to be received is d, and an end user accesses any of K nodes among all n storage nodes and can obtain an original storage target file or data using the received data, K, d, α) is defined as a distributed storage system. In this case, assuming that the amount of data received from one node is equal to β in the process of recovering a failed storage node in a normal distributed storage system, the recovery bandwidth γ, which is the total amount of data required to recover one node, dβ. The maximum capacity M of a file or data to be stored that can be stored in the MBR Code that minimizes the recovery bandwidth is expressed by Equation 1 below. 9, pp. 4536-4551, Sep. 2002. [0008] The present invention is described in detail in "AG Dimakis, PB Godfrey, Y. Wu, MJ Wainwright, and K. Ramchandran, Network coding for distributed storage systems, IEEE Trans. Inf. . 2010 "is showing that fact.

And a minimum bandwidth replay code (MBR code) that achieves the maximum capacity as shown in Equation (1) above, it is unnecessary to perform an additional encoding process in the restoration process and can perform the same repair (exact repair) Fractional Repetition (FR Code) has been proposed.

The iterative division code will be described in more detail as follows.

Fractional Repetition (FR Code) performs coding by repeatedly dividing symbols obtained by pre-encoding a to-be-encoded file. Here, the pre-coding is preferably encoded with, for example, an MDS (Maximum Distance Separable) code. Herein, the iterative partitioning refers to a process of distributing the generated symbols to the storage nodes by repeating the coded symbols a predetermined number of times. For example, it may be a process of appropriately distributing ρθ symbols generated by repeating θ symbols by a repetition ρ times to n storage nodes. (N, k, d, α) in a distributed storage system, n subsets V ₀ , ..., V of Ω = {0, _A collection of _n-1s is called a recursive partition with a repetition ρ that satisfies the condition that each element of Ω belongs to a set of ρ in the collection. The pre-coding, preferably the MDS coding, and the entire code that concatenates these repeated division are called repeated division codes.

1 is a reference diagram showing the above-described repeated division code. As shown in FIG. 1, M symbols x may be obtained from a storage object file, and MDS-encoded MDS-encoded symbols y may be repeatedly divided.

2 is a reference diagram showing an example of repeated division in the case of? = 6,? = 3, and n = 7. In FIG. 2, each symbol stored in n = 7 nodes is represented by an index of the corresponding symbol for convenience. As shown in FIG. 2, the MDS-coded symbols repeatedly repeated by the repetition rate p are distributed to the respective storage nodes V ₀ to V ₆ .

There are repeat division codes proposed in SE Rouayheb and K. Ramchandran, Fractional repetition codes for distributed in distributed storage systems, 48 ^th Annu. Allerton Conf. Comtrol, Comput. , Commun . The proposed iterative partitioning code proposes a regular graph when ρ = 2 and a steiner system when ρ> 2. In this case, the storage capacity of the MBR according to Equation (1), that is, the size of the largest file is achieved by making only a maximum of one symbol common among all two nodes among all n nodes . However, in this case, as a result of sharing only one symbol among the nodes, in order to recover a failure occurring in one node, it is necessary to connect to another node as many as the number of symbols stored in one node, have.

However, the number of access nodes required to recover a lost or failed storage node is also one of the main factors determining the performance of the distributed storage system. The code that reduces the number of access nodes required to recover such a failed node is called a Locally Repairable Codes (LRCs), and the number of the minimum access nodes required to recover a failed node is referred to as a partial access And is used as a parameter indicating the performance of the partial access recovery codes (LRCs).

Therefore, the existing iterative partitioning code has the same size as α, which is the number of symbols stored in one node, and thus the number of partial accesses increases as the number α of symbols stored in one node increases, .

On the other hand, there have been studies on codes having a small number of partial accesses described above, and codes for reducing the number of partial connections have been developed. For example, the partial access recovery code proposed in "D. Papailiopoulos and AG Dimakis, Locally repairable codes, IEEE Trans. Inf. Theory , vol. 60, no. 10, pp. 5843-5855, Oct. 2014." There is a sign. However, most of the codes for reducing the number of partial accesses are required to perform operations between symbols in the nodes that perform recovery in order to recover from a failed node, or in other nodes that are connected in order to perform recovery, There is a problem that the computational complexity of the process increases.

On the other hand, the node recovery in the iterative partition code is a repair-by-transfer. In order to recover the node to be restored, the other nodes connected directly transmit some of the symbols stored in the node, It is possible to recover only the received symbols as they are.

Accordingly, the present invention achieves the maximum storage capacity of the MBR code according to Equation (1) while decreasing the partial access number of the conventional iterative partitioning code as described above, and at the same time, A partial access recoverable iterative partitioning code using an iterative partitioning code is provided. And we propose a method for distributed storage of data to be stored in a distributed storage system and a storage device therefor using the proposed iterative partitioning code.

Hereinafter, a distributed storage apparatus and a method thereof according to the present invention will be described in more detail.

The partial connection recoverable repeat division code provided by the present invention is a code satisfying the partial connection count d <? Among the (n, K, d,?) Repeated division codes. That is, the number of partial accesses is smaller than the number of symbols stored in one storage node.

The distributed storage apparatus using the partial division reconfigurable code according to an embodiment of the present invention may include the encoding unit 100 and the node management unit 200. [

FIG. 3 is a block diagram of a distributed storage apparatus using partial division reconfigurable code according to an embodiment of the present invention. Referring to FIG.

Herein, the distributed storage device using the partial division reconfigurable code according to the present invention may be configured such that all of the components are implemented as one independent hardware, or a part or all of the components are selectively combined, Or as a computer program having a program module that performs some or all of the functions combined in a plurality of hardware. Also, the distributed storage device using the partial division reconfigurable code according to the present invention may be implemented as a software program and operated on a processor or a signal processing module, or may be implemented in the form of hardware to be used in various processors, chips, semiconductors , Devices, and the like. Also, the distributed storage device using the partial division reconfigurable code according to the present invention can be implemented in the form of hardware or software on a computer or various embedded systems or devices included in a distributed storage system.

The distributed storage method using the partial division reconfigurable code according to an embodiment of the present invention may include a pre-encoding step S100, a repetition encoding step S200, and a symbol selection step S300 for each storage node , And if necessary, a distributed storage step (S400). The encoding unit 100 and the node management unit 200 of the distributed storage device using the partial segmentation capable of partial reconnection can be divided into a pre-encoding step S100, a repetition encoding step S200, Symbol selection step S300, and may further perform the operation of the distributed storage step S400 as needed.

FIG. 4 is a flowchart illustrating a distributed storage method using a partial access repairable repeated partitioning code according to an embodiment of the present invention.

In the pre-coding step S100, the coding unit 100 generates second symbols by coding a plurality of first symbols obtained by dividing data to be stored.

In the repetition encoding step S200, the encoding unit 100 repeatedly encodes the second symbols by a predetermined repetition degree, and encodes each of the second symbols so that the repetition degree exists.

In step S300, the node management unit 300 stores at least one second symbol in the storage node of the distributed storage system. In step S300, the node management unit 300 stores the at least one second symbol in the storage node of the distributed storage system, The number of the second symbols to be stored in the storage node is smaller than the number of the second symbols stored in the storage node according to a predetermined rule so that different storage nodes commonly store at least two identical second symbols. Select the second symbol. Here preferably, the second symbol may be selected such that different storage nodes store up to two identical identical second symbols in common.

In the distributed storage step (S400), the second symbols selected to be stored for each storage node in the symbol selection step for each storage node are dispersed and stored in the storage node. The above operation can be performed by the node management unit 300. Here, the operation of the distributed storage step (S400) may be performed by the node management unit 300. [

First, the pre-encoding step S100 will be described in more detail.

Here, the pre-encoding step S100 may generate the second symbol by encoding the first symbols using an MDS (Maximum Distance Separable) code. Referring to FIG. 1, the pre-encoding step S100 may acquire M symbols x from the data to be stored and MDS-encode the M symbols to obtain MDS-encoded θ symbols y. Here, the MDS code is defined as a code that achieves Singleton bound, and a representative code is a Reed-Solomon code. The specific code may have various parameters, and the number of the maximum faulty nodes that can be corrected using the code is determined according to the minimum distance. Here, the maximum number of faulty nodes is the minimum distance - 1. That is, when the corresponding code is used, a value indicating whether the maximum number of failed nodes can be corrected and restored is the minimum distance. Therefore, in terms of reliability, a code having a larger minimum distance is a more preferable code. Singleton bound is the upper bound of this minimum distance. And the code that achieves Singleton bound, that is, the code with the minimum minimum distance is called MDS code. It is known that the above-described Singleton bound is calculated as shown in the following Equation 2. " (2) "

Where md is the minimum distance,? Is the length of the sign, and M is the size of the data to be stored.

In this case, the (M, M) encoding is called (θ, M) encoding in which the first symbol of M symbols is MDS encoded to obtain θ second symbols.

Next, the iterative encoding step (S200) will be described in more detail.

In the repetition encoding step S200, the encoding unit 100 repeatedly encodes the second symbols by a predetermined repetition degree, and encodes each of the second symbols so that the repetition degree exists. Here, the term " repetition rate " means the number of times each symbol is repeated. For example, if A symbol is repeated twice and A becomes A, the iteration becomes 2.

Here, the Fractional Repetition (FR Code) will be described again. The second symbols obtained by pre-coding the first symbols obtained from the encoding target file are repeatedly divided. Herein, the iterative partitioning is a process of distributing symbols generated by repeating pre-coded symbols a predetermined number of times to storage nodes as described above. Here, the iterative encoding step (S200) can obtain ρθ symbols generated by repeating θ second symbols by repetition ρ times. For example, three second symbols A, B, and C are repeated twice, and 2 x 3 symbols A, A, B, B, C, and C are obtained in the repetition encoding step S200 .

Here, the process of distributing the ρθ symbols obtained as described above to n storage nodes, respectively, is performed in the symbol selection step (S300) for each storage node, which will be described in detail below. Accordingly, the operations of the iterative encoding step (S200) and the symbol selection step (S300) for each storage node are collectively referred to as the repeated division described above.

(N, K, d, α) in the distributed storage system, n subsets of Ω = {0, ..., θ-1} V ₀ , ..., V _n-1, a collection satisfying the condition that each element of Ω belongs to a set of ρ in the collection can be expressed as a repeated division having a repetition degree ρ.

Next, the symbol selection step (S300) for each storage node will be described in more detail.

In step S300, the node management unit 300 stores at least one second symbol in the storage node of the distributed storage system. In step S300, the node management unit 300 stores the at least one second symbol in the storage node of the distributed storage system, The number of the second symbols to be stored in the storage node is smaller than the number of the second symbols stored in the storage node according to a predetermined rule so that different storage nodes commonly store at least two identical second symbols. Select the second symbol. As will be described in detail below, it is preferable that there is one storage node or two storage nodes that commonly store one storage node and two second symbols.

Here, the storage node means a node storing a part of data to be stored in the distributed storage system, and one storage node can store at least one symbol representing a part of data to be stored. Here, the symbol may be one bit or byte as required, or may be set to have a predetermined size or a size determined by a predetermined rule.

Here, the abnormal storage node means a node in which an error occurs in a corresponding node, such as a storage node is lost or lost, all or a part of the storage node is destroyed, and the stored symbols can not be completely received from the corresponding storage node.

As described above, in the conventional iterative partitioning code, the storage capacity of the MBR code according to Equation (1), that is, the maximum file size is achieved by having only a maximum of one symbol common between all two nodes of the storage node In this case, in order to recover a fault occurring in one node, there is a problem that a lost symbol is received by connecting to another node as many as the number of symbols stored in one node.

In the symbol selection step S300 according to the present invention, in order to make the number of partial accesses smaller than the number of the second symbols stored in the storage node as described above, The second symbol to be stored by each storage node is selected according to a predetermined rule for commonly storing the second symbol. As described in detail below, it is preferable that one storage node and one storage node that commonly store two second symbols are present, and two storage nodes may exist if necessary.

At this time, symbol selection step (S300) for each storage node stores, for all of the storage nodes, at least one storage node and at least one second identical second symbol in common, and the remaining storage nodes The second symbol to be stored for each storage node may be selected so that there is no second symbol for storing the same second symbol in common or for common storage.

As a result, when one node fails, another node having two identical symbols among the symbols stored in the corresponding node exists. As a result, the number of partial accesses may be smaller than the number of symbols stored in the storage node It is.

In addition, in the case where symbols selected according to the storage node selection step S300 according to the present invention are stored in respective storage nodes, as will be described in detail below, not only the number of partial accesses is reduced as compared with the conventional partial division codes, Lt; RTI ID = 0.0 > capacity. &Lt; / RTI > More specifically, according to the present invention, there is an effect of achieving the maximum capacity of the MBR code in a region where K is 3 or more as described below.

Hereinafter, a method for distributed storage using the three partial access recoverable repeated division codes proposed in the present invention will be described in detail.

First, in order to achieve the maximum capacity that can be stored using the MBR code, the partial access recoverable repeated partitioning code according to the present invention is configured such that, for all the storage nodes in the symbol selection step for each storage node (S300) The second symbol is stored in common with only one other storage node, and the second symbol is stored in common with the remaining storage nodes, , And may select the second symbol to be stored for each storage node.

Here, in order to achieve the maximum capacity that can be stored using the MBR code, for all the storage nodes, each storage node commonly stores two identical second symbols only with one other storage node More specifically, a distributed storage method using a repetition division code capable of first partial access recovery of the present invention will be described in more detail.

Here, the repetition coding step S200 repeatedly codes the repetition degree to be 2, and encodes the second symbol so that the repetition degree exists.

In the symbol selection step S300 according to each storage node, the storage nodes are divided into two groups, and the storage nodes belonging to the same group do not commonly store the same second symbols, and the storage nodes belonging to different groups Selects the second symbol to be stored for each storage node so that only the same number of the second symbols is stored in common to two or less.

At this time, symbol selection step S300 for each storage node may be performed such that two identical second symbols are commonly stored if the node indexes allocated to the groups are the same among the storage nodes belonging to different groups, The second symbol to be stored can be selected.

In addition, the symbol selection step for each storage node (S300) may include storing the second symbols other than the two identical second symbols among the second symbols stored in the storage node, And may select the second symbol to be stored for each storage node.

For example, the storage node may store three of the second symbols. In this case, the symbol selection step for each storage node (S300) may select the second symbol to be stored for each storage node as shown in Equation (3).

Where V and U are respective storage nodes of the two groups of storage nodes, i is the node index of the storage node, and _Sk is the kth second symbol.

FIG. 5 is a graph for explaining the operation of the symbol selection step S300 for each storage node according to the first partial access recoverable repeated division code of the present invention. Where the number α for storing the second symbol for each storage node is 3, the repetition rate ρ is 2, and the number n of storage nodes is 6. The operation of selecting a symbol to be stored for each storage node according to the first partial access recoverable repetition division code provided in the present invention is shown in the form of a graph as shown in FIG. 5 (b) 5 (a). In FIG. 5B, each node corresponds to a storage node of the distributed storage system. Here, the storage node is divided into two groups, V _i and U _i (i = 0, 1, 2). There is no edge connecting two V _i , V _j belonging to the same group, or an edge connecting U _i , U _j . And there are three possible cases where no edge is connected, one edge is connected, or two edges are connected between each V _i and U _j (0 ≤ i, j ≤ 2). Referring to FIG. 5, two nodes V _i and U _i having the same index i are connected by two edges, and two nodes U _i , V _{i + 1 (mod 3)} is connected to one edge. In the graph of FIG. 5 (b), different values are indicated from 0 to 8 on the edge. This means that the symbols corresponding to the index of the symbol are stored at the end nodes where the edge is incident. Here, since there are two nodes where one edge is incident, all symbols are repeated twice. To recover from a single node failure, all nodes connected to the edge connected to the node need to be connected to receive the symbol corresponding to the edge. For example, when recovering a node V _0, the two neighbor nodes index 6 the symbol with the node index, the symbol {0, 1} corresponding to U ₂ and the edges connected to node V ₀ from U ₀ from the nodes It comes. Here, the graph of FIG. 5 (b) can be expanded to various sizes. The generated code is a partial division recoverable code with partial access d = 2 and repetition degree p = 2.

Next, the fact that the maximum capacity of the MBR code is achieved when the symbol selection step (S300) for each storage node operates using the first partial access recoverable repeated partitioning code as described above will be described in more detail .

Referring to FIG. 5 (b), when the symbol selection step S300 for each storage node is performed using the iterative partitioning code capable of recovering the first partial access connection, one node always connected to two edges There is a neighbor node and one neighbor connected to one edge. The maximum storage capacity M ₁ of the code represented by this graph can be expressed by the following equation (4) or (5).

Where K is the number of nodes the user connects to in order to obtain the data to be stored.

Meanwhile, the conventional iterative division code achieves when the maximum capacity of the MBR code of Equation (1) becomes d =?. Here, β is 1. Equation (1) can be summarized as Equation (6) using parameters of the iterative division code. For the purpose of comparison, a case where α is 3 is considered. Here, M _MBR means the maximum capacity of the MBR code.

If M _MBR and M ₁ are compared with each other through Equations (5) and (6), M _MBR > M _{1 in} the region of K <3 and M _MBR <M _{1 in} the region of K> 3. When K = 3, the two values are equal. Therefore, in the region of K? 3, it is shown that the first partial access recoverable partitioning code becomes a repetition partitioning code which achieves the maximum capacity of the MBR.

However, since the repetition rate ρ of the first partial access recoverable code is 2, the availability becomes ρ-1 = 1. The availability here means the maximum number of node losses that can be recovered. That is, partial access recovery is possible only when a maximum of one node fails, and partial access recovery is impossible if two or more node failures occur.

In order to overcome the limitation of the first partial access recoverable partitioning code, the present invention provides another partial access repairable repeated partitioning code.

Next, in order to achieve the maximum capacity that can be stored using the MBR code, and at the same time to increase the availability, it is necessary that for all said storage nodes, each said storage node is associated with only one other said storage node, As another method for storing symbols in common, a distributed storage method using a repetition division code capable of recovering a second partial access of the present invention will be described in more detail.

Here, the iterative encoding step (S200) repeatedly codes the repetition degree to be 3, and encodes the second symbol so that the second symbol exists by the repetition degree. That is, in order to enable partial connection recovery even in the case where a plurality of node failures occur, the repetition degree ρ is increased to 3.

At this time, the storage node stores the three second symbols.

Here, the symbol selection step (S300) for each storage node selects the second symbol to be stored for each storage node so that the storage node stores only one other storage node and two identical second symbols in common.

In addition, the symbol selection step for each storage node (S300) may be such that the second symbol other than the commonly stored second symbol among the second symbols stored in the storage node is shared by only one of the other storage nodes And selects the second symbol to be stored for each storage node.

Here, the symbol selection step for each storage node (S300) may preferably select the second symbol to be stored for each storage node as shown in Equation (7), (8), and (9).

Where V _i is the i th storage node, v _{i, j} is the j th second symbol stored in V _i , S _k is the k th second symbol, t = 1, ..., T And T is a constant determined in advance according to the number of the storage nodes necessary for the storage object data to be restored.

Here, the value T may be set according to the number K of the storage nodes to be connected in order to acquire the storage object data as shown in Equation (10).

According to the above Equation 7 to 9 wherein the storage node may verify the three symbols are stored, each V _i is _{v i, 1, v i,} 2, v that _i, the node is stored in the _third. In addition, four nodes and four symbols are added in t = 2, ..., T-1, and four nodes are added in the last t = T.

FIGS. 6 and 7 illustrate a case where T is 3 and T is 4 according to the operation of the symbol selection step S300 for each storage node according to the second partial access recoverable repetition division code of the present invention, Is a reference diagram showing a result of selecting symbols to be stored. For convenience in FIG. 6 and FIG. 7, symbol Si is represented only by index i.

FIG. 8 is a graphical representation of an operation of selecting symbols stored by a storage node according to the second partial access reconfigurable cyclic redundancy code of the present invention. 8 shows a case where the repetition rate ρ = 3 and the storage capacity per node α = 3.

8, if a set of indices i of the second symbol S _i generated through the (12, M) MDS code is {0, ..., 7, A, B, C, D} The twelve second symbols are again encoded through the repetition code with repetition ρ = 3. Generated

8 symbols can be stored in 12 nodes V ₀ , ..., V ₁₁ having a capacity of? = 3 as shown in FIG. As described above, the second partial access recoverable repetition partitioning code satisfies the condition that there is only one neighboring node with a repetition rate ρ of 3 and simultaneously connecting two edges to all nodes. For example, node V ₀ shares two symbols with node V ₁ and shares only one symbol with remaining neighbor nodes V ₄ , V ₅ , V ₈ , and V ₁₀ . All other nodes also have these neighboring nodes.

The maximum storage capacity M ₂ of the storage object data in the distributed storage method applying the second partial access reconfigurable cyclic partitioning code of the present invention is set to a value according to the number K of nodes connected to acquire storage object data, Can be calculated as shown in Equation (11).

For example, according to Equation 11, M ₂ (1) = 3 when K = 1, M ₂ (2) = 3 + 1 when K = 2, and M ₂ (3) = 3 when K = + 1 + 2.

When the equations (4) and (11) are compared, the maximum storage capacity M ₂ of the storage object data in the distributed storage system implemented by the second partial access recoverable repeatable partitioning code according to the present invention is It can be confirmed that the maximum capacity of the MBR code is achieved in the same manner as the maximum storage capacity M ₁ of the storage target data in the distributed storage method using the first part of the inventive method of applying the _I / More specifically, it can be confirmed that the maximum capacity of the MBR code is achieved in the region of K? 3.

Meanwhile, by setting T according to K as in Equation (10), even when the size of K increases, the maximum storage capacity M of the storage object data in the distributed storage method applying the second partial access recoverable code of the present invention ₂ < / RTI > Referring to FIG. 8, FIG. 8 shows a total of three steps of T = 3, which can maintain the maximum capacity M ₂ (K) only in K = 4, However, when K? 5 is required, the maximum capacity of the MBR code can be attained by increasing T to 4 or more. The number of nodes n and the number of second symbols? Necessary for storing the same size file are increased as compared with the repetition division code or the first partial access recoverable repeated division code, . This is the cost required to improve availability while reducing the number of partial connections, respectively.

Hereinafter, a distributed storage method using a repetition division code capable of recovering the third partial access according to the present invention will be described in more detail.

Here, symbol selection step S300 for each storage node may be such that, for all of the storage nodes, each of the storage nodes commonly stores two identical second symbols only with only two different storage nodes, It is preferable that the second symbol to be stored for each storage node is selected so that there is no second symbol for storing one common same second symbol in common or for storing in common.

Here, the iterative encoding step (S200) may repeat the encoding so that the repetition rate becomes 3 and encode the second symbol so that the second symbol exists by the repetition degree.

In this case, the symbol selection step for each storage node (S300) may include storing the two identical second symbols among the storage nodes having the adjacent node index based on the node index of the storage node, Select the second symbol.

Wherein the storage node stores the three second symbols.

Here, the symbol selection step for each storage node (S300) may preferably select the second symbol to be stored for each storage node as shown in Equation (12).

Where V _i is the i th storage node, i is the node index, n is the number of storage nodes, S _k is the kth second symbol, and mod is an operation to calculate the remainder. That is, i (mod n) is an operation for calculating the remainder of i / n, and S _{i (mod n)} means a second symbol having a remainder obtained by dividing i by n as an index. In other words, the mod operation restricts the value of the symbol index from 0 to n-1.

Here, when the index of the set of MDS code symbols generated through the (θ, M) MDS code is {0, ..., θ}, the θ MDS code symbols are repeated through the iterative coding step (S200) 3 symbols through the iterative coding according to Equation (3), and the 3? Symbols are stored in n nodes (V _i , i = 0, ..., n-1) Is selected and stored according to Equation (12).

If the repetition rate is 3 and the number of stored symbols per node is 3, the number of partial accesses is 2, and the availability is also 2 However, there is an effect of further reducing the number of nodes.

FIG. 9 is a reference diagram illustrating an operation of selecting a symbol to be stored in the storage node according to the third partial access recoverable repetition partitioning code symbol selection step (S300).

Referring to FIG. 9, it is assumed that (7, 4, 2, 3) partial reconnection can be recovered and the availability is 2, and each node is connected to two neighboring nodes each having two edges. For example, a node V ₀ stores symbols with a symbol index {0, 1, 2}, where V ₁ is a symbol with a symbol index {1, 2}, V ₆ is a symbol with a symbol index {0,1} _It is stored in common with ₀ node. Then, the nodes V ₂ and V ₅ store the symbols with symbol index 0 and 2, respectively, in common with the node V ₀ . Thus V In the event of a failure to _zero node, V ₆ indexes {0, 1}, the symbol and V ₂ receive the index 2 is the symbol from the node, or V ₁ indices {1, 2}, the symbol and V ₅ from the node from the node It is possible to receive a symbol of index 0 from the node and restore the V ₀ node.

In this way, the third partial reconnectable recursive partition code has two independent recovery sets, so that even if two nodes fail, partial access number 2 is guaranteed. Compared with this, the second partial reconnectable recursive partition code guarantees partial access number 2 in case of failure of one node, but partial access number increases to 3 in case of failure of two nodes. Therefore, the third partial access recoverable repeated division code has a more advantageous effect on the partial access count than the second partial access recoverable repeated division code. On the other hand, the iterative partitioning code capable of recovering the third partial connection has a loss in the maximum capacity of the data to be stored as follows.

That is, in the distributed storage method using the third partial access recoverable partitioning code, the maximum capacity M ₃ of the storage target data is expressed by the following Equation 13, and when? = 3, the following Equation 14 is obtained.

Therefore, the maximum capacity M ₃ in the distributed storage method using the third partial access recoverable partitioning code is smaller than the maximum capacity of the MBR code. More specifically, M ₃ is smaller than M ₁ or M ₂ . Also, M ₃ is smaller than the maximum capacity of the MBR code in the region of K <4.

FIG. 3 is a block diagram of a distributed storage apparatus using a partial access recoverable partitioning code according to another embodiment of the present invention. Referring to FIG.

The distributed storage apparatus using the partial division reconfigurable code according to an embodiment of the present invention may include a coding unit 100 and a node management unit 200. The coding unit 100 may include a pre- And a repetition encoding unit 120. [0033] FIG. Here, the distributed storage apparatus using the partial division recoverable code can operate in the same manner according to the distributed storage method using the partial division recoverable code described above in detail. The overlapping portions will be omitted and briefly described.

The pre-coding unit 110 generates second symbols by coding a plurality of first symbols obtained by dividing data to be stored.

The iterative encoding unit 120 repeatedly encodes the second symbols by a predetermined repetition degree, and encodes each of the second symbols so that the second symbol exists by the repetition degree.

The node management unit 200 stores at least one second symbol in the storage node of the distributed storage system, wherein the number of partial accesses, which is the minimum number of access nodes required to recover the storage node, The second symbol to be stored by each storage node is selected according to a predetermined rule so that different storage nodes commonly store at least two identical second symbols in order to be smaller than the number of symbols. Preferably, the second symbol may be selected so that there is a storage node and another storage node that commonly stores two identical second symbols, as described above.

The node manager 200 may further perform an operation of distributing and storing the second symbols selected by the storage node to be stored in the storage node in the storage node in the storage node selection step.

The pre-encoding unit 110 may generate the second symbol by encoding the first symbols using an MDS (Maximum Distance Separable) code.

Here, the node management unit 200 can select the second symbol to be stored for each storage node using the first through third partial access recoverable repeated division codes according to the present invention described above in detail.

Herein, for all the storage nodes, the node management unit 200 commonly stores two identical second symbols only with only one other storage node, and the same one The second symbol to be stored for each storage node can be selected so that there is no second symbol for storing the second symbols in common or for storing in common.

In the case where the first partial access recoverable repeated division code is used and the repetition coding unit 120 repeatedly codes the repetition rate to be 2, the node management unit 200 divides the storage nodes into two groups The storage nodes belonging to the same group do not commonly store the same second symbols and the storage nodes belonging to different groups store common second symbols only in the same number The second symbol to be stored for each storage node can be selected.

In the case where the second partial access recoverable repetition division code is used and the repetition coding unit 120 repeatedly codes the repetition degree to be 3, the node management unit 200 determines that the storage node has only one other And the second symbol other than the commonly stored second symbol among the second symbols stored by the storage node is shared with the other storage node The second symbol to be stored for each storage node can be selected so that only one common symbol is stored.

In the case where the second partial access recoverable repeatable partitioning code is used, the node management unit 200 determines, for all of the storage nodes, that each storage node has only two identical second symbols And the second symbol to be stored for each storage node can be selected so that there is no second symbol for storing the same second symbol in common with the remaining storage nodes in common or for storing in common .

In addition, the computer program according to another embodiment of the present invention may be a computer program stored in a storage medium to execute the distributed storage method using the partial division reconciliation code.

It is to be understood that the present invention is not limited to these embodiments, and all elements constituting the embodiment of the present invention described above are described as being combined or operated in one operation. That is, within the scope of the present invention, all of the components may be selectively coupled to one or more of them.

In addition, although all of the components may be implemented as one independent hardware, some or all of the components may be selectively combined to perform a part or all of the functions in one or a plurality of hardware. As shown in FIG. In addition, such a computer program may be stored in a computer readable medium such as a USB memory, a CD disk, a flash memory, etc., and read and executed by a computer to implement an embodiment of the present invention. As the recording medium of the computer program, a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like can be included.

Furthermore, all terms including technical or scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined in the Detailed Description. Commonly used terms, such as predefined terms, should be interpreted to be consistent with the contextual meanings of the related art, and are not to be construed as ideal or overly formal, unless expressly defined to the contrary.

It will be apparent to those skilled in the art that various modifications, substitutions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims. will be. Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to illustrate and not to limit the technical spirit of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments and the accompanying drawings . The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of equivalents should be construed as falling within the scope of the present invention.

100:
110: pre-encoding unit
120:
200:
S100: pre-encoding step
S200: Repeat encoding step
S300: Symbol selection step for each storage node
S400: Distributed storage step

Claims

A pre-coding step of generating second symbols by coding a plurality of first symbols obtained by dividing data to be stored by an encoding unit;
A repetition encoding step of the encoder repeatedly encoding the second symbols by a predetermined repetition number and encoding each of the second symbols so that the repetition number of the second symbols is present; And
Wherein the node management unit stores at least one second symbol in the storage node of the distributed storage system, wherein the number of partial accesses, which is the minimum number of access nodes required to recover the storage node that is abnormal, Selecting a second symbol to be stored by each of the storage nodes according to a predetermined rule so that different storage nodes store at least two identical second symbols in common, Wherein the partial access recoverable code is a partial access recoverable code.

The method according to claim 1,
Further comprising a dispersion storing step of distributing and storing the second symbols selected for storage by the storage nodes in the storage node in the symbol selection step for each storage node, How to save.

The method according to claim 1,
Wherein the pre-encoding step generates the second symbol by encoding the first symbols using an MDS (Maximum Distance Separable) code, wherein the second symbol is generated by using the iterative partial code.

The method according to claim 1,
Wherein the symbol selection step for each storage node comprises the steps of: storing, for all the storage nodes, at least one storage node and at least one second identical second symbol in common, The second symbol to be stored for each storage node is selected so that there is no second symbol for storing the second symbols in common or for common storage. .

5. The method of claim 4,
Wherein the step of selecting symbols for each storage node includes the steps of: storing, for all of the storage nodes, two identical second symbols in common with only one other storage node; Wherein the second symbol to be stored for each storage node is selected so that there is no second symbol for storing the second symbol in common or for storing in common. Way.

6. The method of claim 5,
Wherein the repetition coding step repeatedly codes the repetition number so that the repetition number is 2,
Wherein the symbol selection step for each storage node divides the storage nodes into two groups and the storage nodes belonging to the same group do not commonly store the same second symbols, Wherein the second symbol to be stored for each storage node is selected so that only two or less of the second symbols are commonly stored.

[7] The method of claim 6,
And storing the same second symbols in common if the node indexes allocated to the groups are the same among the storage nodes belonging to different groups,
Wherein the second symbol stored in the storage node is shared by only one of the storage nodes and the second symbol other than the two identical second symbols among the second symbols stored in the storage node,
And the second symbol to be stored for each storage node is selected.

8. The method of claim 7,
Wherein the storage node stores three of the second symbols,
Wherein the symbol selection step for each storage node selects the second symbol to be stored for each storage node according to Equation (1).
Equation 1.

(Where V, U are the two groups of storage nodes, i is the node index, and _Sk is the kth second symbol).

6. The method of claim 5,
Wherein the repetition coding step repeatedly codes the repetition number so that the number of repetition times is 3,
Wherein the storage node stores three of the second symbols,
Wherein the symbol selection step for each storage node is such that the storage node stores only one other storage node and two identical second symbols in common, and
And the second symbol other than the commonly stored second symbol among the second symbols stored by the storage node is shared by only one of the other storage nodes,
And the second symbol to be stored for each storage node is selected.

10. The method of claim 9,
Wherein the symbol selection step for each storage node selects the second symbol to be stored for each storage node as Equation 2, Equation 3, and Equation 4, respectively.
Equation 2.

Equation 3.

Equation 4.

(Where V _i is the i th storage node, v _{i, j} is the j th second symbol stored in V _i , S _k is the k th second symbol, t = 1, ..., T And T is a constant determined in advance according to the number of the storage nodes necessary for restoring the storage object data.

11. The method of claim 10,
Wherein T is a constant set according to the number K of the storage nodes required for restoring the storage object data as shown in Equation 5 below.
Equation 5.

5. The method of claim 4,
Wherein the step of selecting symbols for each storage node includes the steps of: storing, for all the storage nodes, two identical second symbols in common with only two different storage nodes in common; Wherein the second symbol to be stored for each storage node is selected so that there is no second symbol for storing the second symbol in common or for storing in common. Way.

13. The method of claim 12,
Wherein the repetition coding step repeatedly codes the repetition number so that the number of repetition times is 3,
Wherein the step of selecting symbols for each storage node comprises the steps of: storing two identical second symbols among the storage nodes having adjacent node indices based on the node index of the storage node; Is selected by using the iterative division code.

14. The method of claim 13,
Wherein the storage node stores three of the second symbols,
Wherein the symbol selection step for each storage node selects the second symbol to be stored for each storage node according to Equation (6).
Equation 6.

(Where V _i is the ith storage node, i is the node index, n is the number of storage nodes, S _k is the kth second symbol, and mod is the computation of the remainder).

A computer program stored in a storage medium for executing a distributed storage method using a partial access recoverable partitioning code according to any one of claims 1 to 14.

A pre-coding unit for generating second symbols by coding a plurality of first symbols obtained by dividing data to be stored;
A repetition encoding unit that repeatedly encodes the second symbols by a predetermined repetition number and encodes each of the second symbols so that the repetition number of the second symbols is present; And
Wherein at least one second symbol is stored in a storage node of the distributed storage system so that the number of partial accesses that is the minimum number of access nodes required to recover the stored storage node is less than the number of second symbols stored in the storage node And a node manager for selecting the second symbol to be stored by each of the storage nodes according to a predetermined rule so that different storage nodes store at least two identical second symbols in common, A distributed storage device using a repetition division code capable of partial access recovery.

17. The method of claim 16,
Wherein the pre-encoding unit encodes the first symbols using an MDS (Maximum Distance Separable) code to generate the second symbol.

17. The method of claim 16,
Wherein the node management unit stores, for all of the storage nodes, two identical second symbols in common with only one other storage node, and the same second symbol Wherein the second symbol to be stored for each storage node is selected so that there is no second symbol commonly stored or commonly stored.

19. The node management system according to claim 18,
Wherein the storage node is divided into two groups and the storage nodes belonging to the same group do not store the same second symbols in common, Wherein the storage node belonging to the group selects the second symbol to be stored for each storage node so that only the same number of the second symbols is stored in common,
Wherein the storage node is configured to store only one other storage node and two identical second symbols in common when the iterative coding is repeatedly performed so that the repetition number is 3 in the repetition encoding section, Wherein the second symbol to be stored for each storage node is selected so that only the second symbol other than the second symbol to be commonly stored among the second symbols is stored in common with the other storage nodes. Distributed storage using recoverable iterative partitioning codes.

17. The method of claim 16,
Wherein the node management unit stores, for all of the storage nodes, two identical identical second symbols only with only two different storage nodes, and the same second symbol Wherein the second symbol to be stored for each storage node is selected so that there is no second symbol commonly stored or commonly stored.