CN112732203B - Regeneration code construction method, file reconstruction method and node repair method - Google Patents
Regeneration code construction method, file reconstruction method and node repair method Download PDFInfo
- Publication number
- CN112732203B CN112732203B CN202110348155.4A CN202110348155A CN112732203B CN 112732203 B CN112732203 B CN 112732203B CN 202110348155 A CN202110348155 A CN 202110348155A CN 112732203 B CN112732203 B CN 112732203B
- Authority
- CN
- China
- Prior art keywords
- block
- coding
- blocks
- code
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 70
- 230000008439 repair process Effects 0.000 title claims abstract description 44
- 230000008929 regeneration Effects 0.000 title claims abstract description 21
- 238000011069 regeneration method Methods 0.000 title claims abstract description 21
- 238000010276 construction Methods 0.000 title claims abstract description 12
- 239000010410 layer Substances 0.000 claims description 41
- 239000008187 granular material Substances 0.000 claims description 7
- 238000011084 recovery Methods 0.000 claims description 5
- 238000005192 partition Methods 0.000 claims description 3
- 239000002355 dual-layer Substances 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 9
- 238000004458 analytical method Methods 0.000 description 3
- 230000005540 biological transmission Effects 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 241000695274 Processa Species 0.000 description 1
- 235000013399 edible fruits Nutrition 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Probability & Statistics with Applications (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a regeneration code coding method, which comprises the steps of obtaining a combined design and a storage data block; sorting and numbering the blocks, and counting the blocks and positions of the elements to obtain file symbols; encoding an original file to obtain an encoded block; allocating a different file symbol for each coding block; and placing the coding blocks corresponding to the file symbol sets with the consistent elements in the same storage node to obtain the regenerated codes. The invention also discloses a file reconstruction method and a node repair method based on the regeneration code construction method. The method for constructing the regeneration code has loose parameter limitation and can reach the upper bound of Singelton, the file reconstruction and node repair process is simple to operate, and the node repair efficiency is high.
Description
Technical Field
The invention particularly relates to a regeneration code construction method, a file reconstruction method and a node repair method.
Background
In the current information explosion era, how to safely and reliably store mass data becomes a problem which needs to be solved urgently. The traditional file Storage System cannot meet the requirements of availability, expandability and the like, and a Distributed Storage System (Distributed Storage System) is produced accordingly. The system utilizes a distributed technology to connect storage nodes through a network to form a cluster capable of storing mass data. The distributed system dispersedly places data files in a plurality of storage nodes, and a single storage node may fail due to network failure or physical damage, so the distributed system must introduce redundant information to ensure reliability. Common redundancy strategies are multi-copy strategies and erasure coding strategies. The multi-copy strategy is to copy the original file by multiple times and then store the original file, and when a node fails, the replacement node can directly copy data from the copy of the original file, so that the source file can be normally used as long as a complete data copy exists in the system; but this scheme has low storage efficiency and system reliability. The erasure code strategy adopts erasure codes to process the original files, and the common erasure codes are (n, k) Maximum Distance Separable (MDS) code, i.e. dividing the original file into equal sizekFrom thiskThe file is generated after linear codingn-kRedundant information, utilizationnOne node storesnLinearly independent coding blocks, by which end-users can connect to arbitrary oneskAnd restoring the original file by each node. Obviously, the erasure code strategy has higher storage efficiency, but is more complex to implement and has high application difficulty.
In distributed storage systems, usingnEach node stores raw data of size B, and each node stores data of amount BαFrom the end usernAny of the storage nodeskThe original file can be recovered by downloading dataThe process is called a file reconstruction process. When a storage node in the storage system fails, in order to ensure the overall function of the distributed system, the failed node needs to be recovered, and the process is called a node repair process.
RS (Reed-Solomon) code is a common MDS code, and the node repair process of RS code is: establishing a new storage node and connecting to the new storage nodekAnd downloading the data by each node, recovering the original file, and storing corresponding data to the new node after linear coding by using the original file. FIG. 1 is a schematic diagram of a RS code repair process in the prior art, which is easy to see, and downloads information for restoring storage information in a nodekThe data transmission amount of the stored data of each node is far larger than the storage amount of the node when the node is repaired, that is, a great amount of network bandwidth is wasted in the repairing process of the node.
In order to reduce the repair bandwidth during repair, the document [ A.G. Dimakis, P.B. Godfrey, Y.Wu, M.J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE transactions. Inf. Theory, vol. 56, No. 9, pp. 4539-]The concept of network coding is introduced, that is, when a failed node is repaired, more nodes are taken to participate in the repair, and the nodes participating in the repair upload data in the node after linear combination, so that the bandwidth consumption of the repaired failed node can be greatly reduced, the distributed storage system is modeled by using an information flow graph, and minimum segmentation analysis is performed on the modeled distributed storage system, as shown in fig. 2. S in fig. 2 denotes an information source, i.e., raw data. Each storage nodeFrom an input nodeOutput nodeAnd directed edgesTo representThe weight value on the directed edge is the data storage capacity of the node, and the values are allα. DC (data collector) denotes a data collector, which DC collects arbitrary datakThe original data can be recovered by the data of each node; fromdThe individual node can repair the failed node by acquiring the data. The minimal cut analysis is performed on FIG. 2, and a curve separating the information source S and the data collector DC is called a cut of the information flow graph, and the weight of all directed edges cut by the curve and the value called the cut are obtained. According to the maximum stream-minimum cut theorem, when the value of the minimum cut is not less than the size of the original data amount in the information source S, the DC can restore the original data. Dimakis et al have given single-node repair bandwidth based on minimal cut analysis of information flow graphsγAnd single node memory capacityαThe trade-off between them, so-called regeneration Codes (regeneration Codes)αAndγand taking the corresponding code at the point on the compromise curve. There are two special types of points on the compromise curve, namely the amount of data storageαMinimum point and repair bandwidthγMinimum points, corresponding respectively to: Minimum-Storage regeneration Codes (MSRs), called MSR Codes,
Minimum-Bandwidth regeneration Codes (MSR), called MBR Codes,
the repair process of the regenerated code is as follows: the newly-built node is selected from the intact storage nodesdEach storage node downloads data, and the data amount downloaded by each node isβI.e. repair bandwidth of regenerated codesAnd repair bandwidth followingdIs reduced because each one increases with the number d of secondary nodes participating in the repairAmount of data transmitted at node repairβBecome small andβreduced speed ratiodThe increase is faster, resulting in a reduction in the total repair bandwidth. It can be seen that the repair bandwidth of the regenerated code is better than that of the RS code, but the number of node accesses in the repair process of the regenerated coded>k. In addition, the repair node needs to perform random linear network coding operation on the stored data, and in order to satisfy that all the coding packets are independent, the operation of the regenerated code needs to be performed in a large limited domain.
According to whether the information stored in the node is the check information, the node can be divided into:
1) system Nodes (Systematic Nodes): the system node stores the uncoded original file information;
2) check Nodes (Parity Nodes): the check node stores encoded file information, namely redundant information.
According to whether the repaired node data is the same as the original failed node data or not, the repairing strategy can be divided into the following steps:
1) functional Repair (Functional Repair): the node data repaired by the repairing scheme is not necessarily the same as the original failed node data, but the distributed storage system formed by the repaired node and other nodes still has MDS characteristics;
2) exact Repair (Exact Repair): the original failure node data of the node data repaired under the repairing scheme is the same;
3) hybrid Repair (Hybrid Repair): the restoration scheme can accurately restore system nodes (storing uncoded data), and only needs to ensure that a new distributed system still meets the MDS characteristics after restoration for non-system nodes (storing redundant data) without accurate restoration.
Based on the reproduction codes proposed by Dimakis et al, documents [ C. Tian, B. Sasidharan, V. Aggarwal, V.A. Vaishampayan, and P.V. Kumar, "Layered exact-reproducing codes via embedded error correction and block design," IEEE Trans. inf. Therory, vol.61, No. 4, pp. 1933-]An accurate repair regeneration code based on a combinatorial design placement strategy is presented. Combination ofA design is a set of subsets that satisfy certain characteristics (a subset is called a granule). A given block design ofX,B) Specifying with parameters (r,n),r≤nI.e. byXIs provided withnThe set of the individual elements is then selected,Bis thatXIs/are as followsrA set of meta subsets. In this document, two designs, namely, Steiner systems (Steiner systems) and Duplicate Combination Block Design (DCBD), are mainly used to realize the parameter limitation of the distributed storage systemn,k,d=k=n-1],[n,k=n-2,d=n-1]And 2n,k,d>k]To the precise construction of the same. This construction performs better than the spatial sharing between MSR and MBR codes and is the first type of code to achieve this. Further, the configuration may be limited in the parameter to [ ]n,k,d=k=n-1]The optimal function of (2) to repair the compromise curve reaches an extraordinary point and asymptotically optimizes at high code rates. However, the construction method proposed in this document has strict parameter limitation, only allows the failure of 1-2 nodes, and is not highly practical.
Disclosure of Invention
One of the objectives of the present invention is to provide a method for constructing a regenerated code, which has a simple construction process and is more relaxed in parameter limitation.
Another object of the present invention is to provide a file reconstruction method based on the above regenerated code construction method.
The third object of the present invention is to provide a node repairing method based on the above regenerated code constructing method.
The invention provides a method for constructing a regeneration code, which comprises the following steps:
s1, acquiring a combined design and storage data block;
s2, sorting and numbering the block groups, and counting the block groups and positions of the elements to obtain file symbols;
s3, coding the data block to obtain a coding block;
s4, distributing a different file symbol for each coding block;
and S5, placing the coding blocks corresponding to the file symbol sets with the consistent elements in the same storage node to obtain the regeneration codes.
Step S1, the obtained combination design is specifically that the combination condition is satisfiedt-designing,λIs composed oftEquilibrium number, representing arbitrarytThe meta-subset all appear inλThe number of the blocks is the same as the number of the blocks,
the combination conditions are as follows:
wherein,tis the number of elements of the subset;vthe order of the design, specifically the total number of different elements in the design;mthe block capacity is the number of elements in a block;
the obtained storage data block is specifically that the size of the original storage file is set as M, and the original file is split intoEqual-sized data blocks.
The file symbol of step S2 is set toWherein 1 is less than or equal toi≤b,1≤j≤m,bThe number of the block groups is used as the number of the block groups,mthe block capacity is obtained; at the same timePresentation elementhAppear atiThe first in the groupjA location; the block number is shown asB 1,B 2,…,B b For a block of blocksB i The elements therein are arranged in descending order, and the position numbers are given as {1,2, …,m}。
step S3 is specifically to encode the data block using a two-layer encoding strategy; the structure of the MDS code obtained from the outer layer is as follows:
using a parameter ofMDS code pairEncoding the data block to obtainb(m-t+1) code blocks, whereinbThe number of the block groups is used as the number of the block groups,mis the capacity of the block group,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset appearing inλEach group of cells;
the MDS code obtained in the inner layer is constructed as follows:
obtained by outer layer codingb(m-t+1) code blocks are equally divided intobSet of using parameters of (m,m-t+1) MDS code pairs in each groupm-tCarrying out secondary coding on +1 coding blocks to finally obtainbmAnd coding the blocks.
Step S4 specifically includes allocating a file symbol as an identifier for each of the obtained coding blocks, and without loss of generality, allocating a file symbol as an identifier for each of the obtained coding blocksFront ofFile symbolAs outer layer codingIdentification of the generated redundant block; will be provided withFile symbol ofAs an identification of the redundant blocks generated by the inner layer coding, wherein,presentation elementhAppear atiThe first in the groupjThe position of each of the plurality of positions,mthe block size.
Step S5 specifically includes that each time a file symbol is foundIf it is an elementhWith previous file symbolsOtherwise, a new storage node is allocated and numbered ash(ii) a For each file symbolIf it is an elementhAllocated storage node, then elementhThe same cultural symbolsThe corresponding code block is stored to be coded ashOf the storage nodes of (1), one storage node storesrA plurality of code blocks, each of which is encoded,ris the element repetition degree; the obtained parameter is [ 2 ]n,k=n-t,d=n-t+1]The reproduction code of (2) is stored,nthe number of the nodes is the number of the nodes,kin order to download the number of data,din order to determine the number of nodes participating in the repair,tis the number of elements of the subset,presentation elementhAppear atiThe first in the groupjAnd (4) a position.
The invention also provides a file reconstruction method comprising the regeneration code construction method, which comprises the following steps:
A1. constructing a reproduction code according to the steps S1-S5;
A2. connecting to a storage node;
A3. when downloading the current coding block, carrying out reconstruction first judgment;
A4. after the downloading of one coding block is finished, carrying out reconstruction and second judgment;
A5. and restoring the original file by using the MDS coding rules of the inner layer and the outer layer.
The first determination of reconstruction in step A3 specifically includes that the current granules have the same granule numberi 0File symbol ofWhether or not to already downloadm- t + 1 file symbolThe code blocks represented by the code blocks are,presentation elementhAppear ati 0The first in the groupjThe position of each of the plurality of positions,mthe block capacity is obtained; if the judgment result is yes, stopping downloading the current coding block; if not, downloading the current coding block;
the second determination of the reconstruction in step a4 specifically includes determining whether the download has been completedA plurality of coding blocks; if the judgment result is yes, the downloading process is stopped, if the judgment result is no, the downloading process is continued,bthe number of the block groups is used as the number of the block groups,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset appearing inλIn groups of blocks.
The invention also provides a node repairing method comprising the regeneration code constructing method, which comprises the following steps:
B1. constructing a reproduction code according to the steps S1-S5;
B2. determining the serial number of the failed node;
B3. determining the block serial number of the failed node in the combined design;
B4. fromn-t(ii) downloading code blocks in +1 nodes, with total downloadr(m-t+1) code blocks, the number of code blocks,rthe degree of repetition of the elements is,nthe number of the nodes is the number of the nodes,tis the number of elements of the subset,mthe block capacity is obtained;
B5. downloaded step B4r(m-t+1) coded block partitionsrGroups, the coding blocks of each group having the same block numberi;
B6. Recovering a coding block for each group of coding blocks by using the coding rule of the MDS code obtained by the inner layer; will be provided withrRecovery of block coded blocksrA plurality of coding blocks;
B7. will be recoveredrAnd storing the coding blocks into a new storage node, and adding the node into the original storage system to complete the repair process.
Step B2 is embodied as numbering elements from the remaining good nodes and recording them ash 1,...,h v-1Numbering the elements withComparing elements in design and determining serial number of failed nodeh 0(ii) a WhereinvIn order to be a step of the design,mis the capacity of the block group,λis composed oft-a balance number; the block number of step B3 is recorded asi 1,...,i r }; the file symbol of step B4 is specificallyIs shown byh 0Appear atiThe first in the groupjThe position of each of the plurality of positions,。
the method for constructing the regeneration code has loose parameter limitation and can reach the upper bound of Singelton, the file reconstruction and node repair process is simple to operate, and the node repair efficiency is high.
Drawings
Fig. 1 is a diagram illustrating a RS code repair process in the prior art.
Fig. 2 is an information flow diagram of a distributed storage system in the prior art.
FIG. 3 is a flow chart of a method for constructing a regenerated code according to the present invention.
FIG. 4 is a schematic diagram of a stored code of an embodiment of a regenerated code constructing method according to the present invention.
Fig. 5 is a schematic flow chart of a file reconstruction method according to the method of the present invention.
FIG. 6 is a schematic diagram of an embodiment of a file reconstruction method according to the present invention.
Fig. 7 is a flowchart illustrating a node repairing method according to the present invention.
Fig. 8 is a schematic diagram of an embodiment of a node repairing method according to the present invention.
Detailed Description
FIG. 3 is a flow chart of a method for constructing a regenerated code according to the present invention. The invention provides a method for constructing a regeneration code, which comprises the following steps:
s1, acquiring a combined design and storage data block; the obtained combination design is specifically that the combination condition is satisfiedt-designing,λIs composed oftEquilibrium number, representing arbitrarytThe meta-subset all appear inλThe number of the blocks is the same as the number of the blocks,
the combination conditions are as follows:
wherein,tis the number of elements of the subset;vthe design order is specifically the number of types of elements in the design;mthe block capacity is the number of elements in a block;
the obtained storage data block is specifically that the size of an original storage file is set to be M, and the original file is split into equal sizeAnd (4) a data block.
S2, sorting and numbering the block groups, and counting the block groups and positions of the elements to obtain file symbols; the file symbol is set asWherein 1 is less than or equal toi≤b,1≤j≤m,bThe number of the block groups is used as the number of the block groups,mthe block capacity is obtained; at the same timePresentation elementhAppear atiThe first in the groupjA location; the block number is shown asB 1,B 2,…,B b For a block of blocksB i The elements therein are ordered in chronological order, numbered as positions 1,2, …,m}。
s3, coding the data block by using a double-layer coding strategy; the structure of the MDS code obtained from the outer layer is as follows:
using a parameter ofMDS code pairNumber ofCoding according to the block to obtainb(m-t+1) code blocks, whereinbThe number of the block groups is used as the number of the block groups,mis the capacity of the block group,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset appearing inλEach group of cells;
the MDS code obtained in the inner layer is constructed as follows:
obtained by outer layer codingb(m-t+1) code blocks are equally divided intobSet of using parameters of (m,m-t+1) MDS code pairs in each groupm-tCarrying out secondary coding on +1 coding blocks to finally obtainbmAnd coding the blocks.
S4, distributing a file symbol as an identifier for each obtained coding block, and without loss of generality, distributing a file symbol as an identifier for each obtained coding blockFront ofFile symbolAs the identification of the redundant block generated by the outer layer coding; will be provided withFile symbol ofAs an identification of the redundant blocks generated by the inner layer coding, wherein,presentation elementhAppear atiThe first in the groupjThe position of each of the plurality of positions,mthe block size.
And S5, placing the coding blocks corresponding to the file symbol sets with the consistent elements in the same storage node to obtain the regeneration codes. Specifically, each time a file symbol is foundIf it is an elementhWith previous file symbolsOtherwise, a new storage node is allocated and numbered ash(ii) a For each file symbolIf it is an elementhAllocated storage node, then elementhThe same cultural symbolsThe corresponding code block is stored to be coded ashOf the storage nodes of (1), one storage node storesrA plurality of code blocks, each of which is encoded,ris the element repetition degree; the obtained parameter is [ 2 ]n,k=n-t,d=n-t+1]The reproduction code of (2) is stored,nthe number of the nodes is the number of the nodes,kin order to download the number of data,din order to determine the number of nodes participating in the repair,tis the number of elements of the subset.
In a specific embodiment, a parameter M =39 is given,n=8,d=7,kstorage coding example of =6, as shown in fig. 4, a schematic storage coding diagram of an embodiment of a regeneration code constructing method of the present invention is shown. The method comprises the following steps:
and (1) acquiring 39 initial data blocks (the data blocks are obtained after the original file is subjected to the step S1) by the distributed storage system, and encoding by using the double-layer encoding strategy in the step S3 to obtain 56 encoding blocks.
The parameter is [ alpha ]n=8,k=6,d=7]A distributed storage system constructed based on a 2- (8,4,3) design,
the sequence of 2- (8,4,3) is as follows: {{0,1,2,3},{0,1,2,4},{0,1,5,6},{0,2,5,7},{0,3,4,5},{0,3,6,7},{0,4,6,7},{1,2,6,7},{1,3,4,6},{1,3,5,7},{1,4,5,7},{2,3,4,7},{2,3,5,6},{2,4,5,6}}.
Step (2) counting the blocks, positions and knots of the elements in the blocksThe following fruits were obtained:
step (3) of allocating a file symbol for each of the 56 coding blocksWhereinPresentation elementhAppear atiThe first in the groupjAnd (4) a position.
And (4) the double-layer coding strategy is specifically as follows, wherein the inner-layer coding utilizes MDS (modified System code) with the parameter of (4, 3) to code, and the coding block corresponding to the last file symbol in each block is used as a check block without loss of generality, so that the coding block can be obtained
…
Outer layer coding using MD with parameters (42, 39)And S, encoding. In the present embodiment, without loss of generalityThe corresponding coding block is used as the check block, and then the corresponding coding block can be obtained
In this embodiment, to ensure the check symbols are independent of each other, the coefficients before the file symbols are as followsx,x 2,x 3Etc. are the data of different rows of the vandermonde matrix. While the modulo addition operations are all performed in the finite field, this embodiment sets the finite field to GF (43).
Step (5) distributing different storage nodes for different elements, wherein in the embodiment, the number of the elements is 0-7, and 8 storage nodes need to be created; each will behConsistent symbolsThe corresponding code blocks are placed in the same storage node, and then one storage node stores 7 code blocks, and finally the storage code as shown in fig. 4 is obtained.
Fig. 5 is a schematic flow chart of a file reconstruction method according to the present invention. The file reconstruction method based on the regeneration code construction method comprises the following steps:
A1. constructing a reproduction code according to the steps S1-S5;
A2. connecting to a storage node;
the file reconstruction method mainly uses the outer layer parameters generated in step S3 asRedundant information provided by the MDS code of (1); number of original data blocks ofThat is to increaseRedundant blocks, therefore, need to be collected during the file reconstruction processA number of different encoded blocks. In the present embodiment, the maximum value is obtainedn-tAnd the storage node is used for downloading the data.
A3. When downloading the current coding block, carrying out reconstruction first judgment; the first determination of reconstruction in step A2 specifically includes that the current granules have the same granule numberi 0File symbol ofWhether or not to already downloadm- t + 1 file symbolThe code blocks represented by the code blocks are,presentation elementhAppear ati 0The first in the groupjThe position of each of the plurality of positions,mthe current coding block is downloaded if the block capacity is judged to be the block capacity, and the current coding block is downloaded if the block capacity is judged to be the block capacity;
A4. after the downloading of one coding block is finished, carrying out reconstruction and second judgment; the second determination of the reconstruction in step a3 specifically includes determining whether the download has been completedIf the coding block is judged to be yes, the downloading process is stopped, and if the coding block is judged to be yesIf not, the downloading process is continued,bthe number of the block groups is used as the number of the block groups,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset all appear inλIn each block;
A5. restoring the original file by using MDS coding rules of the inner layer and the outer layer; in particular downloaded toWhen there are several coding blocks, if the number of coding blocks in the same block group is less than that of coding blocksm-t+1, then use the outer layerRecovering the coding block by the MDS coding rule; if the number of coding blocks in the same block group ism-t+1, then use the inner layer(s) (C)m, m-t+1) the MDS coding rule recovers the coded block.
Not every reconstruction operation is requiredn-tThe storage nodes transmit data, but only need to downloadAnd finishing the file transmission process by different coding blocks so as to start the file reconstruction operation.
In the specific embodiment, a file reconstruction example of M =39, n =8, d =7, and k =6 is given, and fig. 6 is a schematic diagram of an embodiment of a file reconstruction method of the present invention. The concrete description is as follows:
step 1), in this embodiment, the 6 nodes connected to Node 0 to Node 5 perform the file reconfiguration operation.
Step 2), Node 0 needs to download all data, Node 1 needs to download all data, Node 2 needs to download all data, and Node 3 downloads symbols except filesAll data except the corresponding code block is downloaded in Node 4Corresponding codeAll data except the block, Node 5, is downloadedAnd downloading 39 coding blocks in total by all data outside the corresponding coding blocks, and finishing the data downloading process.
And 3) performing file recovery by using an outer layer (42, 39) MDS encoding procedure. Due to the fact that
The data downloaded from Node 0 to Node 5 is lackingThe represented coding block, thereforeThe result of eliminating the other coding blocks is as follows:
step 4), setting the finite field to GF (43) in the encoding process,and isLinearly independent, then the three formulas can be used toThe represented encoded block is recovered. Then all encoded blocks have been obtained so far.
Fig. 7 is a schematic flow chart of the node repairing method of the present invention, and the node repairing method based on the regenerated code constructing method includes the following steps:
B1. constructing a reproduction code according to the steps S1-S5;
B2. determining the serial number of the failed node; specifically, numbering the elements from the remaining good nodes, and recording the numbersNumbering elements withComparing elements in design and determining serial number of failed nodeh 0(ii) a The node repairing method according to the inner layer parameter of step S3 is (m,m- t + 1 MDS code provides redundant information due to the fact that the inner layer parameter is (m,m-t+1) MDS code onlyt-1 data symbol as redundant information, thus requiring a common downloadr(m-t+1) code blocks.
B3. Determining the serial number of the block with the failed node in the combined design, and recording the serial number in sequence;
B4. Fromn-tThe coding block is downloaded in +1 node, and the corresponding file symbol is specifically,To representhAppear atiThe first in the groupjIndividual location, co-downloadr(m-t+1) code blocks;
B5. downloaded step B4r(m-t+1) coded block partitionsrGroups, the coding blocks of each group having the same block numberi;
B6. Recovering a coding block for each group of coding blocks by using the coding rule of the MDS code obtained by the inner layer; will be provided withrRecovery of block coded blocksrA plurality of coding blocks;
B7. will be recoveredrAnd storing the coding blocks into a new storage node, and adding the node into the original storage system to complete the repair process.
For onet-designingI.e. sharevEach group of different elements havingmAn element, antThe elements meetλSecond (i.e., appearing in the same block). In step S3 of the encoding process, the inner layer parameter is (m,m-t+1) MDS code provisiont1 redundant information, then tolerance for coding blocks belonging to the same granulet-failure of 1 coded block. Further, the whole storage system can tolerate the failure of t nodes and the failure of t nodes togethertrA coding block in which a failed code cannot be coded only by the inner layer(s) ((m,m-t+1) the redundant information provided by the MDS code is recovered, but must also depend on the outer parameter beingRecovering the MDS code; wherein the inner layer (m,m-t+1) MDS codes can be thistrProvision of multiple fail code blocksRedundant information, outer layerMDS codes can provide for these redundant filesRedundant information, thus making use ofWith redundant information to recover from such failuretrAnd coding the blocks.
In the specific embodiment, given an M =39,n=8,d=7,kthe node repair example of =6, as shown in fig. 8, is a schematic diagram of an embodiment of the node repair method of the present invention. The concrete description is as follows: in this embodiment, if Node 0 is disabled, the specific repair process is as follows:
step one, because Node 1-Node 7 nodes are intact, the element number of the failure Node is 0;
and step two, the distributed storage system is designed based on 2- (8,4,3), and then the block sequence number of the element 0 is {1,2,3,4,5,6,7 }.
Step four, dividing the downloaded coding blocks into 7 groups, wherein the coding blocks in each groupWith the same block numberiAfter grouping, the following
Step five, for each group, recovering a coding block by using the coding rule of the MDS code with the inner layer parameter of (4, 3), for example, in the group 1And is andif it is obtained, it can be recovered after XOR operation. Similarly, the recovery operation is performed on each group, and finally the result is obtainedThis is exactly the coding block in the failed node.
And step six, storing the 7 coding blocks to a new storage node, and adding the node into the original distributed storage system. And finishing the node repairing process.
Summarizing the codeword proposed by the present invention and the codeword-related parameters proposed by Chao Tian, as shown in the following table:
demonstrating production ofn,k=n-t,d=n-t+1]Distributed storage systems reaching the singleton boundary, i.e.d min= n-k+1, the demonstration procedure is as follows:
according to step S5 in the encoding process, all will have the samehIs/are as followsThe represented code blocks are stored in the same node, and the number of files stored in one node,rIs the element repetition degree; and for each different elementhAssigning a new storage node, the total number of nodes of the storage systemn=v。
Step S3 of the encoding process, the number of data blocks that can be stored in the distributed system due to the presence of the dual-layer MDS encodingWherein。
III. according to. According to the node repair process, a node failure needs to be connected toAnd repairing each node.
IV, left formula of Singelton bound:
left formula =n-(n-t)+1= t + 1
The right formula of the Singelton bound:
due to the fact thatv≥m>0, so that the above formula
V. due to the encoding procedure, step S1tThe design requirements are satisfied byHence the singelton bound right formula = t + 1。
Vi, since left = right, the evidence is complete.
Claims (9)
1. A method for constructing a regenerated code, comprising the steps of:
s1, acquiring a combined design and storage data block; the obtained combination design is specifically that the combination condition is satisfiedt-designing,λIs composed oftEquilibrium number, representing arbitrarytThe meta-subset all appear inλThe number of the blocks is the same as the number of the blocks,
the combination conditions are as follows:
wherein,tis the number of elements of the subset;vthe order of the design, specifically the total number of different elements in the design;mthe block capacity is the number of elements in a block;
the obtained storage data block is specifically that the size of the original storage file is set as M, and the original file is split intoData blocks of equal size;
s2, sorting and numbering the block groups, and counting the block groups and positions of the elements to obtain file symbols;
s3, coding the data block to obtain a coding block;
s4, distributing a different file symbol for each coding block;
and S5, placing the coding blocks corresponding to the file symbol sets with the consistent elements in the same storage node to obtain the regeneration codes.
2. The method according to claim 1, wherein the file symbol of step S2 is set asWherein 1 is less than or equal toi≤b,1≤j≤m,bThe number of the block groups is used as the number of the block groups,mthe block capacity is obtained; at the same timePresentation elementhAppear atiThe first in the groupjA location; the block number is shown asB 1,B 2,…,B b For a block of blocksB i The elements therein are arranged in descending order, and the position numbers are given as {1,2, …,m}。
3. the method according to claim 2, wherein the step S3 is to encode the data block by using a dual-layer encoding strategy; the structure of the MDS code obtained from the outer layer is as follows:
using a parameter ofMDS code pairEncoding the data block to obtainb(m-t+1) code blocks, whereinbThe number of the block groups is used as the number of the block groups,mis the capacity of the block group,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset appearing inλEach group of cells;
the MDS code obtained in the inner layer is constructed as follows:
obtained by outer layer codingb(m-t+1) code blocks are equally divided intobSet of using parameters of (m,m-t+1) MDS code pairs in each groupm-tCarrying out secondary coding on +1 coding blocks to finally obtainbmAnd coding the blocks.
4. The method according to claim 3, wherein step S4 comprises assigning a file symbol as an identifier to each of the obtained code blocks, and without loss of generality, assigning a file symbol as an identifier to each of the obtained code blocksFront ofFile symbolAs the identification of the redundant block generated by the outer layer coding; will be provided withFile symbol ofAs an identification of the redundant blocks generated by the inner layer coding, wherein,presentation elementhAppear atiThe first in the groupjThe position of each of the plurality of positions,mthe block size.
5. The method of claim 4, wherein the step S5 is embodied in the form of a packageWhenever a file symbol is foundIf it is an elementhWith previous file symbolsOtherwise, a new storage node is allocated and numbered ash(ii) a For each file symbolIf it is an elementhAllocated storage node, then elementhThe same cultural symbolsThe corresponding code block is stored to be coded ashOf the storage nodes of (1), one storage node storesrA plurality of code blocks, each of which is encoded,ris the element repetition degree; the obtained parameter is [ 2 ]n,k=n-t,d=n-t+1]The reproduction code of (2) is stored,nthe number of the nodes is the number of the nodes,kin order to download the number of data,din order to determine the number of nodes participating in the repair,tis the number of elements of the subset,presentation elementhAppear atiThe first in the groupjAnd (4) a position.
6. A file reconstruction method based on the reproduction code construction method according to any one of claims 1 to 5, characterized by comprising the steps of:
A1. constructing a reproduction code according to the steps S1-S5;
A2. connecting to a storage node;
A3. when downloading the current coding block, carrying out reconstruction first judgment;
A4. after the downloading of one coding block is finished, carrying out reconstruction and second judgment;
A5. and restoring the original file by using the MDS coding rules of the inner layer and the outer layer.
7. The method of claim 6, wherein the first determination of reconstruction at step A3 includes, for the current granules having the same numberi 0File symbol ofWhether or not to already downloadm-t+1 file symbolThe code blocks represented by the code blocks are,presentation elementhAppear ati 0The first in the groupjThe position of each of the plurality of positions,mthe block capacity is obtained; if the judgment result is yes, stopping downloading the current coding block; if not, downloading the current coding block; the second determination of the reconstruction in step a4 specifically includes determining whether the download has been completedA plurality of coding blocks; if the judgment result is yes, the downloading process is stopped, if the judgment result is no, the downloading process is continued,bthe number of the block groups is used as the number of the block groups,tis the number of elements of the subset,λis composed oftEquilibrium number, representing arbitrarytThe meta-subset appearing inλIn groups of blocks.
8. A node repairing method based on the regenerated code constructing method of any claim 1 to 5, characterized by comprising the steps of:
B1. constructing a reproduction code according to the steps S1-S5;
B2. determining the serial number of the failed node;
B3. determining the block serial number of the failed node in the combined design;
B4. fromn-tIn +1 nodesDownloading coding blocks, co-downloadingr(m-t+1) code blocks, the number of code blocks,rthe degree of repetition of the elements is,nthe number of the nodes is the number of the nodes,tis the number of elements of the subset,mthe block capacity is obtained;
B5. downloaded step B4r(m-t+1) coded block partitionsrGroups, the coding blocks of each group having the same block numberi;
B6. Recovering a coding block for each group of coding blocks by using the coding rule of the MDS code obtained by the inner layer; will be provided withrRecovery of block coded blocksrA plurality of coding blocks;
B7. will be recoveredrAnd storing the coding blocks into a new storage node, and adding the node into the original storage system to complete the repair process.
9. The node repairing method according to claim 8, wherein the step B2 is embodied as numbering elements from the remaining intact nodes as ∑h 1,...,h v-1Numbering the elements withComparing elements in design and determining serial number of failed nodeh 0(ii) a WhereinvIn order to be a step of the design,mis the capacity of the block group,λis composed oft-a balance number; the block number of step B3 is recorded asi 1,...,i r }; the file symbol of step B4 is specificallyIs shown byh 0Appear atiThe first in the groupjThe position of each of the plurality of positions,。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110348155.4A CN112732203B (en) | 2021-03-31 | 2021-03-31 | Regeneration code construction method, file reconstruction method and node repair method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110348155.4A CN112732203B (en) | 2021-03-31 | 2021-03-31 | Regeneration code construction method, file reconstruction method and node repair method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112732203A CN112732203A (en) | 2021-04-30 |
CN112732203B true CN112732203B (en) | 2021-06-22 |
Family
ID=75596222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110348155.4A Active CN112732203B (en) | 2021-03-31 | 2021-03-31 | Regeneration code construction method, file reconstruction method and node repair method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112732203B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116880778B (en) * | 2023-09-07 | 2023-11-21 | 杭州迅杭科技有限公司 | User privacy protection method based on regenerative coding and distributed storage |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103688514A (en) * | 2013-02-26 | 2014-03-26 | 北京大学深圳研究生院 | Coding method for minimum storage regeneration codes and method for restoring of storage nodes |
CN103688515A (en) * | 2013-03-26 | 2014-03-26 | 北京大学深圳研究生院 | Method for encoding minimum bandwidth regeneration codes and repairing storage nodes |
US8719667B2 (en) * | 2010-07-26 | 2014-05-06 | Thomson Licensing | Method for adding redundancy data to a distributed data storage system and corresponding device |
CN104838626A (en) * | 2013-01-04 | 2015-08-12 | 北京大学深圳研究生院 | Coding method for general projective self-repairing codes, and data reconstruction and repair method |
CN105721611A (en) * | 2016-04-15 | 2016-06-29 | 西南交通大学 | General method for generating minimal storage regenerating code with maximum distance separable storage code |
CN108279995A (en) * | 2018-01-30 | 2018-07-13 | 北京交通大学 | A kind of storage method for the distributed memory system regenerating code based on safety |
CN109062724A (en) * | 2018-07-21 | 2018-12-21 | 湖北大学 | A kind of correcting and eleting codes conversion method and terminal |
CN110750382A (en) * | 2019-09-18 | 2020-02-04 | 华中科技大学 | Minimum storage regeneration code coding method and system for improving data repair performance |
US10608784B2 (en) * | 2016-03-15 | 2020-03-31 | ClineHair Commercial Endeavors | Distributed storage system data management and security |
CN111125014A (en) * | 2019-11-19 | 2020-05-08 | 长安大学 | Construction method of flexible partial repeat code based on U-shaped design |
WO2020160142A1 (en) * | 2019-01-29 | 2020-08-06 | ClineHair Commercial Endeavors | Encoding and storage node repairing method for minimum storage regenerating codes for distributed storage systems |
CN111796966A (en) * | 2020-05-19 | 2020-10-20 | 中大编码有限公司 | Construction method of storage rack perception regeneration code |
-
2021
- 2021-03-31 CN CN202110348155.4A patent/CN112732203B/en active Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8719667B2 (en) * | 2010-07-26 | 2014-05-06 | Thomson Licensing | Method for adding redundancy data to a distributed data storage system and corresponding device |
CN104838626A (en) * | 2013-01-04 | 2015-08-12 | 北京大学深圳研究生院 | Coding method for general projective self-repairing codes, and data reconstruction and repair method |
CN103688514A (en) * | 2013-02-26 | 2014-03-26 | 北京大学深圳研究生院 | Coding method for minimum storage regeneration codes and method for restoring of storage nodes |
CN103688515A (en) * | 2013-03-26 | 2014-03-26 | 北京大学深圳研究生院 | Method for encoding minimum bandwidth regeneration codes and repairing storage nodes |
US10608784B2 (en) * | 2016-03-15 | 2020-03-31 | ClineHair Commercial Endeavors | Distributed storage system data management and security |
CN105721611A (en) * | 2016-04-15 | 2016-06-29 | 西南交通大学 | General method for generating minimal storage regenerating code with maximum distance separable storage code |
CN108279995A (en) * | 2018-01-30 | 2018-07-13 | 北京交通大学 | A kind of storage method for the distributed memory system regenerating code based on safety |
CN109062724A (en) * | 2018-07-21 | 2018-12-21 | 湖北大学 | A kind of correcting and eleting codes conversion method and terminal |
WO2020160142A1 (en) * | 2019-01-29 | 2020-08-06 | ClineHair Commercial Endeavors | Encoding and storage node repairing method for minimum storage regenerating codes for distributed storage systems |
CN110750382A (en) * | 2019-09-18 | 2020-02-04 | 华中科技大学 | Minimum storage regeneration code coding method and system for improving data repair performance |
CN111125014A (en) * | 2019-11-19 | 2020-05-08 | 长安大学 | Construction method of flexible partial repeat code based on U-shaped design |
CN111796966A (en) * | 2020-05-19 | 2020-10-20 | 中大编码有限公司 | Construction method of storage rack perception regeneration code |
Non-Patent Citations (7)
Title |
---|
Interference Alignment in Regenerating Codes;Nihar B. Shah et al.;《IEEE TRANSACTIONS ON INFORMATION THEORY》;20100510;全文 * |
Minimum Storage Regenerating Codes for;HUAYU ZHANG et al.;《IEEE Access》;20170428;全文 * |
On the Optimal Minimum Distance of;Bing Zhu et al.;《IEEE Xplore》;20200514;全文 * |
Replication-based Distributed Storage Systems with;Bing Zhu et al.;《IEEE Xplore》;20140508;全文 * |
二元再生码在分布式存储系统的应用;侯韩旭 等;《计算机研究与发展》;20131126;全文 * |
基于柯西矩阵的最小带宽再生码研究伏;宋海龙,王伟平,肖亚龙;《湖南大学学报(自然科学版)》;20170831;全文 * |
基于精确再生码的秘密共享方案;宋海龙,王伟平;《中南大学学报(自然科学版)》;20170430;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112732203A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9959169B2 (en) | Expansion of dispersed storage network (DSN) memory | |
CN107656832B (en) | A kind of correcting and eleting codes method of low data reconstruction expense | |
US10241864B2 (en) | Expanding information dispersal algorithm width without rebuilding through imposter slices | |
US7979771B2 (en) | Erasure coding technique for scalable and fault tolerant storage system | |
CN110750382B (en) | Minimum storage regeneration code coding method and system for improving data repair performance | |
CN103746774B (en) | The fault-tolerant coding method that a kind of efficient data is read | |
US20160328295A1 (en) | Site-Based Namespace Allocation | |
US20140344227A1 (en) | Streaming Content Storage | |
CN107844272A (en) | A kind of cross-packet coding and decoding method for improving error correcting capability | |
US10001923B2 (en) | Generation collapse | |
US20170212683A1 (en) | Provisioning ds units on the fly in a dsn memory in response to load | |
CN112799605B (en) | Square part repeated code construction method, node repair method and capacity calculation method | |
CN109491835A (en) | A kind of data fault tolerance method based on Dynamic Packet code | |
CN116501553B (en) | Data recovery method, device, system, electronic equipment and storage medium | |
WO2015180038A1 (en) | Partial replica code construction method and device, and data recovery method therefor | |
CN111831223A (en) | Fault-tolerant coding method, device and system for improving expandability of data deduplication system | |
CN108762978B (en) | Grouping construction method of local part repeated cyclic code | |
CN112732203B (en) | Regeneration code construction method, file reconstruction method and node repair method | |
CN107153661A (en) | A kind of storage, read method and its device of the data based on HDFS systems | |
WO2018011670A1 (en) | Manipulating distributed agreement protocol to identify desired set of storage units | |
Ivanichkina et al. | Mathematical methods and models of improving data storage reliability including those based on finite field theory | |
US20180113626A1 (en) | Fail-in-place supported via decentralized or distributed agreement protocol (dap) | |
Li et al. | Relieving both storage and recovery burdens in big data clusters with R-STAIR codes | |
CN111224747A (en) | Coding method capable of reducing repair bandwidth and disk reading overhead and repair method thereof | |
CN106911793B (en) | I/O optimized distributed storage data repair method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |