CN112732203B - Regeneration code construction method, file reconstruction method and node repair method

Info

Publication number: CN112732203B
Application number: CN202110348155.4A
Authority: CN
Inventors: 朱兵; 曾志伟; 赵旭煜; 王伟平; 王建新
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2021-06-22
Anticipated expiration: 2041-03-31
Also published as: CN112732203A

Abstract

The invention discloses a regenerating code construction method, comprising: obtaining a combinatorial design and the data blocks to be stored; sorting and numbering the blocks of the design and recording the block and position in which each element appears, to obtain file symbols; encoding the data blocks to obtain coded blocks; assigning a distinct file symbol to each coded block; and placing the coded blocks whose file symbols share the same element in the same storage node, to obtain the regenerating code. The invention also discloses a file reconstruction method and a node repair method based on this construction method. The construction imposes loose parameter restrictions and can reach the Singleton upper bound; file reconstruction and node repair require only simple operations, and node repair is efficient.

Description

A regeneration code construction method, a file reconstruction method and a node repair method

Technical Field

The invention relates to a regenerating code construction method, a file reconstruction method and a node repair method.

Background

In today's era of information explosion, storing massive amounts of data safely and reliably has become an urgent problem. Traditional file storage systems can no longer meet requirements such as availability and scalability, which led to the distributed storage system. Such a system uses distributed technology to connect storage nodes over a network into a cluster capable of storing massive data. A distributed system spreads data files across multiple storage nodes. Because a single storage node may fail due to network faults or physical damage, the system must introduce redundant information to guarantee reliability. The common redundancy strategies are replication and erasure coding. Replication stores several copies of the original file; when a node fails, the replacement node copies the data directly from a replica, so the source file remains usable as long as one complete copy exists in the system. However, this scheme has low storage efficiency and system reliability. Erasure coding processes the original file with an erasure code. The commonly used erasure codes are (n, k) maximum distance separable (MDS) codes: the original file is divided into k parts of equal size, n-k parts of redundant information are generated from these k parts by linear coding, and n nodes store the n linearly independent coded blocks. An end user can recover the original file by connecting to any k nodes. Erasure coding is clearly more storage-efficient, but it is also more complex to implement and harder to apply.

In a distributed storage system, n nodes store original data of size B, and each node stores an amount of data α. An end user can recover the original file by downloading data from any k of the n storage nodes; this process is called file reconstruction. When a storage node fails, it must be restored in order to preserve the overall function of the distributed system; this process is called node repair.

Reed-Solomon (RS) codes are a common class of MDS codes. An RS code repairs a node as follows: a new storage node is created; it connects to k nodes, downloads their data and recovers the original file; the original file is then linearly encoded again and the corresponding data is stored on the new node. FIG. 1 is a schematic diagram of the RS code repair process in the prior art. The stored data of k nodes is downloaded in order to restore the contents of a single node, so the amount of data transferred during repair is far larger than the amount the node stores; the repair process wastes a large amount of network bandwidth.

To reduce the repair bandwidth, [A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010] introduced the concept of network coding: when a failed node is repaired, more nodes take part in the repair, and each participating node uploads a linear combination of its own data rather than the raw data, which greatly reduces the bandwidth consumed by repair. That paper models the distributed storage system as an information flow graph and performs a min-cut analysis on it, as shown in FIG. 2. In FIG. 2, S denotes the information source, i.e., the original data. Each storage node $x^{i}$ is represented by an input node $x^{i}_{\mathrm{in}}$, an output node $x^{i}_{\mathrm{out}}$ and a directed edge $x^{i}_{\mathrm{in}} \to x^{i}_{\mathrm{out}}$; the weight of the directed edge is the amount of data stored by the node, which is α for every node. DC denotes a data collector. The data collector recovers the original data by collecting the data of any k nodes, and a failed node is repaired by obtaining data from d nodes. In the min-cut analysis of FIG. 2, a curve separating the source S from the data collector DC is called a cut of the information flow graph, and the sum of the weights of all directed edges crossed by the curve is called the value of the cut. By the max-flow min-cut theorem, DC can recover the original data when the value of the minimum cut is not less than the size of the original data at the source S. Based on this min-cut analysis, Dimakis et al. derived the trade-off curve between the single-node repair bandwidth γ and the single-node storage α. Regenerating codes are the codes whose α and γ lie on this trade-off curve. The curve has two special points, the point of minimum storage α and the point of minimum repair bandwidth γ. They correspond, respectively, to minimum-storage regenerating (MSR) codes and minimum-bandwidth regenerating (MBR) codes.
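For reference, the two extreme points of the trade-off curve are commonly written as below in the cited Dimakis et al. paper, with B the original data size, k the number of nodes needed for reconstruction and d the number of helper nodes. These expressions are quoted from that paper, not recovered from this patent's text, whose own formulas at this point are not reproduced.

```latex
\begin{align}
(\alpha_{\mathrm{MSR}},\ \gamma_{\mathrm{MSR}}) &= \left(\frac{B}{k},\ \frac{Bd}{k(d-k+1)}\right), \\
(\alpha_{\mathrm{MBR}},\ \gamma_{\mathrm{MBR}}) &= \left(\frac{2Bd}{k(2d-k+1)},\ \frac{2Bd}{k(2d-k+1)}\right).
\end{align}
```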

A regenerating code is repaired as follows: the new node selects any d of the surviving storage nodes and downloads β units of data from each, so the repair bandwidth of the regenerating code is $\gamma = d\beta$. The repair bandwidth decreases as d increases: as more helper nodes take part in the repair, the amount β sent by each helper shrinks, and it shrinks faster than d grows, so the total repair bandwidth falls. The repair bandwidth of regenerating codes is therefore better than that of RS codes, but the number of nodes contacted during repair satisfies d > k. In addition, the nodes taking part in the repair must perform random linear network coding on their stored data; to keep all coded packets mutually independent, the operations of a regenerating code must be carried out over a fairly large finite field.
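The effect of d on the repair bandwidth can be checked numerically. The sketch below assumes the minimum-storage point of Dimakis et al., where each helper sends β = B / (k(d-k+1)); the function name and the sample values are illustrative only and do not describe the code constructed later in this document.

```python
from fractions import Fraction


def msr_repair_bandwidth(file_size, k, d):
    """Total repair traffic gamma = d * beta at the minimum-storage point."""
    if d < k:
        raise ValueError("need at least k helper nodes")
    beta = Fraction(file_size, k * (d - k + 1))  # data sent by each helper
    return d * beta


file_size, n, k = 39, 8, 6        # illustrative values only
rs_traffic = Fraction(file_size)  # RS repair downloads the whole file
gammas = {d: msr_repair_bandwidth(file_size, k, d) for d in range(k, n)}
for d, gamma in gammas.items():
    print(f"d={d}: gamma={gamma}, RS repair traffic={rs_traffic}")
```

With d = k the traffic equals that of RS repair; each additional helper lowers it.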

Depending on whether the information stored in a node is parity information, nodes are divided into:

1) Systematic nodes: a systematic node stores uncoded original file information;

2) Parity nodes: a parity node stores coded file information, i.e., redundant information.

Depending on whether the repaired node data is identical to the data of the failed node, repair strategies are divided into:

1) Functional repair: the repaired node data is not necessarily identical to the data of the failed node, but the distributed storage system formed by the repaired node and the other nodes still has the MDS property;

2) Exact repair: the repaired node data is identical to the data of the failed node;

3) Hybrid repair: systematic nodes (which store uncoded data) are repaired exactly, while non-systematic nodes (which store redundant data) need not be repaired exactly, provided the new distributed system still has the MDS property after repair.

Building on the regenerating codes of Dimakis et al., [C. Tian, B. Sasidharan, V. Aggarwal, V. A. Vaishampayan, and P. V. Kumar, “Layered exact-repair regenerating codes via embedded error correction and block designs,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1933–1947, Apr. 2015] proposed exact-repair regenerating codes based on a combinatorial-design placement strategy. A combinatorial design is a collection of subsets (called blocks) that satisfy certain properties. A given block design is specified by (X, B) with parameters (r, n), r ≤ n, where X is a set of n elements and B is a collection of r-element subsets of X. That paper mainly uses two designs, Steiner systems and the duplicated combination block design (DCBD), and gives exact-repair constructions for distributed storage systems with the parameter settings [n, k, d=k=n-1], [n, k=n-2, d=n-1] and [n, k, d>k]. These constructions outperform space sharing between MSR and MBR codes and were the first class of codes to do so. Moreover, for [n, k, d=k=n-1] the construction reaches a nontrivial point on the optimal functional-repair trade-off curve and is asymptotically optimal at high code rates. However, the construction restricts the parameters rather strictly and tolerates the failure of only one or two nodes, which limits its practical use.

Summary of the Invention

A first object of the invention is to provide a regenerating code construction method whose construction process is simple and whose parameter restrictions are looser.

A second object of the invention is to provide a file reconstruction method based on the regenerating code construction method.

A third object of the invention is to provide a node repair method based on the regenerating code construction method.

The regenerating code construction method provided by the invention comprises the following steps:

S1. Obtain a combinatorial design and the data blocks to be stored;

S2. Sort and number the blocks of the design, and record the block and position in which each element appears, to obtain the file symbols;

S3. Encode the data blocks to obtain coded blocks;

S4. Assign a distinct file symbol to each coded block;

S5. Place the coded blocks whose file symbols share the same element in the same storage node, to obtain the regenerating code.

In step S1, the combinatorial design is obtained by selecting a t-design $t$-$(v, m, \lambda)$ that satisfies the combination condition, where λ is the t-balance number, meaning that every t-element subset appears in exactly λ blocks, and where:

the number of blocks is denoted b, with $b = \lambda\binom{v}{t} \big/ \binom{m}{t}$;

the replication number of an element is denoted r, with $r = \lambda\binom{v-1}{t-1} \big/ \binom{m-1}{t-1}$;

the combination condition is:

[formula missing in source];

where t is the number of elements in a subset; v is the order of the design, i.e., the total number of distinct elements in the design; and m is the block size, i.e., the number of elements in one block.

The data blocks to be stored are obtained as follows: let the size of the original file be M; the original file is split into $b(m-t+1)-\lambda$ data blocks of equal size.
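The block count b and the replication number r follow from the standard counting identities for t-designs, b = λ·C(v,t)/C(m,t) and r = λ·C(v-1,t-1)/C(m-1,t-1). The sketch below evaluates them; the function name is hypothetical, and the sample parameters are those of the 2-(8,4,3) design used in the embodiment.

```python
from math import comb


def design_counts(t, v, m, lam):
    """Return (b, r) for a t-(v, m, lam) design via the standard counting identities."""
    b_num = lam * comb(v, t)
    r_num = lam * comb(v - 1, t - 1)
    if b_num % comb(m, t) or r_num % comb(m - 1, t - 1):
        # Necessary (not sufficient) divisibility conditions for the design to exist.
        raise ValueError("parameters violate the divisibility conditions")
    return b_num // comb(m, t), r_num // comb(m - 1, t - 1)


b, r = design_counts(t=2, v=8, m=4, lam=3)
print(b, r)  # 14 7
```

The result matches the embodiment: 14 blocks, and 7 coded blocks per storage node.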

The file symbols of step S2 are denoted $f^{h}_{i,j}$, where 1 ≤ i ≤ b, 1 ≤ j ≤ m, b is the number of blocks and m is the block size; $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block. The blocks are numbered $B_1, B_2, \ldots, B_b$; within each block $B_i$ the elements are arranged in ascending order and their positions are numbered {1, 2, …, m}.

Step S3 encodes the data blocks with a two-layer coding strategy. The outer MDS code is constructed as follows:

the $b(m-t+1)-\lambda$ data blocks are encoded with a $(b(m-t+1),\ b(m-t+1)-\lambda)$ MDS code to obtain b(m-t+1) coded blocks, where b is the number of blocks, m is the block size, t is the number of elements in a subset, and λ is the t-balance number, meaning that every t-element subset appears in λ blocks.

The inner MDS code is constructed as follows:

the b(m-t+1) coded blocks produced by the outer code are divided evenly into b groups, and the m-t+1 coded blocks of each group are encoded a second time with an (m, m-t+1) MDS code, finally yielding bm coded blocks.
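The block counts of the two layers can be tallied with a few lines of arithmetic. The sketch below takes the number of data blocks as an input rather than deriving it; the function name is hypothetical, and the sample values (b = 14, m = 4, t = 2, 39 data blocks) are those of the embodiment.

```python
def two_layer_block_counts(b, m, t, data_blocks):
    """Tally the code parameters of the outer and inner layers."""
    inner_dim = m - t + 1      # coded blocks per group before inner coding
    outer_len = b * inner_dim  # coded blocks produced by the outer code
    if data_blocks > outer_len:
        raise ValueError("outer code cannot have more data blocks than coded blocks")
    return {
        "outer_code": (outer_len, data_blocks),  # (length, dimension)
        "inner_code": (m, inner_dim),            # applied to each of the b groups
        "stored_blocks": b * m,                  # total after inner coding
    }


counts = two_layer_block_counts(b=14, m=4, t=2, data_blocks=39)
print(counts)
```

For the embodiment this gives a (42, 39) outer code, a (4, 3) inner code and 56 stored coded blocks.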

Step S4 assigns a file symbol to each coded block obtained above as its identifier. Without loss of generality, the first λ file symbols of [formula missing in source] identify the redundant blocks generated by the outer code, and the file symbols with $j = m$, i.e., $f^{h}_{i,m}$, identify the redundant blocks generated by the inner code, where $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block and m is the block size.

Step S5 proceeds as follows: whenever a file symbol $f^{h}_{i,j}$ is encountered whose element h differs from that of all previous file symbols, a new storage node is allocated and numbered h; for every file symbol $f^{h}_{i,j}$ whose element h has already been allocated a storage node, the coded block corresponding to that file symbol is stored in the storage node numbered h. Each storage node therefore stores r coded blocks, where r is the replication number of an element. The result is a regenerating code with parameters [n, k=n-t, d=n-t+1], where n is the number of nodes, k is the number of nodes from which data is downloaded for reconstruction, d is the number of nodes that take part in a repair, t is the number of elements in a subset, and $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block.

The invention also provides a file reconstruction method that includes the above regenerating code construction method, comprising the following steps:

A1. Construct the regenerating code according to steps S1 to S5;

A2. Connect to storage nodes;

A3. Before downloading the current coded block, perform the first reconstruction check;

A4. After each coded block has been downloaded, perform the second reconstruction check;

A5. Recover the original file using the inner and outer MDS coding rules.

The first reconstruction check of step A3 is as follows: for the file symbols $f^{h}_{i_0,j}$ that share the current block number $i_0$, determine whether the coded blocks represented by m-t+1 such file symbols have already been downloaded, where $f^{h}_{i_0,j}$ indicates that element h appears at the j-th position of the $i_0$-th block and m is the block size. If so, the current coded block is not downloaded; if not, the current coded block is downloaded.

The second reconstruction check of step A4 is as follows: determine whether $b(m-t+1)-\lambda$ coded blocks have already been downloaded. If so, the download process stops; if not, the download process continues. Here b is the number of blocks, t is the number of elements in a subset, and λ is the t-balance number, meaning that every t-element subset appears in λ blocks.

The invention also provides a node repair method that includes the above regenerating code construction method, comprising the following steps:

B1. Construct the regenerating code according to steps S1 to S5;

B2. Determine the index of the failed node;

B3. Determine the indices of the blocks of the combinatorial design in which the failed node appears;

B4. Download coded blocks from n-t+1 nodes, r(m-t+1) coded blocks in total, where r is the replication number of an element, n is the number of nodes, t is the number of elements in a subset, and m is the block size;

B5. Divide the r(m-t+1) coded blocks downloaded in step B4 into r groups, the coded blocks of each group sharing the same block number i;

B6. For each group, recover one coded block using the coding rule of the inner MDS code, so that r coded blocks are recovered from the r groups;

B7. Store the r recovered coded blocks on a new storage node and add this node to the storage system to complete the repair.
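Steps B2 to B5 can be sketched as a planning routine over the design. The code below assumes the sorted 2-(8,4,3) design of the embodiment and represents each file symbol as a tuple (h, i, j); the function name is hypothetical. Taking the first m-t+1 surviving symbols of each block is an assumption that is exact for t = 2 (all m-1 survivors are needed) and may not minimise the number of helper nodes for larger t.

```python
BLOCKS = [
    (0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 5, 6), (0, 2, 5, 7), (0, 3, 4, 5),
    (0, 3, 6, 7), (0, 4, 6, 7), (1, 2, 6, 7), (1, 3, 4, 6), (1, 3, 5, 7),
    (1, 4, 5, 7), (2, 3, 4, 7), (2, 3, 5, 6), (2, 4, 5, 6),
]


def repair_plan(blocks, failed, t):
    """Map each block index containing `failed` to the symbols (h, i, j) to fetch."""
    m = len(blocks[0])
    plan = {}
    for i, blk in enumerate(blocks, start=1):
        ordered = sorted(blk)
        if failed not in ordered:
            continue  # step B3: only blocks containing the failed element matter
        survivors = [(h, i, j) for j, h in enumerate(ordered, start=1) if h != failed]
        plan[i] = survivors[: m - t + 1]  # enough symbols for the inner (m, m-t+1) code
    return plan


plan = repair_plan(BLOCKS, failed=0, t=2)
helpers = {h for group in plan.values() for (h, _, _) in group}
print(len(plan), sum(len(g) for g in plan.values()), sorted(helpers))
```

For a failed Node 0 the plan contains r = 7 groups, r(m-t+1) = 21 coded blocks and n-t+1 = 7 helper nodes, in line with steps B4 and B5.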

In step B2, the elements of the remaining intact nodes are numbered $\{h_1, \ldots, h_{v-1}\}$ and compared with the elements of the $t$-$(v, m, \lambda)$ design to determine the index $h_0$ of the failed node, where v is the order of the design, m is the block size and λ is the t-balance number. The block indices of step B3 are denoted $\{i_1, \ldots, i_r\}$. The file symbols of step B4 are $f^{h_0}_{i,j}$, indicating that $h_0$ appears at the j-th position of the i-th block, [formula missing in source].

The regenerating code construction method proposed by the invention has loose parameter restrictions and can reach the Singleton upper bound; file reconstruction and node repair require only simple operations, and node repair is efficient.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the RS code repair process in the prior art.

FIG. 2 is an information flow graph of a distributed storage system in the prior art.

FIG. 3 is a flowchart of the regenerating code construction method of the invention.

FIG. 4 is a schematic diagram of the storage code in an embodiment of the regenerating code construction method of the invention.

FIG. 5 is a flowchart of the file reconstruction method of the invention.

FIG. 6 is a schematic diagram of an embodiment of the file reconstruction method of the invention.

FIG. 7 is a flowchart of the node repair method of the invention.

FIG. 8 is a schematic diagram of an embodiment of the node repair method of the invention.

Detailed Description

FIG. 3 is a flowchart of the regenerating code construction method of the invention. The method comprises the following steps:

S1. Obtain a combinatorial design and the data blocks to be stored. The combinatorial design is obtained by selecting a t-design $t$-$(v, m, \lambda)$ that satisfies the combination condition, where λ is the t-balance number, meaning that every t-element subset appears in exactly λ blocks, and where:

the number of blocks is denoted b, with $b = \lambda\binom{v}{t} \big/ \binom{m}{t}$;

the replication number of an element is denoted r, with $r = \lambda\binom{v-1}{t-1} \big/ \binom{m-1}{t-1}$;

the combination condition is:

[formula missing in source];

where t is the number of elements in a subset; v is the order of the design, i.e., the number of distinct elements in the design; and m is the block size, i.e., the number of elements in one block.

The data blocks to be stored are obtained as follows: let the size of the original file be M; the original file is split into $b(m-t+1)-\lambda$ data blocks of equal size.

S2. Sort and number the blocks of the design, and record the block and position in which each element appears, to obtain the file symbols. The file symbols are denoted $f^{h}_{i,j}$, where 1 ≤ i ≤ b, 1 ≤ j ≤ m, b is the number of blocks and m is the block size; $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block. The blocks are numbered $B_1, B_2, \ldots, B_b$; within each block $B_i$ the elements are arranged in order and their positions are numbered {1, 2, …, m}.

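S3. Encode the data blocks with a two-layer coding strategy. The outer MDS code is constructed as follows: the $b(m-t+1)-\lambda$ data blocks are encoded with a $(b(m-t+1),\ b(m-t+1)-\lambda)$ MDS code to obtain b(m-t+1) coded blocks. The inner MDS code is constructed as follows: the b(m-t+1) coded blocks produced by the outer code are divided evenly into b groups, and the m-t+1 coded blocks of each group are encoded a second time with an (m, m-t+1) MDS code, finally yielding bm coded blocks.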

S4. Assign a file symbol to each coded block obtained above as its identifier. Without loss of generality, the first λ file symbols of [formula missing in source] identify the redundant blocks generated by the outer code, and the file symbols with $j = m$, i.e., $f^{h}_{i,m}$, identify the redundant blocks generated by the inner code, where $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block and m is the block size.

S5. Place the coded blocks whose file symbols share the same element in the same storage node, to obtain the regenerating code. Specifically, whenever a file symbol $f^{h}_{i,j}$ is encountered whose element h differs from that of all previous file symbols, a new storage node is allocated and numbered h; for every file symbol $f^{h}_{i,j}$ whose element h has already been allocated a storage node, the coded block corresponding to that file symbol is stored in the storage node numbered h. Each storage node therefore stores r coded blocks, where r is the replication number of an element. The result is a regenerating code with parameters [n, k=n-t, d=n-t+1], where n is the number of nodes, k is the number of nodes from which data is downloaded for reconstruction, d is the number of nodes that take part in a repair, and t is the number of elements in a subset.

In this embodiment, an example storage code with parameters M=39, n=8, d=7, k=6 is given; FIG. 4 is a schematic diagram of the storage code in this embodiment of the regenerating code construction method. The steps are as follows:

Step (1): The distributed storage system obtains 39 initial data blocks (the data blocks obtained from the original file in step S1) and encodes them with the two-layer coding strategy of step S3 to obtain 56 coded blocks.

This distributed storage system with parameters [n=8, k=6, d=7] is built on a 2-(8,4,3) design.

After sorting, the 2-(8,4,3) design is: {{0,1,2,3},{0,1,2,4},{0,1,5,6},{0,2,5,7},{0,3,4,5},{0,3,6,7},{0,4,6,7},{1,2,6,7},{1,3,4,6},{1,3,5,7},{1,4,5,7},{2,3,4,7},{2,3,5,6},{2,4,5,6}}.
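The listed blocks can be checked against the definition of a t-design by counting how often each t-element subset occurs. The sketch below does this for the 14 blocks above; the function name is hypothetical, and it assumes the elements are labelled 0 to v-1.

```python
from collections import Counter
from itertools import combinations

BLOCKS = [
    (0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 5, 6), (0, 2, 5, 7), (0, 3, 4, 5),
    (0, 3, 6, 7), (0, 4, 6, 7), (1, 2, 6, 7), (1, 3, 4, 6), (1, 3, 5, 7),
    (1, 4, 5, 7), (2, 3, 4, 7), (2, 3, 5, 6), (2, 4, 5, 6),
]


def is_t_design(blocks, t, v, m, lam):
    """True if every t-subset of {0..v-1} lies in exactly `lam` blocks of size m."""
    if any(len(set(blk)) != m for blk in blocks):
        return False
    counts = Counter(sub for blk in blocks for sub in combinations(sorted(blk), t))
    return all(counts[sub] == lam for sub in combinations(range(v), t))


print(is_t_design(BLOCKS, t=2, v=8, m=4, lam=3))
```

Every pair of elements appears in exactly three blocks, so the list is a valid 2-(8,4,3) design with b = 14.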

Step (2): Record the block and position in which each element appears. The results are as follows:

[formula missing in source]

Step (3): Assign a file symbol $f^{h}_{i,j}$ to each of the 56 coded blocks, where $f^{h}_{i,j}$ indicates that element h appears at the j-th position of the i-th block.
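The bookkeeping of steps (2) and (3) can be reproduced directly from the sorted design. The sketch below lists, for every element h, the (block, position) pairs in which it appears, using 1-based indices as in the text; the function name and the tuple representation are assumptions made for illustration.

```python
BLOCKS = [
    (0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 5, 6), (0, 2, 5, 7), (0, 3, 4, 5),
    (0, 3, 6, 7), (0, 4, 6, 7), (1, 2, 6, 7), (1, 3, 4, 6), (1, 3, 5, 7),
    (1, 4, 5, 7), (2, 3, 4, 7), (2, 3, 5, 6), (2, 4, 5, 6),
]


def file_symbols(blocks):
    """Map each element h to the list of (i, j): block number and position of h."""
    positions = {}
    for i, blk in enumerate(blocks, start=1):
        for j, h in enumerate(sorted(blk), start=1):
            positions.setdefault(h, []).append((i, j))
    return positions


symbols = file_symbols(BLOCKS)
for h in sorted(symbols):
    print(h, symbols[h])
```

Each of the 8 elements yields 7 index pairs, giving the 56 file symbols that label the 56 coded blocks.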

Step (4): The two-layer coding strategy is as follows. The inner code is a (4, 3) MDS code; without loss of generality, the coded block corresponding to the last file symbol in each block serves as the parity block, which gives

[formula missing in source]

……

The outer code is a (42, 39) MDS code. In this embodiment, without loss of generality, the coded blocks corresponding to [formula missing in source] serve as the parity blocks, which gives

[formula missing in source].

In this embodiment, to keep the parity symbols mutually independent, the coefficients in front of the file symbols, such as x, x², x³, are taken from different rows of a Vandermonde matrix. All modular additions are performed in a finite field, which this embodiment sets to GF(43).
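A (4, 3) MDS code over GF(43) can be as simple as one weighted parity symbol. The sketch below is an assumption-laden illustration, not the patent's exact equations (which are not reproduced in the text): the coefficients (1, 2, 4) are arbitrary nonzero values, the function names are hypothetical, and `pow(x, -1, P)` requires Python 3.8 or later.

```python
P = 43  # field size stated in the embodiment, GF(43)


def inner_parity(data, coeffs=(1, 1, 1)):
    """Parity symbol of a (4, 3) code: weighted sum of three data symbols mod P."""
    return sum(c * x for c, x in zip(coeffs, data)) % P


def recover(codeword, missing, coeffs=(1, 1, 1)):
    """Recover position `missing` (0..3) of [d0, d1, d2, parity] from the other three."""
    if missing == 3:
        return inner_parity(codeword[:3], coeffs)
    known = sum(coeffs[i] * codeword[i] for i in range(3) if i != missing)
    inverse = pow(coeffs[missing], -1, P)  # modular inverse in GF(43)
    return (codeword[3] - known) * inverse % P


coeffs = (1, 2, 4)
data = [5, 17, 40]
codeword = data + [inner_parity(data, coeffs)]
damaged = list(codeword)
damaged[1] = None  # pretend the second symbol was lost
print(codeword, recover(damaged, 1, coeffs))
```

Because every coefficient is nonzero, any three of the four symbols determine the fourth, which is the property used when a single coded block of a group has to be regenerated.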

Step (5): Allocate a different storage node to each element. This embodiment has 8 elements, 0 through 7, so 8 storage nodes are created. The coded blocks whose file symbols $f^{h}_{i,j}$ share the same element h are placed in the same storage node, so each storage node stores 7 coded blocks, finally giving the storage code shown in FIG. 4.
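The placement rule can be sketched as a single pass over the file symbols: the first symbol carrying a new element h allocates node h, and later symbols with the same h are appended to that node. Symbols are represented here as tuples (h, i, j), and the function name is hypothetical.

```python
BLOCKS = [
    (0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 5, 6), (0, 2, 5, 7), (0, 3, 4, 5),
    (0, 3, 6, 7), (0, 4, 6, 7), (1, 2, 6, 7), (1, 3, 4, 6), (1, 3, 5, 7),
    (1, 4, 5, 7), (2, 3, 4, 7), (2, 3, 5, 6), (2, 4, 5, 6),
]


def place_on_nodes(blocks):
    """Group file symbols (h, i, j) by element h; each group is one storage node."""
    nodes = {}
    for i, blk in enumerate(blocks, start=1):
        for j, h in enumerate(sorted(blk), start=1):
            if h not in nodes:
                nodes[h] = []  # first symbol with this element: allocate node h
            nodes[h].append((h, i, j))
    return nodes


nodes = place_on_nodes(BLOCKS)
for h in sorted(nodes):
    print(f"Node {h}: {len(nodes[h])} coded blocks")
```

The pass creates 8 nodes with 7 coded blocks each, matching the counts stated for FIG. 4.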

FIG. 5 is a flowchart of the file reconstruction method of the invention. The file reconstruction method based on the regenerating code construction method comprises the following steps:

A1. Construct the regenerating code according to steps S1 to S5;

A2. Connect to storage nodes;

The file reconstruction method relies mainly on the redundancy provided by the outer $(b(m-t+1),\ b(m-t+1)-\lambda)$ MDS code generated in step S3. The number of original data blocks is $b(m-t+1)-\lambda$, i.e., λ redundant blocks have been added, so $b(m-t+1)-\lambda$ distinct coded blocks must be collected during file reconstruction. In this embodiment, data is downloaded from at most n-t storage nodes.

A3. Before downloading the current coded block, perform the first reconstruction check. The first reconstruction check of step A3 is as follows: for the file symbols $f^{h}_{i_0,j}$ that share the current block number $i_0$, determine whether the coded blocks represented by m-t+1 such file symbols have already been downloaded, where $f^{h}_{i_0,j}$ indicates that element h appears at the j-th position of the $i_0$-th block and m is the block size. If so, the current coded block is not downloaded; if not, the current coded block is downloaded;

A4. After each coded block has been downloaded, perform the second reconstruction check. The second reconstruction check of step A4 is as follows: determine whether $b(m-t+1)-\lambda$ coded blocks have already been downloaded. If so, the download process stops; if not, the download process continues. Here b is the number of blocks, t is the number of elements in a subset, and λ is the t-balance number, meaning that every t-element subset appears in λ blocks;

A5. Restore the original file using the inner and outer MDS coding rules. Specifically, once the required number of coding blocks has been downloaded: if fewer than $m-t+1$ coding blocks of a block have been obtained, the outer MDS coding rule is used to recover its coding blocks; if exactly $m-t+1$ coding blocks of a block have been obtained, the inner $(m, m-t+1)$ MDS coding rule is used to recover its coding blocks.

Not every reconstruction requires all $n-t$ storage nodes to transfer data. The file transfer ends as soon as the required number of distinct coding blocks has been downloaded, after which the file reconstruction operation begins.
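The two checks of steps A3 and A4 can be sketched as a simple download-planning loop. This is a minimal illustration under stated assumptions, not the patent's implementation: the tuple `(h, i, j)` used to represent a file symbol, the function names `build_design` and `plan_downloads`, and the use of the affine planes of AG(3,2) as a stand-in 2-(8,4,3) design are all hypothetical choices made for this example.

```python
from itertools import combinations

def build_design():
    """Stand-in 2-(8,4,3) design (assumption): 4-subsets of {0..7} whose XOR is 0."""
    return [c for c in combinations(range(8), 4) if c[0] ^ c[1] ^ c[2] ^ c[3] == 0]

def plan_downloads(blocks, nodes, t, needed):
    """Choose file symbols (h, i, j) to download from the given nodes.

    First check (step A3): skip a symbol when m - t + 1 symbols of its
    block have already been chosen.
    Second check (step A4): stop once `needed` coding blocks are chosen.
    """
    m = len(blocks[0])
    per_block = {}   # block number i -> how many of its symbols were chosen
    chosen = []
    for h in nodes:
        for i, block in enumerate(blocks, start=1):
            if h not in block:
                continue
            if per_block.get(i, 0) >= m - t + 1:   # first check
                continue
            j = block.index(h) + 1                 # position of h inside block i
            chosen.append((h, i, j))
            per_block[i] = per_block.get(i, 0) + 1
            if len(chosen) >= needed:              # second check
                return chosen
    return chosen

blocks = build_design()
plan = plan_downloads(blocks, nodes=range(6), t=2, needed=39)
```

With this stand-in design, contacting nodes 0 to 5 yields a plan of 39 file symbols with at most 3 symbols per block. Which nodes skip a block depends on the concrete design and its ordering, so the skipped symbols need not match those of FIG. 6.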

In this embodiment, a file reconstruction example with $M=39$, $n=8$, $d=7$, $k=6$ is given; FIG. 6 is a schematic diagram of this example. The details are as follows:

Step 1) In this embodiment, connect to the six nodes Node 0 to Node 5 to perform the file reconstruction operation.

Step 2) Download all data from Node 0, Node 1 and Node 2. From each of Node 3, Node 4 and Node 5, download all data except the coding block corresponding to one file symbol. A total of 39 coding blocks have then been downloaded, and the data download process ends.

Step 3) Recover the file using the outer (42, 39) MDS coding rule. The data downloaded from Node 0 to Node 5 lack some of the coding blocks, so the downloaded coding blocks are eliminated from the relations of the outer code, which leaves three equations in the missing coding blocks.

Step 4) The finite field used in the encoding process is GF(43), and the coefficients of the three equations are linearly independent, so the three equations can be solved to recover the missing coding blocks. At this point, all coding blocks have been obtained.
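Step 4 amounts to solving a small linear system over GF(43). The sketch below shows how three unknown values could be recovered from three linearly independent equations by Gauss-Jordan elimination modulo 43. The coefficient rows (Vandermonde rows) and the block values are invented for illustration, because the actual equations of the embodiment are not reproduced in this text; `solve_mod_p` is a hypothetical helper, and each coding block is reduced to a single field symbol for brevity.

```python
P = 43  # field size GF(43), as stated in step 4

def solve_mod_p(a, y, p=P):
    """Solve a x = y over GF(p) by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, y)]   # augmented matrix [a | y]
    for col in range(n):
        # find a row at or below the diagonal with a non-zero entry in this column
        pivot = next(r for r in range(col, n) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)                # modular inverse (Python 3.8+)
        aug[col] = [(x * inv) % p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % p != 0:
                f = aug[r][col]
                aug[r] = [(xr - f * xc) % p for xr, xc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Illustrative values only: three missing symbols and three independent equations.
missing = [17, 5, 30]
coeffs = [[1, 1, 1], [1, 2, 4], [1, 3, 9]]             # Vandermonde rows, invertible mod 43
rhs = [sum(c * x for c, x in zip(row, missing)) % P for row in coeffs]
recovered = solve_mod_p(coeffs, rhs)
```

Because 43 is prime, every non-zero pivot has a modular inverse, so the elimination succeeds whenever the three coefficient rows are linearly independent, which is the condition step 4 states.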

FIG. 7 is a schematic flowchart of the node repair method of the present invention. The node repair method based on the regeneration code construction method comprises the following steps:

B2. Determine the index of the failed node. Specifically, record the element numbers of the remaining intact nodes as $\{h_1, \ldots, h_{v-1}\}$, compare them with the elements of the $t$-$(v, m, \lambda)$ design, and determine the index $h_0$ of the failed node. The node repair method relies on the redundancy provided by the inner $(m, m-t+1)$ MDS code of step S3. Because this inner code carries only $t-1$ symbols of redundancy, a total of $r(m-t+1)$ coding blocks must be downloaded.

B3. Determine, in the combinatorial design, the indices of the blocks in which the failed node appears, recorded in order as $\{i_1, \ldots, i_r\}$;

B4. Download coding blocks from $n-t+1$ nodes according to their file symbols, where a file symbol records that element $h$ appears at the $j$-th position of the $i$-th block. A total of $r(m-t+1)$ coding blocks are downloaded;
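Steps B2 to B4 can be sketched as follows. This is only an illustration under stated assumptions: the tuple `(h, i, j)` for a file symbol, the helper names `find_failed_element` and `repair_plan`, the choice of the first $m-t+1$ surviving elements of each block as helpers, and the stand-in 2-(8,4,3) design are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def find_failed_element(v, alive):
    """Step B2: the failed node is the element of {0..v-1} missing among intact nodes."""
    (h0,) = set(range(v)) - set(alive)
    return h0

def repair_plan(blocks, h0, t):
    """Steps B3 and B4: blocks containing h0, and the helper symbols (h, i, j) to fetch."""
    m = len(blocks[0])
    hit = [i for i, blk in enumerate(blocks, start=1) if h0 in blk]   # step B3
    symbols = []
    for i in hit:
        blk = blocks[i - 1]
        helpers = [h for h in blk if h != h0][: m - t + 1]            # m-t+1 per block
        symbols += [(h, i, blk.index(h) + 1) for h in helpers]        # step B4
    return hit, symbols

# Stand-in 2-(8,4,3) design (assumption): 4-subsets of {0..7} whose XOR is 0.
blocks = [c for c in combinations(range(8), 4) if c[0] ^ c[1] ^ c[2] ^ c[3] == 0]
h0 = find_failed_element(8, alive=range(1, 8))
hit, symbols = repair_plan(blocks, h0, t=2)
```

For this stand-in design the failed element is 0, it appears in blocks 1 to 7, and the plan lists $r(m-t+1) = 7 \times 3 = 21$ helper symbols spread over the $n-t+1 = 7$ surviving nodes.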

A $t$-$(v, m, \lambda)$ design has $v$ distinct elements in total, each block contains $m$ elements, and any $t$ elements meet $\lambda$ times (that is, they appear together in $\lambda$ blocks). In step S3 of the encoding process, the inner $(m, m-t+1)$ MDS code provides $t-1$ units of redundancy, so the coding blocks belonging to the same block can tolerate the loss of $t-1$ coding blocks. Furthermore, the whole storage system can tolerate the failure of $t$ nodes, in which case $tr$ coding blocks are lost in total. These lost coding blocks cannot be recovered from the redundancy of the inner $(m, m-t+1)$ MDS code alone; recovery must also rely on the outer MDS code. The inner code supplies one part of the redundancy for the $tr$ failed coding blocks and the outer code supplies the rest, and the combined redundancy suffices to recover all $tr$ failed coding blocks.

In this embodiment, a node repair example with $M=39$, $n=8$, $d=7$, $k=6$ is given; FIG. 8 is a schematic diagram of this example. Suppose Node 0 fails; the repair process is then as follows:

Step 1. Since Node 1 to Node 7 are intact, the element number of the failed node is 0;

Step 2. The distributed storage system is built on a 2-(8,4,3) design, so the indices of the blocks in which element 0 appears are {1,2,3,4,5,6,7}.

Step 3. From the 7 surviving storage nodes, download the coding blocks represented by the required file symbols of blocks 1 to 7, three per block, so that 21 coding blocks are downloaded in total.

Step 4. Divide the downloaded coding blocks into 7 groups, such that the coding blocks in each group have the same block number $i$.

Step 5. For each group, recover one coding block using the coding rule of the inner (4, 3) MDS code. For example, in group 1 the inner code relates the four coding blocks of block 1, and three of them have been obtained, so an XOR operation recovers the fourth. The same recovery operation is performed on every group, which finally yields 7 coding blocks, and these are exactly the coding blocks of the failed node.

Step 6. Store the above 7 coding blocks in a new storage node and add this node to the original distributed storage system. The node repair process is then complete.
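The XOR recovery of step 5 can be sketched as below, assuming, as the text of step 5 suggests, that the (4, 3) inner code is a single-parity code whose fourth block is the XOR of the other three. The byte strings and the helper name `xor_blocks` are illustrative assumptions.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Illustrative data: three coding blocks plus their XOR parity form one (4, 3) group.
group = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
group.append(xor_blocks(group))          # parity block of the group

lost = group[0]                          # pretend this block was on the failed node
recovered = xor_blocks(group[1:])        # XOR of the three surviving blocks
```

In a single-parity group the XOR of all four blocks is zero, so XORing any three of them returns the fourth. Repeating this for each of the 7 groups rebuilds the 7 coding blocks of the failed node.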

The parameters of the code proposed by the present invention and of the code proposed by Chao Tian are summarized in the following table:

It is proved below that the $[n, k=n-t, d=n-t+1]$ distributed storage system generated by the above coding scheme attains the Singleton bound, that is, $d_{\min} = n-k+1$. The proof is as follows:

I. According to step S5 of the encoding process, all coding blocks whose file symbols have the same element $h$ are stored in the same node, so the number of files stored in one node equals $r$, the element repetition degree. A new storage node is allocated for each distinct element $h$, so the total number of nodes in the storage system is $n = v$.

II. By step S3 of the encoding process, because two-layer MDS encoding is used, the number of data blocks that the distributed system can store equals the dimension of the outer MDS code.

III. According to the node repair process, repairing one failed node requires connecting to $d = n-t+1$ nodes.

IV. Left-hand side of the Singleton bound:

$\text{left} = n-(n-t)+1 = t+1$

For the right-hand side of the Singleton bound, the fact that $v \ge m > 0$ is used to simplify the expression.

V. In step S1 of the encoding process, the selected $t$-design is required to satisfy the combination condition, so the right-hand side of the Singleton bound equals $t+1$.

VI. Since the left-hand side equals the right-hand side, the proof is complete.
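The parameters used in the proof can be checked numerically for the 2-(8,4,3) embodiment. The sketch below uses the standard counting identities of a $t$-design, $b = \lambda\binom{v}{t}/\binom{m}{t}$ and $r = \lambda\binom{v-1}{t-1}/\binom{m-1}{t-1}$; these are general facts about $t$-designs, and the function name `design_params` is a hypothetical helper, not something defined in the patent.

```python
from math import comb

def design_params(t, v, m, lam):
    """Block count b and element repetition degree r of a t-(v, m, lam) design."""
    b, rem_b = divmod(lam * comb(v, t), comb(m, t))
    r, rem_r = divmod(lam * comb(v - 1, t - 1), comb(m - 1, t - 1))
    if rem_b or rem_r:
        raise ValueError("parameters violate the divisibility conditions of a t-design")
    return b, r

t, v, m, lam = 2, 8, 4, 3
b, r = design_params(t, v, m, lam)   # number of blocks and blocks stored per node
n, k, d = v, v - t, v - t + 1        # [n, k = n - t, d = n - t + 1]
outer_length = b * (m - t + 1)       # length of the outer MDS code
lhs = n - k + 1                      # left-hand side of the Singleton bound
```

For the embodiment this gives $b = 14$, $r = 7$, $[n, k, d] = [8, 6, 7]$, an outer code of length 42 as in the (42, 39) code of the reconstruction example, and a left-hand side of $t + 1 = 3$.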

Claims

1. A method for constructing a regeneration code, comprising the steps of:

S1, acquiring a combinatorial design and storage data blocks; the acquired combinatorial design is specifically a $t$-$(v, m, \lambda)$ design satisfying a combination condition, where $\lambda$ is the $t$-balance number, meaning that every $t$-element subset appears in $\lambda$ blocks;

$b$ denotes the number of blocks, $b = \lambda\binom{v}{t}/\binom{m}{t}$;

$r$ denotes the element repetition degree, $r = \lambda\binom{v-1}{t-1}/\binom{m-1}{t-1}$;

the combination conditions are as follows:

；

wherein $t$ is the number of elements in a subset; $v$ is the order of the design, specifically the total number of distinct elements in the design; $m$ is the block capacity, that is, the number of elements in a block;

the acquired storage data blocks are specifically: the size of the original stored file is set as $M$, and the original file is split into data blocks of equal size;

S2, sorting and numbering the blocks, and recording the blocks and positions in which each element appears to obtain file symbols;

S3, encoding the data blocks to obtain coding blocks;

S4, assigning a different file symbol to each coding block;

S5, placing the coding blocks corresponding to the file symbols with the same element in the same storage node to obtain the regeneration code.

2. The method according to claim 1, wherein the file symbol of step S2 is defined for $1 \le i \le b$ and $1 \le j \le m$, where $b$ is the number of blocks and $m$ is the block capacity; the file symbol indicates that element $h$ appears at the $j$-th position of the $i$-th block; the blocks are numbered $B_1, B_2, \ldots, B_b$; within a block $B_i$ the elements are arranged in descending order and are given the position numbers $\{1, 2, \ldots, m\}$.

3. The method according to claim 2, wherein step S3 encodes the data blocks using a two-layer encoding strategy; the outer MDS code is constructed as follows:

the outer MDS code is used to encode the data blocks to obtain $b(m-t+1)$ coding blocks, where $b$ is the number of blocks, $m$ is the block capacity, $t$ is the number of elements in a subset, and $\lambda$ is the $t$-balance number, meaning that every $t$-element subset appears in $\lambda$ blocks;

the inner MDS code is constructed as follows:

the $b(m-t+1)$ coding blocks obtained by the outer encoding are divided equally into $b$ groups, and an MDS code with parameters $(m, m-t+1)$ is used to encode the $m-t+1$ coding blocks of each group a second time, finally yielding $bm$ coding blocks.

4. The method according to claim 3, wherein step S4 comprises assigning a file symbol as an identifier to each of the obtained coding blocks; without loss of generality, a first designated set of file symbols is used as the identifiers of the redundant blocks generated by the outer encoding, and a second designated set of file symbols is used as the identifiers of the redundant blocks generated by the inner encoding, wherein a file symbol indicates that element $h$ appears at the $j$-th position of the $i$-th block, and $m$ is the block capacity.

5. The method of claim 4, wherein step S5 specifically comprises: for each file symbol, if its element $h$ has not appeared in any previous file symbol, allocating a new storage node numbered $h$; if element $h$ already has an allocated storage node, storing the coding blocks corresponding to the file symbols with the same element $h$ in the storage node numbered $h$; each storage node stores $r$ coding blocks, where $r$ is the element repetition degree; a regeneration code with parameters $[n, k=n-t, d=n-t+1]$ is thereby obtained, where $n$ is the number of nodes, $k$ is the number of nodes contacted to download data, $d$ is the number of nodes participating in repair, and $t$ is the number of elements in a subset; a file symbol indicates that element $h$ appears at the $j$-th position of the $i$-th block.

6. A file reconstruction method based on the regeneration code construction method according to any one of claims 1 to 5, characterized by comprising the steps of:

A1. constructing a regeneration code according to steps S1-S5;

A2. connecting to the storage nodes;

A3. when downloading the current coding block, performing a first reconstruction determination;

A4. after the download of one coding block is finished, performing a second reconstruction determination;

A5. restoring the original file using the MDS coding rules of the inner layer and the outer layer.

7. The method of claim 6, wherein the first reconstruction determination of step A3 specifically comprises: for the file symbols having the current block number $i_0$, determining whether the coding blocks represented by $m-t+1$ of these file symbols have already been downloaded, where a file symbol indicates that element $h$ appears at the $j$-th position of the $i_0$-th block and $m$ is the block capacity; if so, the current coding block is not downloaded; if not, the current coding block is downloaded; the second reconstruction determination of step A4 specifically comprises determining whether the required number of coding blocks has been downloaded; if so, the download process is stopped; if not, the download process continues; $b$ is the number of blocks, $t$ is the number of elements in a subset, and $\lambda$ is the $t$-balance number, meaning that every $t$-element subset appears in $\lambda$ blocks.

8. A node repair method based on the regeneration code construction method according to any one of claims 1 to 5, characterized by comprising the steps of:

B1. constructing a regeneration code according to steps S1-S5;

B2. determining the index of the failed node;

B3. determining, in the combinatorial design, the indices of the blocks in which the failed node appears;

B4. downloading coding blocks from $n-t+1$ nodes, $r(m-t+1)$ coding blocks in total, where $r$ is the element repetition degree, $n$ is the number of nodes, $t$ is the number of elements in a subset, and $m$ is the block capacity;

B5. dividing the $r(m-t+1)$ coding blocks downloaded in step B4 into $r$ groups, the coding blocks of each group having the same block number $i$;

B6. recovering one coding block from each group of coding blocks using the coding rule of the inner MDS code, so that the $r$ groups yield $r$ recovered coding blocks;

B7. storing the $r$ recovered coding blocks in a new storage node and adding the node to the original storage system to complete the repair process.

9. The node repair method according to claim 8, wherein step B2 specifically comprises: recording the element numbers of the remaining intact nodes as $\{h_1, \ldots, h_{v-1}\}$, comparing them with the elements of the $t$-$(v, m, \lambda)$ design, and determining the index $h_0$ of the failed node, where $v$ is the order of the design, $m$ is the block capacity, and $\lambda$ is the $t$-balance number; the block indices of step B3 are recorded as $\{i_1, \ldots, i_r\}$; the file symbol of step B4 indicates that $h_0$ appears at the $j$-th position of the $i$-th block.