CN108141228A

CN108141228A - The coding of distributed memory system

Info

Publication number: CN108141228A
Application number: CN201580083657.1A
Authority: CN
Inventors: 彼得·弗拉基米罗维奇·特里福诺夫; 邵云峰; 王元钢
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2015-10-09
Filing date: 2015-10-09
Publication date: 2018-06-08
Also published as: WO2017061891A9; WO2017061891A1

Abstract

The present invention relates to a kind of methods for being encoded to the input data in code word.The code word is the product as vector u ' and matrix A and obtains that the vector u ' includes symbol, (I), rest position (II) includes the input data, the matrix A is represented by (III), wherein B be permutation matrix, F₀, F₁... ..., F_m‑1It is GF (2^μ) on non-equivalent replacement be diagonal matrix l_i×l_iA matrix, wherein：SetInclude the integer s of band (IV), setIncluding integer (V) so that (VI) or setIncluding integer (VII).

Description

The coding of distributed memory system

Technical field

The present invention relates to the method for the input data in coding codeword and the methods for updating code word.The present invention Further relate to storage control and computer readable storage medium.Moreover, it relates to a kind of calculating for storing program code Machine readable storage medium storing program for executing, the program code include the instruction for performing one of the above method.

Background technology

By N_sIn the information storage system of platform server composition, due to many differences, every equipped with N_dA storage Equipment, server, disk and block server may break down at any time or interim off line.In order to ensure what is stored in system Information is continuously available, and multiple data copies (Google file system and Hadoop points can be stored on different server/disk This method is used in cloth file system).But this can greatly increase memory requirement and integral device cost.It is another Solution is using certain type of erasure code, i.e., a block number is divided into k block of information (symbol) according to (slitting), is it Calculate n-k odd even (verification) block (symbol), and these blocks are stored in different disk and server.If any of which Respective symbol can then be considered as and wipe by one failure, and attempt by corresponding encoded entangle delete decoding come restore lose Block.

Many correcting and eleting codes for being used for network store system have been proposed, including：Reed Solomon Coding, pyramid are compiled Code, EvenOdd and RDP codings, odd even partition encoding and Zigzag codings.(n, k, n-k+1) Reed Solomon Coding provides For the protection arbitrarily combined of up to n-k erasing, therefore there is minimum possibility redundancy.But restore any and wiped Divided-by symbol is required for accessing at least k survival symbol.Pyramid and odd even partition encoding are provided by accessing at most l ＜ k It is non-to wipe symbol to restore the ability largely wiped.This is realized using higher coding redundancy as cost.Substantially, these structures Make is to be encoded by introducing additional checking symbol in the codeword from some maximum distance separables (for example, Reed-Solomon is compiled Code) in obtain, wherein additional checking symbol is only dependent upon some subsets of information symbol.

It is poised for battle row coding (such as EvenOdd, RDP and Zigzag are encoded) by vector alphabet to be defined.This causes people Can design high efficient coding algorithm, and reduce and pass through the data volume of network transmission in the case where wiping recovery situation.But for The arbitrary value of coding dimension k and redundancy n-k are not still used for the construction of these codings clearly with effective method.

The performance of network store system depends on the clothes of the traffic and access generated during coding and reconstruction operation The computation complexity of business device quantity, disk access rate and related algorithm.The major issue occurred in such systems is： Using tending to write data into relatively small number of piece of less than k block composition.This needs to implement buffering, i.e., arrives data accumulation Somewhere, until being collected into sufficient amount of data.Since event may occur for the storage device itself for being used to buffer Barrier, this method may result in loss of data.In addition, application may need certain information in the slitting that update stores before Block.In order to keep the consistency of check block, need to obtain the old value of its old value and block of information, calculate their difference, and more It is new they to reflect new data.This is related to many input/output and network transmission operation, can seriously reduce system performance.

If entangle and deleting decoding, server should be accessed as few as possible, because each network data transmission can Cause very high performance compensatory.This problem is solved by constructing local decodable code coding, such as pyramid and odd even point Cut coding.But more checking symbols, part are encoded than comparable maximum distance separable since these codings have Update can bring higher performance cost.Therefore, the method has been only used for immutable data.

Invention content

The purpose of the present invention is to provide the method for coded data and the method for updating code word, wherein the side Method overcomes the problem of one or more of the prior art is mentioned above.

Particularly, the purpose of the present invention can include ensuring that the availability of data in distributed memory system, the distribution Storage system can suffer from obstruction, equipment and server failure.

First aspect present invention provides a kind of method for being encoded to the input data in code word, the code word It is the product as vector u ＇ and matrix A and obtains, the vector u ＇ includes symbol u ＇_i=0, whereinRest positionIncluding the input data, the matrix A is represented byWherein B is displacement square Battle array, F₀, F₁... ..., F_m-1It is GF (2^μ) on non-equivalent replacement be diagonal matrix l_i×l_iA matrix, wherein：

SetIncluding integer s, wherein

SetIncluding integer

So thatOr

SetIncluding integer

Above-mentioned GF (2^μ) represent order 2^μFinite field, wherein μ is natural number.

SetDifferent selections different application scene is advantageous：

● there is l symbol c in order to ensure each_il..., c_(i+1)l-1Block in can the erasing of local recovery r, can will Integer s is coveredIn, wherein

● it, can be by integer in order to ensure coding can correct any combinations of d-1 erasing (i.e. equipment fault)It coversIn so that

● it, can be by integer in order to ensure coding can correct any combinations of ρ server failureIt coversIn.

● it is limited in order to ensure the upper limit of data loss probability by some predefined value π, it can be by integerIt coversIn so thatHere p is The probability of server (node) failure in given interval (no to replace), and

The method of first aspect can use polar code.These polar codes are by matrix The generation of some rows, wherein B is and mappingCorresponding permutation matrix, 0≤ s_i＜ l_i, F_iIt is GF (2^μ) on non-equivalent replacement be diagonal matrix l_i×l_iA matrix.

Therefore, it is proposed to a kind of side being stored in using polar code coded data and by encoded radio on the element of storage system Method and the effective ways for realizing coding and decoding operation.

In an embodiment of first aspect, a kind of side for being encoded to the data in storage system is provided Method, wherein, place data into code word c and be claimed as in known selected location, the position of checking symbol be claimed as it is unknown, Recurrence determines the value of unknown symbols, to obtain code wordWherein u ＇ are Vector,c_iIt is stored on i-th of node (disk in server or server).

In embodiments of the present invention, it can obtain in several of different ways and delete the coding of attribute with identical entangle.For example, It can be to matrix F_iRow be multiplied by arbitrary nonzero value into line replacement, and by these row.Additionally, there are many different modes to divide With the information symbol and checking symbol in code word.

In the first realization method of the method according to first aspect, the method is for restoring to have wiped data Method, the method are based on the matrix F_iThe encoding scheme of corresponding node F_i, the method includes：

It is unknown that the one or more of the code word has been wiped sign flag, and the one or more of the code word is not wiped Divided-by symbol be labeled as it is known that

By one or more symbol v_iLabeled as it is known that wherein

For any node F_i in the encoding scheme, if its t output symbol labeled as unknown, by it most T incoming symbol above labeled as unknown, unless they have been labeled as it is known；

If node F_i has t known incoming symbols and t unknown output symbols, remaining output symbol is marked To be unknown, unless they have been labeled as it is known；

Following operation is repeated, until all sign flags of the code word are known：

If node has t known incoming symbols, t unknown output symbols and l-t known output symbols, lead to The local decoding crossed at node restores unknown output symbol, and by all output symbols labeled as known；

If all output symbols of node be all it is known that if calculate Unknown worm symbol and by them labeled as known.

Therefore, the method for the first realization method provides a kind of effective erasing restoration methods.Particularly, erasing restores Task can be reduced to multiple local decoding tasks, there are efficient implementations outlined further below thus.

In second of realization method of the method according to first aspect, the method is for system coding and obtains institute The method for stating code word, including following initial step：

Select the Data Position of the code word forWherein

The input data is put into the Data Position of the code word, and the rest position of the code word is marked To have wiped；

Data have been wiped using above method recovery.

This present a kind of particularly effective system coding modes, this is extremely important to real system, and input data should A part as code word occurs.

In the third realization method of the method according to first aspect, the method further includes using first aspect A kind of method of realization method restores the step of symbol in the remainder codewords position.

What is proposed ensures that described erasing restoration methods are always extensive for the method that selects Data Position in code word All checking symbols of the multiple code word.

In the 4th kind of realization method of the method according to first aspect, the local decoding includes：

Use known incoming symbol S_i, for unknown output symbolSolve equation Wherein B=F^-1；

Once all output symbols all become it is known that once calculate Unknown worm symbol

In the 5th kind of realization method of the method according to first aspect, the local decoding is embodied as：

Calculating unknown output symbol is

Wherein Γ (x) ≡ S (x) Λ (x) mod x^t；

Once all output symbols be all it is known that once calculate Unknown worm symbol

This has the following advantages：It is O (t that complexity, which can be used,²) low complex degree multinomial assessment algorithm, without the use of It is O (t for solving complexity³) system of linear equations common Gaussian null method.

In the 6th kind of realization method of the method according to first aspect, the matrix F₀, F₁... ..., F_m-1It is selected asWherein α_jIt is arranged in cyclotomic cosets sequence.

This provides the practical realization of the method for the 5th aspect.

In the 7th kind of realization method of the method according to first aspect, Fast Fourier Transform (Fast is used Fourier Transform, abbreviation FFT) unknown-value is determined from given value.

Specifically, as required by above-mentioned local coding/decoding method, FFT can be used for assessing multinomial, be held so as to provide one kind The local decoded particularly effective mode of row.

In the 8th kind of realization method of the method according to first aspect, the method, which further includes, is less than length coding dimension DegreeInput data carry out code segment the step of.

Therefore, code segment can be carried out until unknown symbols can be restored.As long as there is the arrival of other data, coding can To restart.

In the case where giving some target data losing probabilities, in order to maximize the load bearing capacity of storage system, need Long codes.But compared with the data volume that application can once generate, the dimension of these codings may be too high.Therefore, the 8th kind Realization method provides a kind of delay coding method, and this method can be used for generating for it when reaching in small block data Checking symbol is accumulated until in order to generate entire code word until sufficient amount of data.

The method of 8th kind of realization method all code-word symbols can be initially appointed as using the concept it is unknown, and by this A little information symbols are put into appropriate location, once they reach just they are appointed as known to.Then above system volume can be performed Code algorithm, due to lacking known symbol, these algorithms may stop, and in known symbol once occurring opening again in certain points Begin.It should be noted that many equipment are less likely to break down in short time range, this needs to accumulate t data block.Therefore, exist Some checking symbols obtained during the incomplete execution of encryption algorithm may be enough to cope with this failure.As long as encryption algorithm is complete Into its execution, it is possible to entire checking symbol set is obtained, so that it is guaranteed that being directed to many equipment required by long term data storage The protection of failure.

Therefore, the method for the 8th kind of realization method provides processing by using the effective of the big coding and blockette provided Mode.

In the 9th kind of realization method of the method according to first aspect, the method also includes depositing on i-th of node The step of storing up i-th of element of the code word, wherein the node is the disk in server or server.

The second aspect of the present invention be related to it is a kind of for update according to the method for one of first aspect or its realization method into The method of the code word of row coding, the method includes：

It is expired by the sign flag of code word to be updated；

Store the new value of symbol to be updated；

By one or more checking symbols of the code word labeled as expired；

They are simultaneously labeled as unknown by the one or more new checking symbols of distribution；

Recurrence determines the value of one or more symbols that label is；

Wherein, when sign flag is known, by its preceding value labeled as expired；When code word all symbols to be known or When expired, erasing is labeled as expired one or more positions.

The realization method of second aspect is capable of providing a kind of number encoded for update according to the method for first aspect According to method, wherein notation declarations to be updated to be expired, new value are stored in equipment, old checking symbol was claimed as Phase distributes new checking symbol and original assertion is unknown.Then, as in an encoding process, recurrence determines unknown symbols Value.Herein, per next notation declarations for it is known when, store its preceding value block be all claimed as it is expired.When code word corresponds to All pieces when being all known or expired, wipe expired piece.

The third aspect of the present invention is related to a kind of storage control, for performing the side described in one of preceding claims Method.The controller can be realized by software or hardware (such as ASIC, FPGA).The controller can be directly connected to Storage device can be connected to storage device by network connection, for example, storage device is connected by another controller To network.

The fourth aspect of the present invention is related to a kind of computer readable storage medium for storing program code, said program code The instruction of method described in one of realization method including being used to perform first or second aspect or first or second aspect.

Description of the drawings

Technical characteristic in order to illustrate the embodiments of the present invention more clearly makes required in being described below to embodiment Attached drawing is briefly described.The accompanying drawings in the following description is only some embodiments of the present invention, these embodiments are not In the case of violating the present invention such as protection domain defined in claims, it can modify.

Fig. 1 is the block diagram for showing coder structure according to embodiments of the present invention；

Fig. 2A to 2D is the block diagram for the processing step for showing erasing restoration methods according to another embodiment of the present invention；

Fig. 3 is the schematic diagram for showing the method according to another embodiment of the present invention for delay coding；

Fig. 4 is the schematic diagram for showing the method according to another embodiment of the present invention for being used to update code word.

Specific embodiment

In the first embodiment, polar code is by matrixSome rows generation, wherein B It is and mappingCorresponding permutation matrix, F_iIt is GF (2^μ) On non-equivalent replacement be diagonal matrix l_i×l_iA matrix.That is, it is c=vA that non-system code operation, which performs, wherein v is long It spends and isVector, in positionLocate to be 0, there is information symbol u at rest position_i.Here for convenience Assuming that u_iAnd c_iIt is GF (2^μ) element, although actually these may be GF (2^μ) a value column vector (block).SetIt will claim To freeze assemble of symbol.For simplicity, it will be assumed below that F_i=F, although the construction proposed is general and can use In l_iAny combinations of a value.

Fig. 1 shows encoding scheme according to embodiments of the present invention.System 100 includes a group node 102 to 112, wherein The vector of input value (left side input) and the multiplication of matrix F are implemented by each node that " F " is represented, as a result passed via right side termination It passs.The symbol of coding can be stored in the following manner：

1. by the storage of each symbol on their own devices, wherein the output of each " F " node in the layer of right side is deposited Storage is in individual equipment group (for example, in a server).

2. each symbol is stored in the block of their own, and by the output storage of each node on the same device, l The output of adjacent node is stored in individual server.

It should be appreciated that other mappings in code-word symbol to storage device are also possible.For the sake of specific, method 2 will hereinafter consider.It should be noted that any piece of failure can all cause the data being stored thereon unavailable.This can lead to phase The code-word symbol answered is wiped free of.

But this general construction needs to provide specific matrix F and freezes channel set for searchingMethod. In the environment of storage system, using byGiven Reed-Solomon kernel is advantageous, wherein α_i, 0≤i ＜ l are GF (2^μ) some different elements.About selection α_iMore details will be given below.

It should be noted that for Reed-Solomon kernel, the last i rows generation (l, i, l-i+1) of F, 1≤i≤l is compiled Code.This allows one to construction one as described below and freezes channel set

1. in order to ensure in each block for having l symbol can the erasing of local recovery r, by integer s, 0≤s ＜ rl^m-1Packet Containing in F.For example, this allows people's local recovery failure in every server in the case where not accessing network to set It is standby.

2. in order to ensure coding can correct any combinations of d-1 erasing (i.e. equipment fault), by integerIt coversIn so that

3. in order to ensure coding can correct any combinations of ρ server failure, by integer t+ls, 0≤t ＜ l, 0≤s ＜ ρ l^m-2It coversIn.

4. the upper limit in order to ensure data loss probability is limited by some predefined value π, by integer It coversIn so thatHere p is that given interval (replace by nothing Change) in server (node) failure probability, and

In the example depicted in fig. 1, one group of 4 channel is frozen, and is represented with drawing reference numeral 120.Input data is provided to make For a group code u₀To u₄, represented with drawing reference numeral 122.Output symbol c₀To c₈It is represented with drawing reference numeral 124.

Wipe restoration methods

The embodiment of the present invention, which proposes, to be carried out entangling the algorithm deleted in the following code word for polar code：

1. it is unknown, and it is known that will not wipe sign flag by all code character labelled notations of having wiped.

2. if symbol v_iCorrespondence freezes channel, then is marked as known.

3. for any node F in encoding scheme, if its t output symbol is unknown, by its uppermost t Incoming symbol is also labeled as unknown (unless they have been labeled as known).

4. if node F has t known incoming symbols and t unknown output symbols, remaining output symbol is marked To be unknown, unless they have been labeled as it is known.

5. following operation is repeated, until all code-word symbols all become known：

A. if node has t known incoming symbols, t unknown output symbols and l-t known output symbols, lead to The local decoding (seeing below) crossed at node restores unknown symbols.By all output symbols labeled as known.

B. if all output symbols of node be all it is known that if calculate Unknown worm symbol and by them labeled as Know.

Fig. 2A to 2D presents the example of the application of above-mentioned erasing restoration methods.Example system include with drawing reference numeral 202, 204th, 206,208 and 210 five nodes represented.Unknown-value is shown in broken lines, it is known that is worth and is shown with uninterrupted line.

Fig. 2A shows initial situation.Symbol c₀、c₁、c₃And c₆It has been wiped free of and labeled as unknown.First node 202 First input is corresponding to freeze channel, and therefore also labeled as known.The first of the node 206,208,210 of third, the 4th and the 5th Input is also corresponding to freeze channel, therefore also labeled as known.

Third node 206 there are two unknown output symbol, therefore according to it is above-mentioned it is regular 3), the second input is not labeled as yet Know.

In the first processing step, according to above-mentioned rule 5a), it is carried out at 208 and the 5th node 210 of fourth node local Decoding.

Therefore, their output c₃And c₆It can calculate and labeled as known.Therefore, according to above-mentioned rule 5b), Section four Point 208 and the 5th node 210 all outputs be all it is known that and their input can mark and be.Above-mentioned first processing Rule 5a is applied in step) and situation 5b) show in fig. 2b.

Then, can be according to regular 5a) calculate first node 202 uppermost output symbol.Subsequent situation is being schemed It is shown in 2C.

Finally, can the output symbol c of third node 206 be calculated by local decoding₀And c₁.Then, such as Fig. 2 D institutes Show, all symbol c₀To c₈All it is known.

It should be noted that most typical fault mode only includes primary erasing.It, can be with if code Design is rationally (i.e. r >=1) Restore erasing in the single iteration of this algorithm, without transmitting information among the nodes.

Local decoding at node can proceed as described below.If S_iAnd c_jIncoming symbol and output symbol are represented respectively.

1. known incoming symbol Si is used, for unknown output symbolSolve equation Wherein B=F^-1。

2. once all output symbols all become it is known that once calculate Unknown worm symbolIfWherein K is known output symbol index set.Then it obtains

Restore unknown-value c_jTask may be considered for verification sub-vectorUtilize school Test matrix H=(B_ji), what 0≤j ＜ l, 0≤i ＜ t were encoded entangles the problem of deleting decoding, calculatesTask may be considered The task of syndrome vectorial evaluation.It should be noted that syndrome vectorial evaluation can be carried out using cyclotomy Fast Fourier Transform.

It should be noted that in the case of Reed-Solomon kernel B is proved to be Vandermonde matrix, syndrome assessment is reduced to Calculate t component of the Discrete Fourier Transform of vector C.Furthermore, it is possible to constructing has what is given by the element of verification sub-vector The multinomial S (x) of coefficient.Then having wiped the value of symbol can be obtained by Forney formula：

Wherein

Γ(x)≡S(x)Λ(x)mod x^t 3)

It is to wipe evaluator multinomial, s_jIt is the position for having wiped symbol,It is erasing assessment Device multinomial, Λ ＇ (x) are its form derivatives.These expression formulas can be used for obtaining complexity as O (t²) wiped symbol Value, and without solving equation group (1).

Rapid system encodes

Practical storage system needs system coding, i.e., such coding method so that one as code word of information symbol It separates existing.

It is above-mentioned entangle delete decoding algorithm can be used for realize system coding.For this purpose, information symbol can be placed in code word PositionWhereinBy remaining sign flag to have wiped, and perform Above-mentioned coding method.

It can be advantageous to selective value α_iSo as toForm one group of conjugate elements, i.e. Λ (x) is have binary coefficient more Item formula.At this point it is possible to it is carried out pair using the inverse cyclotomy fast Fourier transformation algorithm for needing O (tlog μ) multiplication's Assessment.In addition, calculating Γ (x) from (3) does not need to any multiplication.It should be noted that if μ is 2 power, such selection may 's.

For example, it is contemplated that for l=3, the GF (2 of m=2 constructions²) on coding situation.R=1 symbol is introduced in local It can restore and the requirement that can restore (i.e. d=3) is combined in any two erasing.This means thatIt needs comprising element 0,1,2,3. It is further assumed that the probability in specified time interval (such as 1 year) internal memory devices failure is p=0.01, and require year Data loss probability is no more than 10^-4.Obtain P (0.01,1)=0.0297, P (0.01,2)=2.9810^-4, P (0.01,3)= 10^-5And following valueSequence：.86e-1,.89e-3,.3e-5,.26e-2, .27e-6,.3e-11,.26e-4,.26e-10,.1e-17

In addition to first four value is (because they are already contained inIn) except all these values summation be 2.610^-4, So do not need to freeze other symbols, that is, can setThis can lead to non-system code device structure.

IfWherein α is x²The primitive root of+x+1.

It should be noted that

In order to realize system coding, data to be encoded are placed on symbol c₂, c₄, c₅, c₇, c₈In, and mark corresponding symbol Number it is known.Code-word symbol c₀, c₁, c₃, c₆It is known.It should be noted that the 0th input of all F nodes of most right layer accords in Fig. 1 Number it is set as 0, that is, it is known.Therefore, above-mentioned encryption algorithm is first had to for unknown symbols c_3i, the solution equatioies 0 of 0≤i ＜ 3 =B₀₀c_3i+B₁₀c_3i+1+B₂₀c_3i+2.This can be completed immediately.

It needs to solve this equation group to restore c₀, c₁.Can use the above method based on Forney formula without the use of Gauss eliminates.Define Λ (x)=(1- α⁰x)(1-α¹X)=1+ (α+1) x+ α x².It can be seen that Λ ' (x)=α+1+2 α x=α+ 1..Calculate Γ (x)=(S '₀+S′₁x)Λ(x)mod x², obtain

It should be noted that matrix can also be usedOr equivalently place data into symbol c₀, c₄, c₅, c₇, c₈In.

At this point it is possible to obtain Λ (x)=(1- α¹x)(1-α²X)=1+x+x².Multiply at this point, calculating Γ (x) and not needing to be any Method, and checking symbol will be by expression formula It provides.It should be noted that (Γ₀+Γ₁It can α) calculate once, and be reused in the two expression formulas.

Delay coding

Fig. 3 and Fig. 4 shows the embodiment for postponing the method for encoding and updating coding codeword.In figs. 3 and 4, Assuming that checking symbol is located at position 3,7,11,14 and 15.

Fig. 3 shows the method for postponing coding.Position 0,1 and 2 includes the data symbol x of previous coding₀、x₁And x₂。 Included with the position 3 that drawing reference numeral 300 represents labeled as unknown parity character p₀。

In the first processing step S10, the step of performing encryption algorithm, parity character p is calculated₀, store it in position 3 In, and labeled as it is known that as shown in drawing reference numeral 310.

In second processing step S12, new data symbol x₃、x₄And x₅It reaches.They are stored in position 4,5 and 6, And labeled as it is known that as shown in drawing reference numeral 320,322 and 324.

In third processing step S14, another step of encryption algorithm is performed, calculates parity character p₁, and stored In position 7, as shown in drawing reference numeral 330.

In fourth process step S16, new data symbol x₆、x₇、x₈、x₉And x₁₀It reaches.They are stored in position 8,9 In 10, labeled as it is known that as shown in drawing reference numeral 340,342,344,346 and 348.

In the 5th processing step S18, another step of encryption algorithm is performed, calculates parity character p₂And p₃, stored In position 11 and 14, as shown in drawing reference numeral 330.

In the 6th processing step S20, another step of encryption algorithm is performed, calculates global parity character p₄。

The above method can be extended to realize that the part of information symbol updates.This is shown in FIG. 4.

In the first processing step S20 of update method, symbol x is updated₀, i.e., labeled as expired, with 410 table of drawing reference numeral Show；And store new symbol x_o', it is represented with drawing reference numeral 412.Corresponding checking symbol x₀' labeled as unknown, use drawing reference numeral 414 represent.

In second processing step S21, previous checking symbol p₀Labeled as expired, represented by drawing reference numeral 420, new Checking symbol p₀' labeled as it is known that being represented by drawing reference numeral 422.

In third processing step S22, checking symbol is updated, stores the checking symbol rather than old checking symbol.This It is represented by drawing reference numeral 430,432 and 434.

For this purpose, the new value of some other information symbols in the block can be stored in the storage device identical with old value, And respectively by the old value for storing information symbol and the block being newly worth labeled as expired and known.In addition, block should be distributed to store school The updated value of symbol is tested, and by them labeled as unknown.Then above-mentioned encryption algorithm should be performed, is become until all unknown pieces Know.When calculating the new value of checking symbol every time, store the block of old value should be labeled as it is expired.Calculating all new of checking symbol After value, corresponding expired piece should be discharged.If should be noted that certain equipment break down before the completion of above-mentioned renewal process, Still it can restore corresponding data using stale data block.

Realize that this method needs to safeguard a catalogue, which stores reality corresponding with slitting and expired piece of address And their state (known/unknown/expired).

To sum up, it is used for data are encoded in the storage system with polar code the present invention provides a kind of Method, this method include searching the method for polar code parameter, system coding algorithm, postponing coding method and for part Update the method for coded data.

The embodiment of the present invention encodes the data in distributed memory system using polar code, and provides one kind and be used for Its method constructed, realizes local data recovery, protection for the block of given quantity, disk and server failure and It encodes and entangles for it and delete decoded fast algorithm.Further it is proposed that the skill of system can be write data into fritter Art, and provide a kind of efficient implementation of part update operation.With the Reed-Saloman used in such as HDFS-RAID The existing methods such as coding are compared, this is capable of providing following one or more advantages：

1. server internal fault disk can be stored in local recovery in the case where not accessing any other server On data, so as to reduce network flow.

It 2. can be by accessing at most l_iIt is a to survive server to restore the data in failed server, so as to avoid in weight Expensive network data transmission during building the stage.HDFS-RAID does not provide this characteristic.

3. it is able to carry out delay coding, that is, it is encoded once small block data reaches, so as to avoid buffering.It should If note that having used buffering, if system is collapsed before disk is written in data, data may lose.Therefore, it is carried The method haveing reduces due to the probability of loss of data caused by this failure.

4. in addition, by postponing the calculating of some check blocks until system load becomes sufficiently low, can be born with EQUILIBRIUM CALCULATION FOR PROCESS The time of load and the response speed for therefore improving whole system.

5. high efficient coding and update can be carried out to small block data, enabling within the storage system using long codes.This Allow for increasing under fixed data loss probability the load bearing capacity of system.

6. the complexity of the system coding algorithm for polar code proposed is given by O (nlogn), wherein n is code length. For pyramidal situation, issued so far only to complexity is O (n²) algorithm encoded odd even segmentation It is encoded with EvenOdd/RDP.High encoder complexity limits the practical application to these codings with smaller length n.

7. the method proposed makes it possible in field GF (q), q >=maxl_iIt is upper construction coding, and based on odd even segmentation and The construction of pyramid coding needs field size q >=n.This causes the complexity of arithmetical operation to reduce.

8. what is proposed carries out data by polar code using Reed-Solomon kernel the fast algorithm of system coding It is also applied for the telecommunication system using corresponding polar code.

These advantages to construct large-scale distributed Fault-tolerant storage systems.

All the above description is only embodiments of the present invention, and the range that the present invention is protected is not limited to that.This Field technology personnel can easily make any change or replacement.Therefore, protection scope of the present invention should be wanted with appended right Subject to the protection domain asked.

Claims

1. one kind is used for the method encoded to the input data (122) in code word (124), which is characterized in that the code word It is the product as vector u ＇ and matrix A and obtains, the vector u ' includes symbol u '_i=0, whereinRest positionIncluding the input data, the matrix A is represented byWherein B is displacement square Battle array, F₀, F₁... ..., F_m-1It is GF (2^μ) on non-equivalent replacement be diagonal matrix l_i×l_iA matrix, wherein：

SetIncluding integer s, wherein

SetIncluding integer

So thatOr

SetIncluding integer

2. according to the method described in claim 1, it is characterized in that, the method be for restore wiped data method, The method is based on the matrix F_iThe encoding scheme of corresponding node F_i (102-112,202-210), the method includes：

It is unknown that the one or more of the code word has been wiped sign flag, and the one or more of the code word is not wiped symbol Labelled notation be it is known that

By one or more symbol v_iLabeled as it is known that wherein

For any node F_i in the encoding scheme, if its t output symbol labeled as unknown, by its top T incoming symbol labeled as unknown, unless they have been labeled as it is known；

If node F_i has t known incoming symbols and t unknown output symbols, by remaining output symbol labeled as not Know, unless they have been labeled as it is known；

If node has t known incoming symbols, t unknown output symbols and l-t known output symbols, pass through node The local decoding at place restores unknown output symbol, and by all output symbols labeled as known；

3. according to the method described in claim 1, it is characterized in that, the method is for system coding and obtains the code word Method, including following initial step：

Select the Data Position of the code word forWherein

The input data is put into the number of the code word It is labeled as having wiped according to position, and by the rest position of the code word.

4. according to the method described in claim 3, restore institute using the method described in claim 2 it is characterized in that, further including The step of stating the symbol in remainder codewords position.

5. according to the method described in claim 2, it is characterized in that, the local decoding includes：

6. according to the method described in claim 5, it is characterized in that, the local decoding includes：

Calculating unknown output symbol is

Wherein Γ (x) ≡ S (x) Λ (x) mod x^t；

7. the method according to one of preceding claims, which is characterized in that the matrix F₀, F₁... ..., F_m-1It is selected asWherein α_jIt is arranged in cyclotomic cosets sequence.

8. the method according to any one of claim 2 to 7, which is characterized in that use Fast Fourier Transform (Fast Fourier Transform, abbreviation FFT) unknown-value is determined from given value.

9. the method according to one of preceding claims, which is characterized in that further include and coding dimension is less than to lengthData carry out code segment the step of.

10. the method according to one of preceding claims, which is characterized in that also include on i-th of node described in storage The step of i-th of element of code word, wherein the node is the disk in server or server.

11. a kind of method of code word encoded for method of the update according to one of preceding claims, feature It is, the method includes：

It is expired by the sign flag of code word to be updated；

Store the new value of symbol to be updated；

By one or more checking symbols of the code word labeled as expired；

Distribute one or more new checking symbols of the code word and by them labeled as unknown；

Recurrence determines the value of one or more symbols that label is；

Wherein, when sign flag is known, by its preceding value labeled as expired；When all symbols of code word are known or expired When, erasing is labeled as expired one or more positions.

12. a kind of storage control, for performing the method described in one of preceding claims.

13. a kind of computer readable storage medium for storing program code, said program code includes requiring 1 for perform claim To the instruction of the method described in any one of 10.