Nothing Special   »   [go: up one dir, main page]

CN108141228A - The coding of distributed memory system - Google Patents

The coding of distributed memory system Download PDF

Info

Publication number
CN108141228A
CN108141228A CN201580083657.1A CN201580083657A CN108141228A CN 108141228 A CN108141228 A CN 108141228A CN 201580083657 A CN201580083657 A CN 201580083657A CN 108141228 A CN108141228 A CN 108141228A
Authority
CN
China
Prior art keywords
symbol
code word
labeled
unknown
symbols
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201580083657.1A
Other languages
Chinese (zh)
Inventor
彼得·弗拉基米罗维奇·特里福诺夫
邵云峰
王元钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN108141228A publication Critical patent/CN108141228A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13Linear codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/37Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
    • H03M13/373Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35 with erasure correction and erasure determination, e.g. for packet loss recovery or setting of erasures for the decoding of Reed-Solomon codes

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present invention relates to a kind of methods for being encoded to the input data in code word.The code word is the product as vector u ' and matrix A and obtains that the vector u ' includes symbol, (I), rest position (II) includes the input data, the matrix A is represented by (III), wherein B be permutation matrix, F0, F1... ..., Fm‑1It is GF (2μ) on non-equivalent replacement be diagonal matrix li×liA matrix, wherein:SetInclude the integer s of band (IV), setIncluding integer (V) so that (VI) or setIncluding integer (VII).

Description

The coding of distributed memory system
Technical field
The present invention relates to the method for the input data in coding codeword and the methods for updating code word.The present invention Further relate to storage control and computer readable storage medium.Moreover, it relates to a kind of calculating for storing program code Machine readable storage medium storing program for executing, the program code include the instruction for performing one of the above method.
Background technology
By NsIn the information storage system of platform server composition, due to many differences, every equipped with NdA storage Equipment, server, disk and block server may break down at any time or interim off line.In order to ensure what is stored in system Information is continuously available, and multiple data copies (Google file system and Hadoop points can be stored on different server/disk This method is used in cloth file system).But this can greatly increase memory requirement and integral device cost.It is another Solution is using certain type of erasure code, i.e., a block number is divided into k block of information (symbol) according to (slitting), is it Calculate n-k odd even (verification) block (symbol), and these blocks are stored in different disk and server.If any of which Respective symbol can then be considered as and wipe by one failure, and attempt by corresponding encoded entangle delete decoding come restore lose Block.
Many correcting and eleting codes for being used for network store system have been proposed, including:Reed Solomon Coding, pyramid are compiled Code, EvenOdd and RDP codings, odd even partition encoding and Zigzag codings.(n, k, n-k+1) Reed Solomon Coding provides For the protection arbitrarily combined of up to n-k erasing, therefore there is minimum possibility redundancy.But restore any and wiped Divided-by symbol is required for accessing at least k survival symbol.Pyramid and odd even partition encoding are provided by accessing at most l < k It is non-to wipe symbol to restore the ability largely wiped.This is realized using higher coding redundancy as cost.Substantially, these structures Make is to be encoded by introducing additional checking symbol in the codeword from some maximum distance separables (for example, Reed-Solomon is compiled Code) in obtain, wherein additional checking symbol is only dependent upon some subsets of information symbol.
It is poised for battle row coding (such as EvenOdd, RDP and Zigzag are encoded) by vector alphabet to be defined.This causes people Can design high efficient coding algorithm, and reduce and pass through the data volume of network transmission in the case where wiping recovery situation.But for The arbitrary value of coding dimension k and redundancy n-k are not still used for the construction of these codings clearly with effective method.
The performance of network store system depends on the clothes of the traffic and access generated during coding and reconstruction operation The computation complexity of business device quantity, disk access rate and related algorithm.The major issue occurred in such systems is: Using tending to write data into relatively small number of piece of less than k block composition.This needs to implement buffering, i.e., arrives data accumulation Somewhere, until being collected into sufficient amount of data.Since event may occur for the storage device itself for being used to buffer Barrier, this method may result in loss of data.In addition, application may need certain information in the slitting that update stores before Block.In order to keep the consistency of check block, need to obtain the old value of its old value and block of information, calculate their difference, and more It is new they to reflect new data.This is related to many input/output and network transmission operation, can seriously reduce system performance.
If entangle and deleting decoding, server should be accessed as few as possible, because each network data transmission can Cause very high performance compensatory.This problem is solved by constructing local decodable code coding, such as pyramid and odd even point Cut coding.But more checking symbols, part are encoded than comparable maximum distance separable since these codings have Update can bring higher performance cost.Therefore, the method has been only used for immutable data.
Invention content
The purpose of the present invention is to provide the method for coded data and the method for updating code word, wherein the side Method overcomes the problem of one or more of the prior art is mentioned above.
Particularly, the purpose of the present invention can include ensuring that the availability of data in distributed memory system, the distribution Storage system can suffer from obstruction, equipment and server failure.
First aspect present invention provides a kind of method for being encoded to the input data in code word, the code word It is the product as vector u ' and matrix A and obtains, the vector u ' includes symbol u 'i=0, whereinRest positionIncluding the input data, the matrix A is represented byWherein B is displacement square Battle array, F0, F1... ..., Fm-1It is GF (2μ) on non-equivalent replacement be diagonal matrix li×liA matrix, wherein:
SetIncluding integer s, wherein
SetIncluding integer
So thatOr
SetIncluding integer
Above-mentioned GF (2μ) represent order 2μFinite field, wherein μ is natural number.
SetDifferent selections different application scene is advantageous:
● there is l symbol c in order to ensure eachil..., c(i+1)l-1Block in can the erasing of local recovery r, can will Integer s is coveredIn, wherein
● it, can be by integer in order to ensure coding can correct any combinations of d-1 erasing (i.e. equipment fault)It coversIn so that
● it, can be by integer in order to ensure coding can correct any combinations of ρ server failureIt coversIn.
● it is limited in order to ensure the upper limit of data loss probability by some predefined value π, it can be by integerIt coversIn so thatHere p is The probability of server (node) failure in given interval (no to replace), and
The method of first aspect can use polar code.These polar codes are by matrix The generation of some rows, wherein B is and mappingCorresponding permutation matrix, 0≤ si< li, FiIt is GF (2μ) on non-equivalent replacement be diagonal matrix li×liA matrix.
Therefore, it is proposed to a kind of side being stored in using polar code coded data and by encoded radio on the element of storage system Method and the effective ways for realizing coding and decoding operation.
In an embodiment of first aspect, a kind of side for being encoded to the data in storage system is provided Method, wherein, place data into code word c and be claimed as in known selected location, the position of checking symbol be claimed as it is unknown, Recurrence determines the value of unknown symbols, to obtain code wordWherein u ' are Vector,ciIt is stored on i-th of node (disk in server or server).
In embodiments of the present invention, it can obtain in several of different ways and delete the coding of attribute with identical entangle.For example, It can be to matrix FiRow be multiplied by arbitrary nonzero value into line replacement, and by these row.Additionally, there are many different modes to divide With the information symbol and checking symbol in code word.
In the first realization method of the method according to first aspect, the method is for restoring to have wiped data Method, the method are based on the matrix FiThe encoding scheme of corresponding node F_i, the method includes:
It is unknown that the one or more of the code word has been wiped sign flag, and the one or more of the code word is not wiped Divided-by symbol be labeled as it is known that
By one or more symbol viLabeled as it is known that wherein
For any node F_i in the encoding scheme, if its t output symbol labeled as unknown, by it most T incoming symbol above labeled as unknown, unless they have been labeled as it is known;
If node F_i has t known incoming symbols and t unknown output symbols, remaining output symbol is marked To be unknown, unless they have been labeled as it is known;
Following operation is repeated, until all sign flags of the code word are known:
If node has t known incoming symbols, t unknown output symbols and l-t known output symbols, lead to The local decoding crossed at node restores unknown output symbol, and by all output symbols labeled as known;
If all output symbols of node be all it is known that if calculate Unknown worm symbol and by them labeled as known.
Therefore, the method for the first realization method provides a kind of effective erasing restoration methods.Particularly, erasing restores Task can be reduced to multiple local decoding tasks, there are efficient implementations outlined further below thus.
In second of realization method of the method according to first aspect, the method is for system coding and obtains institute The method for stating code word, including following initial step:
Select the Data Position of the code word forWherein
The input data is put into the Data Position of the code word, and the rest position of the code word is marked To have wiped;
Data have been wiped using above method recovery.
This present a kind of particularly effective system coding modes, this is extremely important to real system, and input data should A part as code word occurs.
In the third realization method of the method according to first aspect, the method further includes using first aspect A kind of method of realization method restores the step of symbol in the remainder codewords position.
What is proposed ensures that described erasing restoration methods are always extensive for the method that selects Data Position in code word All checking symbols of the multiple code word.
In the 4th kind of realization method of the method according to first aspect, the local decoding includes:
Use known incoming symbol Si, for unknown output symbolSolve equation Wherein B=F-1
Once all output symbols all become it is known that once calculate Unknown worm symbol
In the 5th kind of realization method of the method according to first aspect, the local decoding is embodied as:
Calculating unknown output symbol is
Wherein Γ (x) ≡ S (x) Λ (x) mod xt
Once all output symbols be all it is known that once calculate Unknown worm symbol
This has the following advantages:It is O (t that complexity, which can be used,2) low complex degree multinomial assessment algorithm, without the use of It is O (t for solving complexity3) system of linear equations common Gaussian null method.
In the 6th kind of realization method of the method according to first aspect, the matrix F0, F1... ..., Fm-1It is selected asWherein αjIt is arranged in cyclotomic cosets sequence.
This provides the practical realization of the method for the 5th aspect.
In the 7th kind of realization method of the method according to first aspect, Fast Fourier Transform (Fast is used Fourier Transform, abbreviation FFT) unknown-value is determined from given value.
Specifically, as required by above-mentioned local coding/decoding method, FFT can be used for assessing multinomial, be held so as to provide one kind The local decoded particularly effective mode of row.
In the 8th kind of realization method of the method according to first aspect, the method, which further includes, is less than length coding dimension DegreeInput data carry out code segment the step of.
Therefore, code segment can be carried out until unknown symbols can be restored.As long as there is the arrival of other data, coding can To restart.
In the case where giving some target data losing probabilities, in order to maximize the load bearing capacity of storage system, need Long codes.But compared with the data volume that application can once generate, the dimension of these codings may be too high.Therefore, the 8th kind Realization method provides a kind of delay coding method, and this method can be used for generating for it when reaching in small block data Checking symbol is accumulated until in order to generate entire code word until sufficient amount of data.
The method of 8th kind of realization method all code-word symbols can be initially appointed as using the concept it is unknown, and by this A little information symbols are put into appropriate location, once they reach just they are appointed as known to.Then above system volume can be performed Code algorithm, due to lacking known symbol, these algorithms may stop, and in known symbol once occurring opening again in certain points Begin.It should be noted that many equipment are less likely to break down in short time range, this needs to accumulate t data block.Therefore, exist Some checking symbols obtained during the incomplete execution of encryption algorithm may be enough to cope with this failure.As long as encryption algorithm is complete Into its execution, it is possible to entire checking symbol set is obtained, so that it is guaranteed that being directed to many equipment required by long term data storage The protection of failure.
Therefore, the method for the 8th kind of realization method provides processing by using the effective of the big coding and blockette provided Mode.
In the 9th kind of realization method of the method according to first aspect, the method also includes depositing on i-th of node The step of storing up i-th of element of the code word, wherein the node is the disk in server or server.
The second aspect of the present invention be related to it is a kind of for update according to the method for one of first aspect or its realization method into The method of the code word of row coding, the method includes:
It is expired by the sign flag of code word to be updated;
Store the new value of symbol to be updated;
By one or more checking symbols of the code word labeled as expired;
They are simultaneously labeled as unknown by the one or more new checking symbols of distribution;
Recurrence determines the value of one or more symbols that label is;
Wherein, when sign flag is known, by its preceding value labeled as expired;When code word all symbols to be known or When expired, erasing is labeled as expired one or more positions.
The realization method of second aspect is capable of providing a kind of number encoded for update according to the method for first aspect According to method, wherein notation declarations to be updated to be expired, new value are stored in equipment, old checking symbol was claimed as Phase distributes new checking symbol and original assertion is unknown.Then, as in an encoding process, recurrence determines unknown symbols Value.Herein, per next notation declarations for it is known when, store its preceding value block be all claimed as it is expired.When code word corresponds to All pieces when being all known or expired, wipe expired piece.
The third aspect of the present invention is related to a kind of storage control, for performing the side described in one of preceding claims Method.The controller can be realized by software or hardware (such as ASIC, FPGA).The controller can be directly connected to Storage device can be connected to storage device by network connection, for example, storage device is connected by another controller To network.
The fourth aspect of the present invention is related to a kind of computer readable storage medium for storing program code, said program code The instruction of method described in one of realization method including being used to perform first or second aspect or first or second aspect.
Description of the drawings
Technical characteristic in order to illustrate the embodiments of the present invention more clearly makes required in being described below to embodiment Attached drawing is briefly described.The accompanying drawings in the following description is only some embodiments of the present invention, these embodiments are not In the case of violating the present invention such as protection domain defined in claims, it can modify.
Fig. 1 is the block diagram for showing coder structure according to embodiments of the present invention;
Fig. 2A to 2D is the block diagram for the processing step for showing erasing restoration methods according to another embodiment of the present invention;
Fig. 3 is the schematic diagram for showing the method according to another embodiment of the present invention for delay coding;
Fig. 4 is the schematic diagram for showing the method according to another embodiment of the present invention for being used to update code word.
Specific embodiment
In the first embodiment, polar code is by matrixSome rows generation, wherein B It is and mappingCorresponding permutation matrix, FiIt is GF (2μ) On non-equivalent replacement be diagonal matrix li×liA matrix.That is, it is c=vA that non-system code operation, which performs, wherein v is long It spends and isVector, in positionLocate to be 0, there is information symbol u at rest positioni.Here for convenience Assuming that uiAnd ciIt is GF (2μ) element, although actually these may be GF (2μ) a value column vector (block).SetIt will claim To freeze assemble of symbol.For simplicity, it will be assumed below that Fi=F, although the construction proposed is general and can use In liAny combinations of a value.
Fig. 1 shows encoding scheme according to embodiments of the present invention.System 100 includes a group node 102 to 112, wherein The vector of input value (left side input) and the multiplication of matrix F are implemented by each node that " F " is represented, as a result passed via right side termination It passs.The symbol of coding can be stored in the following manner:
1. by the storage of each symbol on their own devices, wherein the output of each " F " node in the layer of right side is deposited Storage is in individual equipment group (for example, in a server).
2. each symbol is stored in the block of their own, and by the output storage of each node on the same device, l The output of adjacent node is stored in individual server.
It should be appreciated that other mappings in code-word symbol to storage device are also possible.For the sake of specific, method 2 will hereinafter consider.It should be noted that any piece of failure can all cause the data being stored thereon unavailable.This can lead to phase The code-word symbol answered is wiped free of.
But this general construction needs to provide specific matrix F and freezes channel set for searchingMethod. In the environment of storage system, using byGiven Reed-Solomon kernel is advantageous, wherein αi, 0≤i < l are GF (2μ) some different elements.About selection αiMore details will be given below.
It should be noted that for Reed-Solomon kernel, the last i rows generation (l, i, l-i+1) of F, 1≤i≤l is compiled Code.This allows one to construction one as described below and freezes channel set
1. in order to ensure in each block for having l symbol can the erasing of local recovery r, by integer s, 0≤s < rlm-1Packet Containing in F.For example, this allows people's local recovery failure in every server in the case where not accessing network to set It is standby.
2. in order to ensure coding can correct any combinations of d-1 erasing (i.e. equipment fault), by integerIt coversIn so that
3. in order to ensure coding can correct any combinations of ρ server failure, by integer t+ls, 0≤t < l, 0≤s < ρ lm-2It coversIn.
4. the upper limit in order to ensure data loss probability is limited by some predefined value π, by integer It coversIn so thatHere p is that given interval (replace by nothing Change) in server (node) failure probability, and
In the example depicted in fig. 1, one group of 4 channel is frozen, and is represented with drawing reference numeral 120.Input data is provided to make For a group code u0To u4, represented with drawing reference numeral 122.Output symbol c0To c8It is represented with drawing reference numeral 124.
Wipe restoration methods
The embodiment of the present invention, which proposes, to be carried out entangling the algorithm deleted in the following code word for polar code:
1. it is unknown, and it is known that will not wipe sign flag by all code character labelled notations of having wiped.
2. if symbol viCorrespondence freezes channel, then is marked as known.
3. for any node F in encoding scheme, if its t output symbol is unknown, by its uppermost t Incoming symbol is also labeled as unknown (unless they have been labeled as known).
4. if node F has t known incoming symbols and t unknown output symbols, remaining output symbol is marked To be unknown, unless they have been labeled as it is known.
5. following operation is repeated, until all code-word symbols all become known:
A. if node has t known incoming symbols, t unknown output symbols and l-t known output symbols, lead to The local decoding (seeing below) crossed at node restores unknown symbols.By all output symbols labeled as known.
B. if all output symbols of node be all it is known that if calculate Unknown worm symbol and by them labeled as Know.
Fig. 2A to 2D presents the example of the application of above-mentioned erasing restoration methods.Example system include with drawing reference numeral 202, 204th, 206,208 and 210 five nodes represented.Unknown-value is shown in broken lines, it is known that is worth and is shown with uninterrupted line.
Fig. 2A shows initial situation.Symbol c0、c1、c3And c6It has been wiped free of and labeled as unknown.First node 202 First input is corresponding to freeze channel, and therefore also labeled as known.The first of the node 206,208,210 of third, the 4th and the 5th Input is also corresponding to freeze channel, therefore also labeled as known.
Third node 206 there are two unknown output symbol, therefore according to it is above-mentioned it is regular 3), the second input is not labeled as yet Know.
In the first processing step, according to above-mentioned rule 5a), it is carried out at 208 and the 5th node 210 of fourth node local Decoding.
Therefore, their output c3And c6It can calculate and labeled as known.Therefore, according to above-mentioned rule 5b), Section four Point 208 and the 5th node 210 all outputs be all it is known that and their input can mark and be.Above-mentioned first processing Rule 5a is applied in step) and situation 5b) show in fig. 2b.
Then, can be according to regular 5a) calculate first node 202 uppermost output symbol.Subsequent situation is being schemed It is shown in 2C.
Finally, can the output symbol c of third node 206 be calculated by local decoding0And c1.Then, such as Fig. 2 D institutes Show, all symbol c0To c8All it is known.
It should be noted that most typical fault mode only includes primary erasing.It, can be with if code Design is rationally (i.e. r >=1) Restore erasing in the single iteration of this algorithm, without transmitting information among the nodes.
Local decoding at node can proceed as described below.If SiAnd cjIncoming symbol and output symbol are represented respectively.
1. known incoming symbol Si is used, for unknown output symbolSolve equation Wherein B=F-1
2. once all output symbols all become it is known that once calculate Unknown worm symbolIfWherein K is known output symbol index set.Then it obtains
Restore unknown-value cjTask may be considered for verification sub-vectorUtilize school Test matrix H=(Bji), what 0≤j < l, 0≤i < t were encoded entangles the problem of deleting decoding, calculatesTask may be considered The task of syndrome vectorial evaluation.It should be noted that syndrome vectorial evaluation can be carried out using cyclotomy Fast Fourier Transform.
It should be noted that in the case of Reed-Solomon kernel B is proved to be Vandermonde matrix, syndrome assessment is reduced to Calculate t component of the Discrete Fourier Transform of vector C.Furthermore, it is possible to constructing has what is given by the element of verification sub-vector The multinomial S (x) of coefficient.Then having wiped the value of symbol can be obtained by Forney formula:
Wherein
Γ(x)≡S(x)Λ(x)mod xt 3)
It is to wipe evaluator multinomial, sjIt is the position for having wiped symbol,It is erasing assessment Device multinomial, Λ ' (x) are its form derivatives.These expression formulas can be used for obtaining complexity as O (t2) wiped symbol Value, and without solving equation group (1).
Rapid system encodes
Practical storage system needs system coding, i.e., such coding method so that one as code word of information symbol It separates existing.
It is above-mentioned entangle delete decoding algorithm can be used for realize system coding.For this purpose, information symbol can be placed in code word PositionWhereinBy remaining sign flag to have wiped, and perform Above-mentioned coding method.
It can be advantageous to selective value αiSo as toForm one group of conjugate elements, i.e. Λ (x) is have binary coefficient more Item formula.At this point it is possible to it is carried out pair using the inverse cyclotomy fast Fourier transformation algorithm for needing O (tlog μ) multiplication's Assessment.In addition, calculating Γ (x) from (3) does not need to any multiplication.It should be noted that if μ is 2 power, such selection may 's.
For example, it is contemplated that for l=3, the GF (2 of m=2 constructions2) on coding situation.R=1 symbol is introduced in local It can restore and the requirement that can restore (i.e. d=3) is combined in any two erasing.This means thatIt needs comprising element 0,1,2,3. It is further assumed that the probability in specified time interval (such as 1 year) internal memory devices failure is p=0.01, and require year Data loss probability is no more than 10-4.Obtain P (0.01,1)=0.0297, P (0.01,2)=2.9810-4, P (0.01,3)= 10-5And following valueSequence:.86e-1,.89e-3,.3e-5,.26e-2, .27e-6,.3e-11,.26e-4,.26e-10,.1e-17
In addition to first four value is (because they are already contained inIn) except all these values summation be 2.610-4, So do not need to freeze other symbols, that is, can setThis can lead to non-system code device structure.
IfWherein α is x2The primitive root of+x+1.
It should be noted that
In order to realize system coding, data to be encoded are placed on symbol c2, c4, c5, c7, c8In, and mark corresponding symbol Number it is known.Code-word symbol c0, c1, c3, c6It is known.It should be noted that the 0th input of all F nodes of most right layer accords in Fig. 1 Number it is set as 0, that is, it is known.Therefore, above-mentioned encryption algorithm is first had to for unknown symbols c3i, the solution equatioies 0 of 0≤i < 3 =B00c3i+B10c3i+1+B20c3i+2.This can be completed immediately.
It needs to solve this equation group to restore c0, c1.Can use the above method based on Forney formula without the use of Gauss eliminates.Define Λ (x)=(1- α0x)(1-α1X)=1+ (α+1) x+ α x2.It can be seen that Λ ' (x)=α+1+2 α x=α+ 1..Calculate Γ (x)=(S '0+S′1x)Λ(x)mod x2, obtain
It should be noted that matrix can also be usedOr equivalently place data into symbol c0, c4, c5, c7, c8In.
At this point it is possible to obtain Λ (x)=(1- α1x)(1-α2X)=1+x+x2.Multiply at this point, calculating Γ (x) and not needing to be any Method, and checking symbol will be by expression formula It provides.It should be noted that (Γ01It can α) calculate once, and be reused in the two expression formulas.
Delay coding
Fig. 3 and Fig. 4 shows the embodiment for postponing the method for encoding and updating coding codeword.In figs. 3 and 4, Assuming that checking symbol is located at position 3,7,11,14 and 15.
Fig. 3 shows the method for postponing coding.Position 0,1 and 2 includes the data symbol x of previous coding0、x1And x2。 Included with the position 3 that drawing reference numeral 300 represents labeled as unknown parity character p0
In the first processing step S10, the step of performing encryption algorithm, parity character p is calculated0, store it in position 3 In, and labeled as it is known that as shown in drawing reference numeral 310.
In second processing step S12, new data symbol x3、x4And x5It reaches.They are stored in position 4,5 and 6, And labeled as it is known that as shown in drawing reference numeral 320,322 and 324.
In third processing step S14, another step of encryption algorithm is performed, calculates parity character p1, and stored In position 7, as shown in drawing reference numeral 330.
In fourth process step S16, new data symbol x6、x7、x8、x9And x10It reaches.They are stored in position 8,9 In 10, labeled as it is known that as shown in drawing reference numeral 340,342,344,346 and 348.
In the 5th processing step S18, another step of encryption algorithm is performed, calculates parity character p2And p3, stored In position 11 and 14, as shown in drawing reference numeral 330.
In the 6th processing step S20, another step of encryption algorithm is performed, calculates global parity character p4
The above method can be extended to realize that the part of information symbol updates.This is shown in FIG. 4.
In the first processing step S20 of update method, symbol x is updated0, i.e., labeled as expired, with 410 table of drawing reference numeral Show;And store new symbol xo', it is represented with drawing reference numeral 412.Corresponding checking symbol x0' labeled as unknown, use drawing reference numeral 414 represent.
In second processing step S21, previous checking symbol p0Labeled as expired, represented by drawing reference numeral 420, new Checking symbol p0' labeled as it is known that being represented by drawing reference numeral 422.
In third processing step S22, checking symbol is updated, stores the checking symbol rather than old checking symbol.This It is represented by drawing reference numeral 430,432 and 434.
For this purpose, the new value of some other information symbols in the block can be stored in the storage device identical with old value, And respectively by the old value for storing information symbol and the block being newly worth labeled as expired and known.In addition, block should be distributed to store school The updated value of symbol is tested, and by them labeled as unknown.Then above-mentioned encryption algorithm should be performed, is become until all unknown pieces Know.When calculating the new value of checking symbol every time, store the block of old value should be labeled as it is expired.Calculating all new of checking symbol After value, corresponding expired piece should be discharged.If should be noted that certain equipment break down before the completion of above-mentioned renewal process, Still it can restore corresponding data using stale data block.
Realize that this method needs to safeguard a catalogue, which stores reality corresponding with slitting and expired piece of address And their state (known/unknown/expired).
To sum up, it is used for data are encoded in the storage system with polar code the present invention provides a kind of Method, this method include searching the method for polar code parameter, system coding algorithm, postponing coding method and for part Update the method for coded data.
The embodiment of the present invention encodes the data in distributed memory system using polar code, and provides one kind and be used for Its method constructed, realizes local data recovery, protection for the block of given quantity, disk and server failure and It encodes and entangles for it and delete decoded fast algorithm.Further it is proposed that the skill of system can be write data into fritter Art, and provide a kind of efficient implementation of part update operation.With the Reed-Saloman used in such as HDFS-RAID The existing methods such as coding are compared, this is capable of providing following one or more advantages:
1. server internal fault disk can be stored in local recovery in the case where not accessing any other server On data, so as to reduce network flow.
It 2. can be by accessing at most liIt is a to survive server to restore the data in failed server, so as to avoid in weight Expensive network data transmission during building the stage.HDFS-RAID does not provide this characteristic.
3. it is able to carry out delay coding, that is, it is encoded once small block data reaches, so as to avoid buffering.It should If note that having used buffering, if system is collapsed before disk is written in data, data may lose.Therefore, it is carried The method haveing reduces due to the probability of loss of data caused by this failure.
4. in addition, by postponing the calculating of some check blocks until system load becomes sufficiently low, can be born with EQUILIBRIUM CALCULATION FOR PROCESS The time of load and the response speed for therefore improving whole system.
5. high efficient coding and update can be carried out to small block data, enabling within the storage system using long codes.This Allow for increasing under fixed data loss probability the load bearing capacity of system.
6. the complexity of the system coding algorithm for polar code proposed is given by O (nlogn), wherein n is code length. For pyramidal situation, issued so far only to complexity is O (n2) algorithm encoded odd even segmentation It is encoded with EvenOdd/RDP.High encoder complexity limits the practical application to these codings with smaller length n.
7. the method proposed makes it possible in field GF (q), q >=maxliIt is upper construction coding, and based on odd even segmentation and The construction of pyramid coding needs field size q >=n.This causes the complexity of arithmetical operation to reduce.
8. what is proposed carries out data by polar code using Reed-Solomon kernel the fast algorithm of system coding It is also applied for the telecommunication system using corresponding polar code.
These advantages to construct large-scale distributed Fault-tolerant storage systems.
All the above description is only embodiments of the present invention, and the range that the present invention is protected is not limited to that.This Field technology personnel can easily make any change or replacement.Therefore, protection scope of the present invention should be wanted with appended right Subject to the protection domain asked.

Claims (13)

1. one kind is used for the method encoded to the input data (122) in code word (124), which is characterized in that the code word It is the product as vector u ' and matrix A and obtains, the vector u ' includes symbol u 'i=0, whereinRest positionIncluding the input data, the matrix A is represented byWherein B is displacement square Battle array, F0, F1... ..., Fm-1It is GF (2μ) on non-equivalent replacement be diagonal matrix li×liA matrix, wherein:
SetIncluding integer s, wherein
SetIncluding integer
So thatOr
SetIncluding integer
2. according to the method described in claim 1, it is characterized in that, the method be for restore wiped data method, The method is based on the matrix FiThe encoding scheme of corresponding node F_i (102-112,202-210), the method includes:
It is unknown that the one or more of the code word has been wiped sign flag, and the one or more of the code word is not wiped symbol Labelled notation be it is known that
By one or more symbol viLabeled as it is known that wherein
For any node F_i in the encoding scheme, if its t output symbol labeled as unknown, by its top T incoming symbol labeled as unknown, unless they have been labeled as it is known;
If node F_i has t known incoming symbols and t unknown output symbols, by remaining output symbol labeled as not Know, unless they have been labeled as it is known;
Following operation is repeated, until all sign flags of the code word are known:
If node has t known incoming symbols, t unknown output symbols and l-t known output symbols, pass through node The local decoding at place restores unknown output symbol, and by all output symbols labeled as known;
If all output symbols of node be all it is known that if calculate Unknown worm symbol and by them labeled as known.
3. according to the method described in claim 1, it is characterized in that, the method is for system coding and obtains the code word Method, including following initial step:
Select the Data Position of the code word forWherein
The input data is put into the number of the code word It is labeled as having wiped according to position, and by the rest position of the code word.
4. according to the method described in claim 3, restore institute using the method described in claim 2 it is characterized in that, further including The step of stating the symbol in remainder codewords position.
5. according to the method described in claim 2, it is characterized in that, the local decoding includes:
Use known incoming symbol Si, for unknown output symbolSolve equation Wherein B=F-1
Once all output symbols all become it is known that once calculate Unknown worm symbol
6. according to the method described in claim 5, it is characterized in that, the local decoding includes:
Calculating unknown output symbol is
Wherein Γ (x) ≡ S (x) Λ (x) mod xt
Once all output symbols be all it is known that once calculate Unknown worm symbol
7. the method according to one of preceding claims, which is characterized in that the matrix F0, F1... ..., Fm-1It is selected asWherein αjIt is arranged in cyclotomic cosets sequence.
8. the method according to any one of claim 2 to 7, which is characterized in that use Fast Fourier Transform (Fast Fourier Transform, abbreviation FFT) unknown-value is determined from given value.
9. the method according to one of preceding claims, which is characterized in that further include and coding dimension is less than to lengthData carry out code segment the step of.
10. the method according to one of preceding claims, which is characterized in that also include on i-th of node described in storage The step of i-th of element of code word, wherein the node is the disk in server or server.
11. a kind of method of code word encoded for method of the update according to one of preceding claims, feature It is, the method includes:
It is expired by the sign flag of code word to be updated;
Store the new value of symbol to be updated;
By one or more checking symbols of the code word labeled as expired;
Distribute one or more new checking symbols of the code word and by them labeled as unknown;
Recurrence determines the value of one or more symbols that label is;
Wherein, when sign flag is known, by its preceding value labeled as expired;When all symbols of code word are known or expired When, erasing is labeled as expired one or more positions.
12. a kind of storage control, for performing the method described in one of preceding claims.
13. a kind of computer readable storage medium for storing program code, said program code includes requiring 1 for perform claim To the instruction of the method described in any one of 10.
CN201580083657.1A 2015-10-09 2015-10-09 The coding of distributed memory system Pending CN108141228A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2015/000655 WO2017061891A1 (en) 2015-10-09 2015-10-09 Coding for distributed storage system

Publications (1)

Publication Number Publication Date
CN108141228A true CN108141228A (en) 2018-06-08

Family

ID=55967384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580083657.1A Pending CN108141228A (en) 2015-10-09 2015-10-09 The coding of distributed memory system

Country Status (2)

Country Link
CN (1) CN108141228A (en)
WO (1) WO2017061891A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10505566B2 (en) 2017-06-15 2019-12-10 Huawei Technologies Co., Ltd. Methods and apparatus for encoding and decoding based on layered polar code

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432969A (en) * 2005-06-10 2009-05-13 数字方敦股份有限公司 Forward error-correcting (FEC) coding and streaming
CN101834898A (en) * 2010-04-29 2010-09-15 中科院成都信息技术有限公司 Method for storing network distributed codes
US20110208996A1 (en) * 2010-02-22 2011-08-25 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
CN102624866A (en) * 2012-01-13 2012-08-01 北京大学深圳研究生院 Data storage method, data storage device and distributed network storage system
US20130198560A1 (en) * 2012-01-31 2013-08-01 Cleversafe, Inc. Efficiently storing data in a dispersed storage network
US20130227374A1 (en) * 2012-02-23 2013-08-29 Sandisk Technologies Inc. Erasure correction using single error detection parity
CN103336785A (en) * 2013-06-04 2013-10-02 华中科技大学 Distributed storage method and distributed storage device based on network coding
US20140331083A1 (en) * 2012-12-29 2014-11-06 Emc Corporation Polar codes for efficient encoding and decoding in redundant disk arrays
CN104932953A (en) * 2015-06-04 2015-09-23 华为技术有限公司 Data distribution method, data storage method, and relevant device and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432969A (en) * 2005-06-10 2009-05-13 数字方敦股份有限公司 Forward error-correcting (FEC) coding and streaming
US20110208996A1 (en) * 2010-02-22 2011-08-25 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
CN101834898A (en) * 2010-04-29 2010-09-15 中科院成都信息技术有限公司 Method for storing network distributed codes
CN102624866A (en) * 2012-01-13 2012-08-01 北京大学深圳研究生院 Data storage method, data storage device and distributed network storage system
US20130198560A1 (en) * 2012-01-31 2013-08-01 Cleversafe, Inc. Efficiently storing data in a dispersed storage network
US20130227374A1 (en) * 2012-02-23 2013-08-29 Sandisk Technologies Inc. Erasure correction using single error detection parity
US20140331083A1 (en) * 2012-12-29 2014-11-06 Emc Corporation Polar codes for efficient encoding and decoding in redundant disk arrays
CN103336785A (en) * 2013-06-04 2013-10-02 华中科技大学 Distributed storage method and distributed storage device based on network coding
CN104932953A (en) * 2015-06-04 2015-09-23 华为技术有限公司 Data distribution method, data storage method, and relevant device and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BRIJESH KUMAR RAI: "Adaptive erasure code based distributed storage systems", 《 2015 IEEE 14TH CANADIAN WORKSHOP ON INFORMATION THEORY (CWIT)》 *
ERDAL ARIKAN: "A survey of reed-muller codes from polar coding perspective", 《 2010 IEEE INFORMATION THEORY WORKSHOP ON INFORMATION THEORY (ITW 2010, CAIRO)》 *
KYUMARS SHEYKH ESMAILI等: "CORE: Cross-object redundancy for efficient data repair in storage systems", 《2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA》 *
KYUMARS SHEYKH ESMAILI等: "Efficient updates in cross-object erasure-coded storage systems", 《2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA》 *
PENGFEI HUANG等: "Cyclic linear binary locally repairable codes", 《2015 IEEE INFORMATION THEORY WORKSHOP (ITW)》 *
黄震: "大规模分布式存储系统中数据冗余技术研究", 《中国博士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
WO2017061891A9 (en) 2017-06-15
WO2017061891A1 (en) 2017-04-13

Similar Documents

Publication Publication Date Title
CN102624866B (en) Data storage method, data storage device and distributed network storage system
US10146618B2 (en) Distributed data storage with reduced storage overhead using reduced-dependency erasure codes
Rashmi et al. Having your cake and eating it too: Jointly optimal erasure codes for {I/O}, storage, and network-bandwidth
JP6555759B2 (en) Structured LDPC coding method, decoding method, coding device, and decoding device
WO2016181980A1 (en) Secret sharing method, secret sharing system, sharing device, and program
US20150303949A1 (en) [01] cost-efficient repair for storage systems using progressive engagement
CN104156283B (en) Data reconstruction method, device and storage system
KR101618269B1 (en) Method and Apparatus of Encoding for Data Recovery in Distributed Storage System
CN105356968B (en) The method and system of network code based on cyclic permutation matrices
CN114153651B (en) Data encoding method, device, equipment and medium
CN106484559A (en) A kind of building method of check matrix and the building method of horizontal array correcting and eleting codes
CN104915609B (en) It is a kind of based on Lagrange interpolation methods and cloudy data-hiding method
US20180246679A1 (en) Hierarchical data recovery processing for extended product codes
CN108141228A (en) The coding of distributed memory system
CN108614749A (en) A kind of data processing method and device
Chen et al. A new Zigzag MDS code with optimal encoding and efficient decoding
CN108432170A (en) Apparatus and method for multi-code distributed storage
Cassuto et al. Low-complexity array codes for random and clustered 4-erasures
CN107615248A (en) Distributed data storage method, control device and system
Manasse et al. A reed-solomon code for disk storage, and efficient recovery computations for erasure-coded disk storage
CN110990188B (en) Construction method of partial repetition code based on Hadamard matrix
CN110780813B (en) Distributed storage system based on subspace codes on binary domain
CN109144767B (en) Data storage system and method of operating the same
CN108352845A (en) Method for being encoded to storage data and device
Silberstein et al. Optimal fractional repetition codes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180608

RJ01 Rejection of invention patent application after publication