CN112398487A - Implementation method and system for reducing complexity of Turbo parallel decoding - Google Patents
- Publication number
- CN112398487A (application CN202011474548.1A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/29—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
- H03M13/2957—Turbo codes and decoding
Abstract
The invention relates to an implementation method and system for reducing the complexity of Turbo parallel decoding. Through a specific computation in an FPGA (field-programmable gate array), the inner interleaving coefficients f1 and f2 are indexed from the frame-length information alone, so that the interleaving addresses are computed in real time in simplified form and need not be stored during the computation. Extensive simulation verifies that the method improves on the existing algorithm: because the branch-state metric contains no feedback, its maximum value is computed in advance, which saves two branch-metric computations per iteration compared with existing schemes and effectively reduces computational complexity. The method is compatible with all 188 frame lengths of the LTE protocol and is suitable for various rates. Addressing the high complexity and large delay of current decoder implementations, the invention effectively reduces both, while the scheme also achieves a low bit error rate.
Description
Technical Field
The invention relates to an implementation method and system for reducing the complexity of Turbo parallel decoding, and belongs to the technical field of communication encoding and decoding.
Background
Turbo codes, proposed by Berrou et al. in 1993, achieve decoding performance close to the Shannon theoretical limit thanks to their effective use of the random coding conditions of Shannon's channel-coding theorem. With the continued research of recent years, Turbo codes have entered practical use as one of the mainstream LTE channel-coding schemes in the 3GPP standard.
Although the Turbo decoding algorithm has matured through continued research, many problems remain in practical systems, particularly in hardware implementation: the decoding algorithm has high complexity, contains internal feedback, and incurs large delay over multiple iterations.
To reduce delay, a Turbo decoder must first solve the problem of parallel interleaving/de-interleaving addresses. Two schemes are generally used to generate interleaving addresses. The first generates a specific interleaving table in advance for each frame length and stores it; supporting different frame lengths then requires storing different tables, occupying considerable resources. The second computes the addresses in real time, which requires interleaving coefficients obtained by computer search. Existing parallel interleaving schemes generally interleave the grouped sub-blocks separately and use different address tables for interleaving and de-interleaving, which increases computational complexity or occupies more storage resources.
According to the MAX-LOG-MAP algorithm, the core of decoding is computing the branch-state metric, the forward-state metric, and the backward-state metric. The forward and backward state metrics contain feedback in their internal computation and require multiple iterations; the algorithm for finding their maxima has high complexity, and the decoding delay is proportional to the computational complexity and the number of iterations.
The problem to be solved in the prior art is to effectively reduce the complexity of the decoding algorithm, improve decoding throughput, reduce decoding delay, and cut hardware resource overhead.
Disclosure of Invention
The purpose of the invention is as follows: to provide an implementation method for reducing the complexity of Turbo parallel decoding, solving the above problems in the prior art; and further, to propose a system implementing the above method.
The technical scheme is as follows: an implementation method for reducing the complexity of Turbo parallel decoding comprises the following steps:
Step 2: an interleaving/de-interleaving address index unit stores the inner interleaving coefficients in order, indexes the RAM address from the frame-length information to obtain in real time, by a specific computation, the interleaving coefficients corresponding to different frame lengths, generates and stores fixed parallel interleaving index addresses by a specific computation of the interleaving addresses, and uses the same index addresses for both interleaving and de-interleaving.
Further, the parallel interleaving RAM index address is obtained according to the following formula:
where k denotes the parallelism, L the sub-block length, and the indices satisfy 0 ≤ i ≤ k−1, 0 ≤ j ≤ L−1, and 0 ≤ x ≤ N−1.
Step 3: the first component decoder unit sets the input information frame length to N and divides it into P blocks, each block of data having length L = N/P. The implementation uses P-way parallelism with P sub-component decoders; that is, the frame is divided into P windows, each window processing L data, and decoding is realized with the MAX-LOG-MAP algorithm.
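As a minimal illustration of this partition (the function name is ours; the figures N = 288, P = 8, L = 36 are the embodiment's own example):

```python
def partition_windows(N, P):
    """Split a frame of length N into P equal windows of length L = N/P.
    Assumes P divides N, as in the embodiment (N = 288, P = 8, L = 36)."""
    assert N % P == 0, "frame length must be a multiple of the parallelism"
    L = N // P
    # each window w covers the half-open index range [w*L, (w+1)*L)
    return L, [(w * L, (w + 1) * L) for w in range(P)]
```

Each window is then handled by its own sub-component decoder, so the P windows are decoded concurrently.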
Further, when the indication signal that the ping or pong RAM buffering is complete is detected, the decoding iteration starts, and the indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is being read from that RAM.
Further, let the current decoding time be k, with state s_k at time k, state s_{k−1} at time k−1, and state s_{k+1} at time k+1. When the decoding iteration starts, the first window computes the branch-state metric in the forward direction using the first block of data, as follows:
where, owing to the uncertainty of the received information bits and check bits, the branch-state metric takes four possible state values in total; it is denoted γ_k(s_{k−1}, s_k) and hereinafter abbreviated γ. The remaining symbols denote, respectively, the system and check information before encoding and the received decoded information bits and check bits.
Further, the branch metric γ of the current state and its maximum value are calculated, and the results are stored in order. Meanwhile, the second window initializes the backward-state metric, pre-computes it using the second block of data, and stores the boundary value as the initial value for the backward-state-metric computation at the next time.
Step 4: compute the forward-state metric. Its value at the current time is determined by the value at the previous time, the branch-transition metric at the current time, and the metric's maximum. Once the first maximum of γ has been computed, the first window begins computing the forward-state metric in the forward direction:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
where α_k(s_k) denotes the forward metric of state s_k at time k, hereinafter abbreviated α.
Further, α is first initialized with a uniform distribution; the forward-state metric α is then obtained by summing the α of the previous state, the current branch-state metric, and the maximum of γ.
Step 5: compute the backward-state metric, abbreviated β. Its value at the current time is determined by the value at the next time, the branch-transition metric at the current time, and the metric's maximum:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
Further, β takes as its initial value the boundary value pre-computed and stored by the second window on the second block of data; the backward-state metric β is obtained by summing the backward state (initial value) of β, the branch-state metric, and the maximum of γ.
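For orientation, the conventional max-log recursions underlying steps 4 and 5 can be sketched over an abstract trellis. The data structures below are illustrative assumptions, not the patent's implementation (which additionally folds in the pre-computed branch-metric maxima):

```python
def forward_metrics(gamma, preds, n_states):
    """Max-log forward recursion over a toy trellis:
    alpha_{k+1}(s) = max over s' in preds[s] of (alpha_k(s') + gamma[k][(s', s)]).
    gamma[k] maps a transition (s_prev, s) at trellis step k to its branch metric."""
    alpha = [[0.0] * n_states]                       # uniform initialisation
    for gk in gamma:
        alpha.append([max(alpha[-1][sp] + gk[(sp, s)] for sp in preds[s])
                      for s in range(n_states)])
    return alpha

def backward_metrics(gamma, succs, n_states):
    """Max-log backward recursion:
    beta_k(s) = max over s'' in succs[s] of (beta_{k+1}(s'') + gamma[k][(s, s'')])."""
    beta = [[0.0] * n_states]                        # terminal initialisation
    for gk in reversed(gamma):
        beta.insert(0, [max(beta[0][sn] + gk[(s, sn)] for sn in succs[s])
                        for s in range(n_states)])
    return beta
```

The forward pass runs over each window's data in order while the backward pass runs in reverse, which is why the second window pre-computes β boundary values while the first window fills α.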
Step 6: compute the extrinsic information Le. This computation proceeds simultaneously with that of β: the α and γ values are read from the cache in order, and an add-compare operation on α, γ, and the β value computed in real time yields the extrinsic information Le.
The second component decoder works on the same principle as the first, with two differences. First, the two component decoders operate with a delay of one window length, so the decoding delay is reduced by a factor of P compared with conventional decoding. Second, its input soft-information sequence differs, and hence so does its output extrinsic-information sequence.
Further, the originally input system soft-information sequence and the extrinsic information Le output by the first component decoder for its first block of data are indexed by parallel interleaving to obtain the system information and a-priori information input to the second component decoder. When the indication signal that the first component decoder has finished computing the extrinsic information of the first block of data is detected, the second component decoder feeds back the indication signal occupy2 to the input buffer module, indicating that the system check information is being read from the input buffer RAM.
Further, when the second component decoding unit completes the computation of the extrinsic information Le2 and the LLR for the first block of data, the interleaving/de-interleaving address index computation unit de-interleaves the result to serve as the a-priori information La1 input to the first component decoder.
Step 7: repeat steps 2–6 for the remaining P−1 windows. When the data of all P windows have been iterated a set number of times, iteration stops; the second component decoder outputs the final likelihood-ratio information LLR in order, which is de-interleaved by the interleaving address index unit, hard-decided, and placed in the output buffer to await reading by the next module.
Based on the above method, the invention provides a system for reducing the complexity of Turbo parallel decoding, comprising: a first module, an input ping-pong buffer unit, which buffers the input soft-information sequence in ping-pong RAM according to the state of the component decoder; a second module, the interleaving/de-interleaving address index unit;
a third module, which maximizes the branch metric to reduce decoding complexity; a fourth module, which computes the forward-state metric; a fifth module, which computes the backward-state metric; a sixth module, which computes the extrinsic information Le; and a seventh module, which repeats the iteration among the second through sixth modules and stops when the data of the P windows have been iterated a predetermined number of times.
Further, the second module is also configured to store the inner interleaving coefficients in order and to index the RAM address from the frame-length information, obtaining in real time the interleaving coefficients corresponding to different frame lengths:
where addr_i is the indexed RAM address, the index i satisfies 0 ≤ i ≤ 187, and N denotes the frame-length information;
and to generate and store fixed parallel interleaving index addresses by computing the interleaving addresses, the same index addresses being used for both interleaving and de-interleaving:
f(x) = (f1*x + f2*x^2) mod N,  0 ≤ x ≤ N−1
wherein f1 and f2 respectively represent inner interleaving coefficients, x represents an information address before interleaving, f (x) represents an information address after interleaving, mod represents remainder, and N represents a frame length, namely interleaving depth;
the third module is further configured to set the input information frame length to N and divide it into P blocks, each block of data having length L = N/P, using P-way parallelism with P sub-component decoders, i.e., dividing the frame into P windows, each window processing L data;
when the indication signal that the ping or pong RAM buffering is complete is detected, the decoding iteration starts, and the indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is being read from that RAM;
setting the current decoding time to k, with state s_k at time k, state s_{k−1} at time k−1, and state s_{k+1} at time k+1: when the decoding iteration starts, the first window computes the branch-state metric in the forward direction using the first block of data, as follows:
where, owing to the uncertainty of the received information bits and check bits, the branch-state metric takes four possible state values in total; it is denoted γ_k(s_{k−1}, s_k) and hereinafter abbreviated the branch metric. The remaining symbols denote, respectively, the system and check information before encoding and the received decoded information bits and check bits;
calculating the branch metric of the current state and its maximum value, and storing the results in order; meanwhile, the second window initializes the backward-state metric, pre-computes it using the second block of data, and stores the boundary value as the initial value for the backward-state-metric computation at the next time;
the fourth module is further configured to initialize the forward metric with a uniform distribution and to sum the forward metric of the previous state, the current branch-state metric, and the branch-metric maximum to obtain the forward-state metric; the forward-state metric at the current time is determined by the value at the previous time, the branch-transition metric at the current time, and its maximum; once the first branch-metric maximum has been computed, the first window begins computing the forward-state metric in the forward direction:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
where α_k(s_k) denotes the forward metric of state s_k at time k.
Further, the fifth module takes as its initial value the boundary value pre-computed and stored by the second window for the backward metric of the second block of data, and sums the backward state of the backward metric, the branch-state metric, and the branch-metric maximum to obtain the backward-state metric; the backward-state metric at the current time is determined by the value at the next time, the branch-transition metric at the current time, and its maximum:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
where β_k(s_k) denotes the backward metric of state s_k at time k;
in the sixth module, the extrinsic information Le is computed simultaneously with the backward-state metric: the forward-state metric and the branch-state metric are read from the cache in order, and an add-compare operation on them together with the backward-state metric computed in real time yields the extrinsic information Le;
steps 2–6 are repeated for the remaining P−1 windows; when the data of the P windows have been iterated a set number of times, iteration stops, and the second component decoder outputs the final likelihood-ratio information LLR in order, which is de-interleaved via the interleaving address index unit, hard-decided, and placed in the output buffer until the next module reads the data;
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)   (over transitions with input bit 0)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)   (over transitions with input bit 1)
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
where Λ(s_k, 0) is the metric of state s_k when the input is 0, Λ(s_k, 1) is the metric when the input is 1, and Le and LLR denote, respectively, the extrinsic information and the likelihood-ratio information output by the first component decoder.
Beneficial effects: the invention provides an implementation method and system for reducing the complexity of Turbo parallel decoding. Through a specific computation in an FPGA (field-programmable gate array), the inner interleaving coefficients f1 and f2 are indexed from the frame-length information alone, so that the interleaving addresses are computed in real time in simplified form and need not be stored. Extensive simulation verifies that the method improves on the existing algorithm: because the branch-state metric contains no feedback, its maximum value is computed in advance, saving two branch-metric computations per iteration compared with existing schemes and effectively reducing computational complexity. The method is compatible with all 188 frame lengths of the LTE protocol and suits various rates. Addressing the high complexity and large delay of current decoder implementations, the invention effectively reduces both, while also achieving a low bit error rate.
Drawings
Fig. 1 is a schematic structural diagram of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 3 is a flowchart of a component decoding sub-decoder according to an embodiment of the present application.
Fig. 4 is a timing diagram illustrating a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 5 is a performance curve diagram of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 6 is an overall work flow diagram of the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
In an embodiment, the present application provides a schematic structure for low-complexity Turbo parallel decoding. As shown in Fig. 1, the method includes steps S101 to S105.
Step S101, input ping-pong buffer unit: the input decoding soft information is buffered in ping-pong RAM according to the state of the component decoder. For a frame length of 288 with 8-way parallelism, each group contains 36 soft values; the input extrinsic decoding information, comprising the system extrinsic information, check extrinsic information 1, and check extrinsic information 2, is ping-pong buffered in 2 groups of RAM, each group consisting of 8 RAMs of depth 36.
Step S102, the interleaving/de-interleaving address index calculation unit stores the 188 groups of interleaving coefficients of the 3GPP LTE protocol in order in a RAM of depth 188; each group holds 3 parameters: the frame length and the 2 interleaving coefficients f1 and f2.
Further, RAM addresses are indexed according to different frame lengths using the following equation:
where addr_i is the indexed RAM address, the index i satisfies 0 ≤ i ≤ 187, and N denotes the frame-length information. The division in the formula can be realized by a right-shift operation without any complex calculation, and the inner interleaving coefficients f1 and f2 for the 188 different frame lengths of 3GPP LTE are obtained in real time.
Further, the interleaving addresses of different frame lengths can be obtained by interleaving coefficients f1 and f 2:
f(x) = (f1*x + f2*x^2) mod N,  0 ≤ x ≤ N−1
wherein f1 and f2 respectively represent inner interleaving coefficients, x represents an information address before interleaving, f (x) represents an information address after interleaving, mod represents remainder, and N represents a frame length, that is, an interleaving depth.
Further, the above formula can be further transformed to obtain the following formula:
f(x) = (f(x−1) + f1 + f2 + 2*f2*(x−1)) mod N,  0 ≤ x ≤ N−1
Furthermore, the optimized computation has markedly lower complexity and is well suited to hardware implementation.
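The recursion replaces the per-address multiplications of the direct quadratic form with two running additions. A sketch comparing the two (the function name is ours; f1 = 3, f2 = 10 with N = 40 are example coefficients satisfying the QPP permutation conditions):

```python
def qpp_addresses_recursive(f1, f2, N):
    """Interleaved addresses f(x) = (f1*x + f2*x^2) mod N, computed
    incrementally via f(x) = (f(x-1) + f1 + f2 + 2*f2*(x-1)) mod N.
    Only additions and one running increment are needed per address."""
    addrs = [0]                      # f(0) = 0
    inc = (f1 + f2) % N              # first increment: f(1) - f(0)
    for _ in range(1, N):
        addrs.append((addrs[-1] + inc) % N)
        inc = (inc + 2 * f2) % N     # the increment grows by 2*f2 each step
    return addrs
```

Because the increment update is itself a single addition, the whole address sequence is generated with adders and modulo reductions only, which is what makes the per-cycle hardware cost small.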
Further, the parallel interleaving RAM index address is obtained according to the following formula:
where k denotes the parallelism, L the sub-block length, and the indices satisfy 0 ≤ i ≤ k−1, 0 ≤ j ≤ L−1, and 0 ≤ x ≤ N−1.
Further, the system extrinsic information is block-interleaved through the index addresses.
Step S103, the first component decoding unit is implemented with 8-way parallelism using 8 sub-component decoders, i.e., 8 windows, each window processing one block of 36 data; decoding is realized with the MAX-LOG-MAP algorithm.
Further, when the indication signal that the ping or pong RAM buffering is complete is detected, the decoding iteration starts, and the indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is being read from that RAM.
Further, the branch metric γ of the current state and its maximum value are calculated, and the results are stored in order. Meanwhile, the second window initializes the backward-state metric, pre-computes it using the second block of data, and stores the boundary value as the initial value for the backward-state-metric computation at the next time.
Further, α is initialized with a uniform distribution, and the forward-state metric α is obtained by summing the α of the previous state, the current branch-state metric, and the maximum of γ.
Further, β takes as its initial value the boundary value pre-computed and stored by the second window on the second block of data; the backward-state metric β is obtained by summing the backward state (initial value) of β, the branch-state metric, and the maximum of γ.
Further, the extrinsic information Le is computed as shown below, simultaneously with β: the α and γ values are read from the cache in order, and an add-compare operation on α, γ, and the β value computed in real time yields the extrinsic information Le.
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)   (over transitions with input bit 0)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)   (over transitions with input bit 1)
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
where Λ(s_k, 0) is the metric of state s_k when the input is 0, Λ(s_k, 1) is the metric when the input is 1, and Le and LLR denote, respectively, the extrinsic information and the likelihood-ratio information output by the first component decoder.
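An illustrative sketch of these decision formulas: the two Λ accumulations differ only in which trellis transitions they range over, so the hypothesized input bit is carried alongside each transition (the transition list here is an assumed toy structure). max* below is the exact Jacobian form, which MAX-LOG-MAP approximates by a plain max:

```python
import math

def max_star(values):
    """Jacobian logarithm max*(a, b) = max(a, b) + ln(1 + e^(-|a - b|)),
    folded over a list; MAX-LOG-MAP approximates it by a plain max()."""
    acc = values[0]
    for v in values[1:]:
        acc = max(acc, v) + math.log1p(math.exp(-abs(acc - v)))
    return acc

def llr_at_step(alpha_prev, beta_cur, transitions):
    """transitions: list of (s_prev, s, gamma, input_bit) for one trellis
    step (an assumed toy structure).  Accumulates Lambda per hypothesized
    input bit and returns LLR = max*(Lambda_0) - max*(Lambda_1),
    following the source's sign convention."""
    lam = {0: [], 1: []}
    for sp, s, g, bit in transitions:
        lam[bit].append(alpha_prev[sp] + g + beta_cur[s])
    return max_star(lam[0]) - max_star(lam[1])
```

Subtracting the systematic and a-priori contributions from such an LLR would yield the extrinsic information Le passed to the other component decoder.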
Step S104, the second component decoding unit: the second component decoder works on the same principle as the first, with two differences. First, the two component decoders operate with a delay of one window length, so the decoding delay is reduced by a factor of P compared with conventional decoding. Second, its input soft-information sequence differs, and hence so does its output extrinsic-information sequence.
Further, the originally input system soft-information sequence and the extrinsic information Le output by the first component decoder for its first block of data are interleaved using the parallel interleaving index addresses, yielding the system information and the a-priori information La2 input to the second component decoder.
Furthermore, according to the parallel interleaving index formula, the P items of system extrinsic information at RAM address 1 must be stored, via the index addresses, at address 19 of the P interleaving RAMs to obtain the interleaved data.
Furthermore, when the indication signal that the first component decoder has finished computing the extrinsic information of its first block of data is detected, the second component decoding unit feeds back the indication signal occupy2 to the input buffer module, indicating that the system check information is being read from the input buffer RAM.
Further, when the second component decoding unit completes the computation of the extrinsic information Le2 and the LLR for the first block of data, the interleaving/de-interleaving address index computation unit de-interleaves the result to serve as the a-priori information La1 input to the first component decoder.
Step S105, repeat steps 2–8 for the remaining P−1 windows. When the data of the P windows have been iterated a set number of times, iteration stops; the second component decoder outputs the final likelihood-ratio information LLR in order, which is de-interleaved by the interleaving index unit, passed through the hard-decision module, and placed in the output buffer to await reading by the next module.
In an embodiment, the present application provides a flow diagram for low-complexity Turbo parallel decoding. As shown in Fig. 2, the method includes steps S106 to S110.
And step S106, performing ping-pong cache of input decoding information according to the component decoding working state.
Step S107, when decoding starts, the first component decoder begins work and checks whether the computation of the extrinsic information Le1 for the P-th block of data is complete; if so, the result enters the interleaving address index unit for parallel interleaving to obtain the a-priori information La2 for second-component decoding; if not, first-component decoding continues.
Step S108, the system information bits and the second check bits input to the ping-pong buffer are read; the system information bits enter the interleaving address index unit for parallel interleaving, and the interleaved system information bits, the a-priori information La2, and the second check information enter the second component decoder together for computation.
Step S109, during second-component decoding, whether the computation of the extrinsic information Le2 and the LLR for the P-th block of data is complete is checked; if so, parallel de-interleaving in the interleaving address index unit yields the a-priori information La1 for first-component decoding; if not, second-component decoding continues.
Step S110, when the set number of iterations is reached, iteration stops; the second component decoder outputs the final likelihood-ratio information LLR, in window order, to the interleaving address index unit for de-interleaving, and output buffering is performed after the hard-decision unit.
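Steps S106–S110 can be sketched as a loop. Here `siso` is a hypothetical placeholder for one MAX-LOG-MAP component decode returning (extrinsic information, LLRs), and a single address table `pi` serves for both interleaving and de-interleaving, as the scheme prescribes:

```python
def turbo_iterations(sys_llr, par1, par2, pi, siso, n_iter):
    """Schematic two-component Turbo iteration loop.  `pi` is the QPP
    address table f(x); `siso(sys, par, la)` stands in for a component
    decoder and returns (extrinsic, llr).  Illustrative only."""
    N = len(sys_llr)
    sys_i = [sys_llr[pi[x]] for x in range(N)]      # interleaved systematic values
    la1 = [0.0] * N                                 # initial a-priori information
    llr_i = [0.0] * N
    for _ in range(n_iter):
        le1, _ = siso(sys_llr, par1, la1)           # first component decode
        la2 = [le1[pi[x]] for x in range(N)]        # interleave with the same table
        le2, llr_i = siso(sys_i, par2, la2)         # second component decode
        la1 = [0.0] * N
        for x in range(N):                          # de-interleave with the same table
            la1[pi[x]] = le2[x]
    llr = [0.0] * N
    for x in range(N):                              # de-interleave the final LLRs
        llr[pi[x]] = llr_i[x]
    return [1 if v > 0 else 0 for v in llr]         # hard decision
```

Using one index table in both directions is what lets the scheme avoid storing separate interleaving and de-interleaving tables.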
In an embodiment, an implementation flow diagram of a component decoding sub-decoder is provided in an embodiment of the present application. As shown in fig. 3, the method includes: step S111 to step S115.
Step S111, calculating gamma values of the current 4 possible states, and simultaneously sequentially caching the results.
Step S112, calculating the gamma maximum values of the 4 possible states, and simultaneously sequentially caching the results.
Step S113, the cached γ values and γ maxima are read in forward order, α is computed, and the results are cached in order.
Step S114, the cached γ values and γ maxima are read in reverse order, and β is computed.
Step S115, the cached α and γ values are read in forward order and, combined with the current β value, the extrinsic information Le and the likelihood-ratio information LLR are computed together.
In an embodiment, the present application provides a low-complexity Turbo parallel decoding timing diagram. As shown in fig. 4, the abscissa represents decoding time in units of one sliding-window calculation time, and the ordinate represents the window index in units of one sliding-window length; blue marks the first component decoding timing and orange the second, each with a one-sliding-window delay. The method comprises steps S116 to S118.
In step S116, at time T0, the first window W0 of the first component decoding computes gamma, the gamma maximum, and alpha in the forward direction, while the second window W1 pre-computes beta in the reverse direction; at time T0 the gamma, gamma-maximum, and alpha values are stored in order.
Still in step S116, at time T1, the first window W0 of the first component decoding computes beta, the extrinsic information Le, and the likelihood ratio information LLR in the reverse direction; the initial value of beta is the beta value pre-computed in reverse at time T0 of the first component decoding. The second window W1 computes gamma, the gamma maximum, and alpha in the forward direction, and the third window W2 pre-computes beta in reverse.
Further, at time T1, the first window W0 of the second component decoding starts to compute gamma, the gamma maximum, and alpha in the forward direction; the extrinsic information stored by the first component decoding at time T0 is interleaved by the interleaving address index unit to obtain the a priori information La2 that the second component decoding needs to compute gamma. The second window W1 pre-computes beta in reverse, and at time T1 the gamma, gamma-maximum, and alpha values are stored.
And step S117, at time T2, the second window W1 of the first component decoding computes beta, the extrinsic information Le, and the likelihood ratio information LLR in reverse; the initial value of beta is the beta value pre-computed in reverse at time T1. The third window W2 computes gamma, the gamma maximum, and alpha in the forward direction, and the fourth window W3 pre-computes beta in reverse.
Further, at time T2, the first window W0 of the second component decoding computes beta, the extrinsic information Le, and the likelihood ratio information LLR in reverse; the initial value of beta is the beta value pre-computed in reverse at time T1 of the second component decoding. The second window W1 computes gamma, the gamma maximum, and alpha in the forward direction, and the third window W2 pre-computes beta in reverse.
And S118, repeating the above process until the calculation of all LLR likelihood ratio information is complete.
In one embodiment, a low-complexity Turbo parallel decoding performance graph is provided.
As shown in fig. 5, for BPSK modulation over an AWGN channel at code rate 1/3, the bit error rate curves after 6 iterations are simulated with the MAX-LOG-MAP algorithm and the improved MAX-LOG-MAP algorithm respectively, and the two curves almost completely coincide.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An implementation method for reducing Turbo parallel decoding complexity, characterized by comprising the following steps:
step 1, inputting to a ping-pong buffer unit, and buffering the input soft information sequence into ping or pong RAM according to the state of the component decoder;
step 2, interleaving/deinterleaving address index unit;
step 3, solving the maximum value of the branch metric to reduce the decoding complexity;
step 4, calculating forward state measurement;
step 5, calculating backward state measurement;
step 6, calculating extrinsic information Le;
and 7, repeating the steps 2-6, and stopping iteration when the data iteration of the P windows meets the preset times.
2. The method according to claim 1, wherein step 2 further comprises: sequentially storing the internal interleaving coefficients, indexing the RAM address according to the frame length information, and acquiring the interleaving coefficients corresponding to different frame lengths in real time:
wherein addr_i represents the RAM index address, the index i satisfies 0 ≤ i ≤ 187, and N represents the frame length information;
and generating fixed interleaving parallel index addresses by computing the interleaving addresses and storing the fixed interleaving parallel index addresses, wherein the same index address is adopted for interleaving and deinterleaving:
f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N−1
wherein f1 and f2 respectively represent the inner interleaving coefficients, x represents the information address before interleaving, f(x) represents the information address after interleaving, mod represents the modulo operation, and N represents the frame length, that is, the interleaving depth.
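The address formula above is the standard quadratic permutation polynomial (QPP) interleaver. A minimal sketch follows; the coefficients f1 = 3, f2 = 10 for N = 40 are an assumption taken from the LTE coefficient table, not values stated in this claim.

```python
# Quadratic permutation polynomial (QPP) interleaver: f(x) = (f1*x + f2*x^2) mod N.
# For valid (f1, f2) pairs the mapping is a permutation of 0..N-1, so the same
# index table serves interleaving and de-interleaving, as the claim states.
def qpp_interleave_addresses(N, f1, f2):
    """Return the interleaved address f(x) for every pre-interleaving address x."""
    return [(f1 * x + f2 * x * x) % N for x in range(N)]
```

For example, `qpp_interleave_addresses(40, 3, 10)` yields a permutation of the 40 addresses, with f(0) = 0 and f(1) = 13.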
3. The method according to claim 2, wherein step 2 further comprises:
obtaining parallel interleaving RAM index address:
wherein k represents the parallelism, L represents the sub-block length, i satisfies 0 ≤ i ≤ k−1, j satisfies 0 ≤ j ≤ L−1, x satisfies 0 ≤ x ≤ N−1, and map_addr_{i,j} represents the parallel interleaving RAM index address; the remaining symbols have the same meanings as above.
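The claim's map_addr formula itself is not reproduced in the text above, so the following is only a sketch of one common way to decompose the QPP address f(x) into k parallel RAM banks (bank = f(x) // L, offset = f(x) mod L); the bank/offset rule and the coefficients are illustrative assumptions consistent with the symbols k, L, and map_addr_{i,j} defined above.

```python
# Hypothetical parallel-interleaving RAM index map: sub-block i holds linear
# addresses [i*L, (i+1)*L), and the interleaved address f(x) is located at
# RAM bank f(x) // L with offset f(x) % L.  This is one common contention-free
# decomposition, not the patent's exact formula.
def parallel_map(N, k, f1, f2):
    L = N // k
    map_addr = [[None] * L for _ in range(k)]
    for i in range(k):
        for j in range(L):
            x = i * L + j                        # linear address in sub-block i
            fx = (f1 * x + f2 * x * x) % N       # QPP-interleaved address
            map_addr[i][j] = (fx // L, fx % L)   # (RAM bank, offset)
    return map_addr
```

Because f(x) is a permutation, every (bank, offset) pair is hit exactly once, so the k windows can read their RAM banks without address conflicts.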
4. The implementation method for reducing the complexity of Turbo parallel decoding according to claim 1, wherein step 3 further comprises:
step 3-1, setting the input information frame length as N and dividing it into P blocks, each block of data having length L = N/P; adopting P-way parallelism with P sub-component decoders, namely dividing the decoder into P windows, each window processing L data;
step 3-2, when the indication signal that the ping or pong RAM buffering is finished is detected, the decoding iteration starts, and an indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data are read from that ping or pong RAM;
step 3-3, setting the current decoding time as k, the state at time k as s_k, the state at time k−1 as s_{k−1}, and the state at time k+1 as s_{k+1}; when the decoding iteration starts, the first window computes the branch state metric forward using the first block of data, as follows:
wherein, owing to the uncertainty of the received information bits and check bits, the branch state metric has four possible state values in total and is denoted γ_k(s_{k−1}, s_k), hereinafter simply called the branch metric; the remaining symbols respectively represent the systematic information and check information before encoding and the information bits and check bits of the received decoding;
3-4, starting to calculate the branch metric and the maximum value of the current state, and respectively and sequentially storing the calculation results; and simultaneously, initializing the backward state metric by a second window, pre-calculating the backward state metric by adopting second block data, and storing a boundary value as an initial value of backward state metric calculation at the next moment.
5. The method according to claim 1, wherein the value of the forward state metric at the current time in step 4 is determined by the value at the previous time, the branch transition metric value at the current time, and its maximum value; when the first branch-metric maximum has been computed, the first window starts to compute the forward state metric in the forward direction, as follows:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
wherein α_k(s_k) denotes the forward metric of state s_k at time k, max(γ(s_{k−1}, s_k)) denotes the maximum value of the branch state metric, and α_{k−1}(s_{k−1}) denotes the forward metric of state s_{k−1} at time k−1; the remaining symbols have the same meanings as above;
and initializing the forward metric value by adopting uniform distribution, and performing summation operation by utilizing the forward metric value of the previous state, the current branch state metric and the maximum value of the branch metric to obtain the forward metric value of the forward state metric.
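Read literally, the claim's formula adds the cached branch-metric maximum along a single predecessor path. The sketch below instead shows the conventional MAX-LOG-MAP forward recursion that this construction approximates, with max*(.) replaced by max(.) taken over predecessor states, and with the uniform initialisation named in the claim; the toy trellis connectivity is an assumption, not the patent's trellis.

```python
# Minimal MAX-LOG-MAP forward recursion over a generic trellis.  gamma[k] maps
# a transition (s_prev, s) to its branch metric; predecessors[s] lists the
# states feeding s.  max() stands in for max*(), as in MAX-LOG-MAP.
import math

def forward_metrics(gamma, predecessors, n_states):
    """Return alpha[k][s] for k = 0..K given gamma[k][(s_prev, s)]."""
    K = len(gamma)
    alpha = [[-math.inf] * n_states for _ in range(K + 1)]
    alpha[0] = [0.0] * n_states          # uniform initialisation (claim 5)
    for k in range(K):
        for s in range(n_states):
            alpha[k + 1][s] = max(
                alpha[k][sp] + gamma[k][(sp, s)]
                for sp in predecessors[s]
            )
    return alpha
```

In the windowed decoder of step 3, each window would run this recursion over its own L cached gamma values.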
6. The method according to claim 1, wherein the backward state metric value at the current time in step 5 is determined by a value at a subsequent time, a branch transition metric value at the current time, and a maximum value thereof, as shown in the following formula:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
in the formula, β_k(s_k) denotes the backward metric of state s_k at time k, β_{k+1}(s_{k+1}) denotes the backward metric of state s_{k+1} at time k+1, γ(s_{k+1}, s_k) denotes the branch transition metric at time k+1, and max(γ(s_{k+1}, s_k)) denotes the maximum value of the branch transition metric at time k+1; the remaining symbols have the same meanings as above;
and adopting a second window to pre-calculate a boundary value stored by the backward measurement value of the second block data as an initial value, and carrying out addition operation through the backward state of the backward measurement value, the branch state measurement and the maximum value of the state branch measurement to obtain the backward measurement value of the backward state measurement.
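The backward pass mirrors the forward one. The sketch below again uses the conventional MAX-LOG-MAP form, with max*(.) taken as max(.) over successor states; the `beta_init` argument corresponds to the boundary value pre-computed by the next window in claim 6, and the trellis connectivity is an illustrative assumption.

```python
# Backward recursion sketch in MAX-LOG-MAP style.  gamma[k] maps a transition
# (s, s_next) to its branch metric; successors[s] lists the states reachable
# from s.  beta_init is the window-boundary value pre-computed by the
# following sliding window, as described in the claim.
import math

def backward_metrics(gamma, successors, n_states, beta_init):
    """Return beta[k][s] for k = 0..K given gamma[k][(s, s_next)]."""
    K = len(gamma)
    beta = [[-math.inf] * n_states for _ in range(K + 1)]
    beta[K] = list(beta_init)            # boundary value from the next window
    for k in range(K - 1, -1, -1):
        for s in range(n_states):
            beta[k][s] = max(
                beta[k + 1][sn] + gamma[k][(s, sn)]
                for sn in successors[s]
            )
    return beta
```

Because the gamma cache is read in reverse order here, this pass can start as soon as the forward pass of the same window has filled the cache, which is exactly the scheduling shown in the timing diagram of fig. 4.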
7. The method for reducing Turbo parallel decoding complexity according to claim 1, wherein the calculation of the extrinsic information Le in step 6 is performed simultaneously with the backward state metric calculation: the forward state metric values and state branch metric values are read from the cache in order, and the add-compare operation over the forward state metric value, the state branch metric value, and the backward state metric value computed in real time yields the extrinsic information Le value;
repeating steps 2-6 for the calculation of the remaining P−1 windows, stopping iteration when the data iteration of the P windows meets the preset number of times, sequentially outputting the final likelihood ratio information LLR by the second component decoder, de-interleaving the LLR through the interleaving address index unit, performing output buffer processing after hard decision, and waiting for the next module to read the data;
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
wherein Λ(s_k, 0) denotes the metric of state s_k when the input is 0, evaluated over the input-0 transitions, Λ(s_k, 1) denotes the metric of state s_k when the input is 1, evaluated over the input-1 transitions, and Le and LLR respectively denote the extrinsic information and the likelihood ratio information output by the first component decoding; the remaining symbols have the same meanings as above.
8. An implementation system for reducing Turbo parallel decoding complexity, characterized by comprising the following modules:
a first module for inputting ping-pong buffer unit, and performing ping-pong RAM buffer to the input soft information sequence according to the state of the component decoder;
a second module for interleaving/deinterleaving the address index unit;
a third module for solving the maximum value of the branch metric to reduce decoding complexity;
a fourth module for calculating a forward state metric;
a fifth module for calculating a backward state metric;
a sixth module for calculating extrinsic information Le;
and the seventh module is used for repeating iteration in the second module, the third module, the fourth module, the fifth module and the sixth module, and stopping iteration when the data iteration of the P windows meets the preset times.
9. The method according to claim 8, wherein the second module is further configured to sequentially store the intra-interleaving coefficients, index the RAM address according to the frame length information, and obtain the interleaving coefficients corresponding to different frame lengths in real time:
wherein addr_i represents the RAM index address, the index i satisfies 0 ≤ i ≤ 187, and N represents the frame length information;
and generating fixed interleaving parallel index addresses by computing the interleaving addresses and storing the fixed interleaving parallel index addresses, wherein the same index address is adopted for interleaving and deinterleaving:
f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N−1
wherein f1 and f2 respectively represent the inner interleaving coefficients, x represents the information address before interleaving, f(x) represents the information address after interleaving, mod represents the modulo operation, and N represents the frame length, namely the interleaving depth;
the third module is further configured to set an input information frame length to be N, divide the frame length into P blocks, where each block of data length L is N/P, adopt P-way parallelism, and adopt P sub-component decoders, that is, divide the frame length into P windows to implement, where each window processes L data;
when the indication signal that the ping or pong RAM buffering is finished is detected, the decoding iteration starts, and an indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data are read from that ping or pong RAM;
setting the current decoding time as k, the state at time k as s_k, the state at time k−1 as s_{k−1}, and the state at time k+1 as s_{k+1}; when the decoding iteration starts, the first window computes the branch state metric forward using the first block of data, as follows:
wherein, owing to the uncertainty of the received information bits and check bits, the branch state metric has four possible state values in total and is denoted γ_k(s_{k−1}, s_k), hereinafter simply called the branch metric; the remaining symbols respectively represent the systematic information and check information before encoding and the information bits and check bits of the received decoding;
calculating the branch measurement and the maximum value of the current state, and respectively and sequentially storing the calculation results; meanwhile, the second window initializes the backward state measurement, pre-calculates the backward state measurement by adopting second block data, and stores a boundary value as an initial value of backward state measurement calculation at the next moment;
the fourth module is further used for initializing the forward metric value by adopting uniform distribution, and performing summation operation by utilizing the forward metric value of the previous state, the current branch state metric and the maximum value of the branch metric to obtain the forward metric value of the forward state metric; the value of the forward state metric at the current moment is determined by the value of the previous moment, the branch transition metric value at the current moment and the maximum value, when the calculation of the maximum value of the first branch metric is completed, the first window starts to calculate the forward state metric in the forward direction, and the calculation method is shown as the following formula:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
wherein α_k(s_k) denotes the forward metric of state s_k at time k, max(γ(s_{k−1}, s_k)) denotes the maximum value of the branch state metric, and α_{k−1}(s_{k−1}) denotes the forward metric of state s_{k−1} at time k−1; the remaining symbols have the same meanings as above.
10. The implementation system for reducing Turbo parallel decoding complexity according to claim 8, wherein: the fifth module further adopts the second window to pre-compute the boundary value stored from the backward metric of the second block of data as the initial value, and obtains the backward metric of the backward state metric by adding the backward state of the backward metric value, the branch state metric, and the maximum value of the state branch metric; the value of the backward state metric at the current time is determined by the value at the next time, the branch transition metric value at the current time, and its maximum value, as shown in the formula:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
in the formula, β_k(s_k) denotes the backward metric of state s_k at time k, β_{k+1}(s_{k+1}) denotes the backward metric of state s_{k+1} at time k+1, γ(s_{k+1}, s_k) denotes the branch transition metric at time k+1, and max(γ(s_{k+1}, s_k)) denotes the maximum value of the branch transition metric at time k+1; the remaining symbols have the same meanings as above;
the calculation of the external information Le and the backward state metric value in the seventh module is carried out simultaneously, the forward state metric value and the state branch metric value are read from the cache in sequence, and the addition ratio operation is carried out through the forward state metric value, the state branch metric value and the backward state metric value calculated in real time to obtain the external information Le value;
repeating the processing of the second to sixth modules for the calculation of the remaining P−1 windows, stopping iteration when the data iteration of the P windows meets the preset number of times, sequentially outputting the final likelihood ratio information LLR by the second component decoder, de-interleaving the LLR through the interleaving address index unit, performing output buffer processing after hard decision, and waiting for the next module to read the data;
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + β_k(s_k)
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
wherein Λ(s_k, 0) denotes the metric of state s_k when the input is 0, evaluated over the input-0 transitions, Λ(s_k, 1) denotes the metric of state s_k when the input is 1, evaluated over the input-1 transitions, and Le and LLR respectively denote the extrinsic information and the likelihood ratio information output by the first component decoding; the remaining symbols have the same meanings as above.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011474548.1A CN112398487B (en) | 2020-12-14 | 2020-12-14 | Implementation method and system for reducing Turbo parallel decoding complexity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112398487A true CN112398487A (en) | 2021-02-23 |
CN112398487B CN112398487B (en) | 2024-06-04 |
Family
ID=74625240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011474548.1A Active CN112398487B (en) | 2020-12-14 | 2020-12-14 | Implementation method and system for reducing Turbo parallel decoding complexity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112398487B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040039769A1 (en) * | 2002-08-20 | 2004-02-26 | Nec Electronics Corporation | Method for decoding error correcting code, its program and its device |
CN101026439A (en) * | 2007-02-07 | 2007-08-29 | 重庆重邮信科股份有限公司 | Decoding method for increasing Turbo code decoding rate |
WO2010045842A1 (en) * | 2008-10-23 | 2010-04-29 | 华为技术有限公司 | A method for calculating extrinsic information during decoding, a decoder and a turbo decoder |
CN103873073A (en) * | 2014-03-20 | 2014-06-18 | 北京遥测技术研究所 | Turbo code high-speed decoding method based on parallel and windowing structure |
CN104092470A (en) * | 2014-07-25 | 2014-10-08 | 中国人民解放军国防科学技术大学 | Turbo code coding device and method |
CN104168032A (en) * | 2014-08-16 | 2014-11-26 | 复旦大学 | High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX |
CN105634508A (en) * | 2015-12-21 | 2016-06-01 | 西安空间无线电技术研究所 | Realization method of low complexity performance limit approximate Turbo decoder |
Non-Patent Citations (3)
Title |
---|
Shen Yebing; Luo Changqing; An Jianping; Cheng Bo: "FPGA Implementation of Turbo Codes in Burst Communication", Application of Electronic Technique, no. 10, 30 November 2006 (2006-11-30) *
Zhao Danfeng; Luo Qinghua; Yan Xiaozhen: "FPGA Design and Implementation of an Improved Turbo Code Decoder", Application of Electronic Technique, no. 03, 6 March 2008 (2008-03-06) *
Chen Junlin; Zhu Guangxi: "Research and FPGA Implementation of High-Speed Parallel Turbo Decoding for Next-Generation Mobile Communication Systems", Application of Electronic Technique, no. 09, 30 October 2006 (2006-10-30) *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113765622A (en) * | 2021-08-26 | 2021-12-07 | 希诺麦田技术(深圳)有限公司 | Branch measurement initialization method, device, equipment and storage medium |
CN113765622B (en) * | 2021-08-26 | 2024-01-23 | 希诺麦田技术(深圳)有限公司 | Branch metric initializing method, device, equipment and storage medium |
CN114598419A (en) * | 2021-11-12 | 2022-06-07 | 北京智芯微电子科技有限公司 | Interleaver, deinterleaver, and methods of performing the same |
CN114598419B (en) * | 2021-11-12 | 2023-08-01 | 北京智芯微电子科技有限公司 | Interleaver, deinterleaver, and methods of performing the same |
CN114553370A (en) * | 2022-01-19 | 2022-05-27 | 北京理工大学 | Decoding method, decoder, electronic device, and storage medium |
CN114567411A (en) * | 2022-01-19 | 2022-05-31 | 北京理工大学 | Decoding method, decoding device, electronic equipment and storage medium |
CN114553370B (en) * | 2022-01-19 | 2024-03-15 | 北京理工大学 | Decoding method, decoder, electronic device and storage medium |
CN114567411B (en) * | 2022-01-19 | 2024-03-15 | 北京理工大学 | Decoding method, decoding device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112398487B (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112398487B (en) | Implementation method and system for reducing Turbo parallel decoding complexity | |
JP5479580B2 (en) | Method and apparatus for parallel TURBO decoding in LTE | |
CN102638278B (en) | Iterative decoder | |
CN101026439B (en) | Decoding method for increasing Turbo code decoding rate | |
US7530011B2 (en) | Turbo decoding method and turbo decoding apparatus | |
CN101388674B (en) | Decoding method, decoder and Turbo code decoder | |
KR100671619B1 (en) | Decoding Device and Decoding Method | |
US8112698B2 (en) | High speed turbo codes decoder for 3G using pipelined SISO Log-MAP decoders architecture | |
CN102523076A (en) | Universal and configurable high-speed Turbo code decoding system and method thereof | |
CN1461528A (en) | Iteration terminating using quality index criteria of Turbo codes | |
CN103354483B (en) | General high-performance Radix-4SOVA decoder and interpretation method thereof | |
CN102064838A (en) | Novel conflict-free interleaver-based low delay parallel Turbo decoding method | |
CN104092470B (en) | A kind of Turbo code code translator and method | |
RU2571597C2 (en) | Turbocode decoding method and device | |
CN105634508A (en) | Realization method of low complexity performance limit approximate Turbo decoder | |
CN104767537B (en) | A kind of Turbo interpretation methods for OFDM electric line communication systems | |
CN101373978A (en) | Method and apparatus for decoding Turbo code | |
US20130007568A1 (en) | Error correcting code decoding device, error correcting code decoding method and error correcting code decoding program | |
US6868518B2 (en) | Look-up table addressing scheme | |
US7925964B2 (en) | High-throughput memory-efficient BI-SOVA decoder architecture | |
CN101217336B (en) | A TD-SCDMA/3G hard core turbo decoder | |
CN103856218B (en) | Decoding process method and decoder | |
US6886127B2 (en) | Implementation of a turbo decoder | |
CN1773867B (en) | Method for decoding Turbo code | |
CN1133276C (en) | Decoding method and decoder for high-speed parallel cascade codes |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 211100 floor 1-3, auxiliary building, building 6, artificial intelligence Industrial Park, Nanjing, Jiangsu Province. Applicant after: Zhongke Nanjing mobile communication and computing Innovation Research Institute. Address before: 211100 floor 1-3, auxiliary building, building 6, Nanjing artificial intelligence Industrial Park, Nanjing City, Jiangsu Province. Applicant before: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES, NANJING INSTITUTE OF MOBILE COMMUNICATIONS AND COMPUTING INNOVATION |
| GR01 | Patent grant | |