24 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 61, NO. 1, JANUARY 2013
A Simplified Min-Sum Decoding Algorithm for Non-Binary LDPC Codes
Chung-Li Wang, Xiaoheng Chen, Zongwang Li, and Shaohua Yang
Abstract—Non-binary low-density parity-check codes are robust to various channel impairments. However, based on the existing decoding algorithms, the decoder implementations are expensive because of their excessive computational complexity and memory usage. Based on combinatorial optimization, we present an approximation method for the check node processing. The simulation results demonstrate that our scheme has small performance loss over the additive white Gaussian noise channel and independent Rayleigh fading channel. Furthermore, the proposed reduced-complexity realization provides significant savings in hardware, so it yields a good performance-complexity tradeoff and can be efficiently implemented.

Index Terms—Low-density parity-check (LDPC) codes, non-binary codes, iterative decoding, extended min-sum algorithm.
I. INTRODUCTION

BINARY low-density parity-check (LDPC) codes, discovered by Gallager in 1962 [1], were rediscovered and shown to approach Shannon capacity in the late 1990s [2]. Since their rediscovery, a great deal of research has been conducted on code construction methods, decoding techniques, and performance analysis. With hardware-efficient decoding algorithms such as the min-sum algorithm [3], practical decoders can be implemented for effective error control. Therefore, binary LDPC codes have been considered for a wide range of applications such as satellite broadcasting, wireless communications, optical communications, and high-density storage systems.
As the extension of binary LDPC codes over the Galois field of order q, non-binary LDPC (NB-LDPC) codes, also known as q-ary LDPC codes, were first investigated by Davey and MacKay in 1998 [4]. They extended the sum-product algorithm (SPA) for binary LDPC codes to decode q-ary LDPC codes and referred to this extension as the q-ary SPA (QSPA). Based on the fast Fourier transform (FFT), they devised an equivalent realization called the FFT-QSPA to reduce the computational complexity of the QSPA for codes with q a power of 2 [4]. With good construction methods [5]-[9], NB-LDPC codes decoded with the FFT-QSPA outperform Reed-Solomon codes decoded with the algebraic soft-decision Koetter-Vardy algorithm [10].
Paper approved by F. Fekri, the Editor for LDPC Codes and Applications of the IEEE Communications Society. Manuscript received October 23, 2011; revised March 12, 2012.
C.-L. Wang, Z. Li, and S. Yang are with LSI Corporation, Milpitas, CA 95035, USA (e-mail: {ChungLi.Wang, Zongwang.Li, Shaohua.Yang}@lsi.com).
X. Chen was with Microsoft Corporation, Redmond, WA 98052, USA. He is now with Sandisk Corporation, Milpitas, CA 95035, USA (e-mail: chen.xiaoheng@gmail.com).
Digital Object Identifier 10.1109/TCOMM.2012.101712.110709
As a class of capacity-approaching codes, NB-LDPC codes are capable of correcting symbol-wise errors and have recently been actively studied by numerous researchers. However, despite the excellent error performance of NB-LDPC codes, very little research contribution has been made toward VLSI decoder implementations due to the lack of hardware-efficient decoding algorithms. Even though the FFT-QSPA significantly reduces the number of computations of the QSPA, its complexity is still too high for practical applications, since it incorporates a great number of multiplications in the probability domain for both check node (CN) and variable node (VN) processing. Thus, logarithmic-domain approaches were developed to approximate the QSPA, such as the extended min-sum algorithm (EMSA), which applies message truncation and sorting to further reduce complexity and memory requirements [11], [12]. The second widely used algorithm is the min-max algorithm (MMA) [13], which replaces the sum operations in the CN processing by max operations. With an optimal scaling or offset factor, the EMSA and MMA can cause less than 0.2 dB of performance loss in terms of signal-to-noise ratio (SNR) compared to the QSPA. However, implementing the EMSA and MMA still requires excessive silicon area, making the decoder considerably expensive for practical designs [14]-[17]. Besides the QSPA and its approximations, two reliability-based algorithms were proposed to achieve much lower complexity based on the concept of simple orthogonal check-sums used in one-step majority-logic decoding [18]. Nevertheless, both algorithms incur at least 0.8 dB of SNR loss compared to the FFT-QSPA. Moreover, they are effective for decoding only when the parity-check matrix has a relatively large column weight. Consequently, the existing decoding algorithms are either too costly to implement or only applicable to limited code classes at the cost of huge performance degradation.
Therefore, we propose a reduced-complexity decoding algorithm, called the simplified min-sum algorithm (SMSA), which is derived from our analysis of the EMSA based on combinatorial optimization. Compared to the QSPA, the SMSA shows small SNR loss, similar to that of the EMSA and MMA. Regarding the complexity of the CN processing, the SMSA saves around 60% to 70% of the computations compared to the EMSA. Also, the SMSA provides an exceptional saving of memory usage in the decoder design. According to our simulation results and complexity estimation, this decoding algorithm achieves a favorable tradeoff between error performance and implementation cost.
The rest of the paper is organized as follows. The NB-LDPC code and EMSA decoding are reviewed in Section II. The SMSA is derived and developed in Section III. The error performance simulation results are summarized in Section IV. In Section V, the SMSA is compared with the EMSA in terms of complexity and memory usage. At last, Section VI concludes the paper.

0090-6778/13$31.00 © 2013 IEEE
II. NB-LDPC CODES AND ITERATIVE DECODING

Let $\mathrm{GF}(q)$ denote a finite field of q elements with addition $\oplus$ and multiplication $\otimes$. We will focus on fields with characteristic 2, i.e., $q = 2^p$. In such a field, each element has a binary representation, which is a vector of p bits and can be translated to a decimal number. Thus we label the elements in $\mathrm{GF}(2^p)$ as $0, 1, 2, \ldots, 2^p - 1$. An $(n, r)$ q-ary LDPC code C is given by the null space of an $m \times n$ sparse parity-check matrix $H = [h_{i,j}]$ over $\mathrm{GF}(q)$, with dimension r.
The parity-check matrix H can be represented graphically by a Tanner graph, which is a bipartite graph with two disjoint classes of variable nodes (VNs) and check nodes (CNs). The j-th VN represents the j-th column of H, which is associated with the j-th symbol of the q-ary codeword. The i-th CN represents the i-th row of H, i.e., the i-th q-ary parity check. The j-th VN and the i-th CN are connected by an edge if $h_{i,j} \neq 0$. This implies that the j-th code symbol is checked by the i-th parity check. Thus for $0 \le i < m$ and $0 \le j < n$, we define $N_i = \{j : 0 \le j < n, h_{i,j} \neq 0\}$ and $M_j = \{i : 0 \le i < m, h_{i,j} \neq 0\}$. The size of $N_i$ is referred to as the CN degree of the i-th CN, denoted $|N_i|$. The size of $M_j$ is referred to as the VN degree of the j-th VN, denoted $|M_j|$. If both the VN and CN degrees are invariable, letting $d_v = |M_j|$ and $d_c = |N_i|$, such a code is called a $(d_v, d_c)$-regular code. Otherwise it is an irregular code.
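As an illustration, the index sets $N_i$ and $M_j$ can be built directly from the parity-check matrix. The sketch below uses a small hypothetical matrix over GF(4) (not a code from this paper), with integer labels standing for field elements and 0 meaning "no edge" in the Tanner graph.

```python
# Hypothetical 3x4 parity-check matrix over GF(4)
H = [
    [1, 2, 0, 3],
    [0, 1, 3, 0],
    [2, 0, 1, 1],
]
m, n = len(H), len(H[0])

# N_i: VNs participating in the i-th check; M_j: CNs checking the j-th VN
N = [[j for j in range(n) if H[i][j] != 0] for i in range(m)]
M = [[i for i in range(m) if H[i][j] != 0] for j in range(n)]

print([len(Ni) for Ni in N])   # CN degrees |N_i|
print([len(Mj) for Mj in M])   # VN degrees |M_j|
```

Since the degrees here differ from node to node, this toy matrix would describe an irregular code.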
Similarly to binary LDPC codes, q-ary LDPC codes can be decoded iteratively by the message passing algorithm, in which messages are passed along the edges between the CNs and VNs. In the QSPA, EMSA, and MMA, a message is a vector composed of q sub-messages, or entries. Let $\gamma_j = [\gamma_j(0), \gamma_j(1), \ldots, \gamma_j(q-1)]$ be the a priori information of the j-th code symbol from the channel. Assuming that $X_j$ is the j-th code symbol, the d-th sub-message of $\gamma_j$ is a log-likelihood reliability (LLR) defined as $\gamma_j(d) = \log(\mathrm{Prob}(X_j = z_j)/\mathrm{Prob}(X_j = d))$, where $z_j$ is the most likely (ML) symbol for $X_j$, i.e., $z_j = \arg\max_{d \in \mathrm{GF}(q)} \mathrm{Prob}(X_j = d)$, and $z = [z_j]_{j=1\ldots n}$. The smaller $\gamma_j(d)$ is, the more likely $X_j = d$ is. Let $\alpha_{i,j}$ and $\beta_{i,j}$ be the VN-to-CN (V2C) and CN-to-VN (C2V) soft messages between the i-th CN and the j-th VN, respectively. For all $d \in \mathrm{GF}(q)$, the d-th entry of $\alpha_{i,j}$, denoted $\alpha_{i,j}(d)$, is the logarithmic reliability of d from the VN perspective. $a_{i,j}$ is the symbol with the smallest reliability, i.e., the ML symbol of the V2C message. With $x_{i,j} = X_j \otimes h_{i,j}$, we let $\alpha_{i,j}(d) = \log(\mathrm{Prob}(x_{i,j} = a_{i,j})/\mathrm{Prob}(x_{i,j} = d))$ and $\alpha_{i,j}(a_{i,j}) = 0$. $b_{i,j}$ and $\beta_{i,j}(d)$ are defined similarly from the CN perspective. The EMSA can be summarized as follows.
Algorithm 1. The Extended Min-Sum Algorithm

Initialization: Set $z_j = \arg\min_{d \in \mathrm{GF}(q)} \gamma_j(d)$. For all $i, j$ with $h_{i,j} \neq 0$, set $\alpha_{i,j}(h_{i,j} \otimes d) = \gamma_j(d)$. Set $\ell = 0$.

Step 1) Parity check: Compute the syndrome $z \cdot H^T$. If $z \cdot H^T = 0$, stop decoding and output z as the decoded codeword; otherwise go to Step 2.

Step 2) If $\ell = \ell_{\max}$, stop decoding and declare a decoding failure; otherwise go to Step 3.

Step 3) CN processing: Let the configurations $\mathcal{L}_i(x_{i,j} = d)$ be the sequences $[x_{i,j'}]_{j' \in N_i}$ such that $\bigoplus_{j' \in N_i} x_{i,j'} = 0$ and $x_{i,j} = d$. With a preset scaling factor $0 < c \le 1$, compute the C2V messages by

$\beta_{i,j}(d) = c \cdot \min_{\mathcal{L}_i(x_{i,j}=d)} \sum_{j' \in N_i \setminus j} \alpha_{i,j'}(x_{i,j'})$.  (1)

Step 4) VN processing: $\ell \leftarrow \ell + 1$. Compute the V2C messages in two steps. First compute the primitive messages by

$\tilde{\alpha}_{i,j}(h_{i,j} \otimes d) = \gamma_j(d) + \sum_{i' \in M_j \setminus i} \beta_{i',j}(h_{i',j} \otimes d)$.  (2)

Step 5) Message normalization: Obtain the V2C messages by normalizing with respect to the ML symbol:

$a_{i,j} = \arg\min_{d \in \mathrm{GF}(q)} \tilde{\alpha}_{i,j}(d)$,  (3)

$\alpha_{i,j}(d) = \tilde{\alpha}_{i,j}(d) - \tilde{\alpha}_{i,j}(a_{i,j})$.  (4)

Step 6) Tentative decisions:

$\tilde{\gamma}_j(d) = \gamma_j(d) + \sum_{i \in M_j} \beta_{i,j}(h_{i,j} \otimes d)$,  (5)

$z_j = \arg\min_{d \in \mathrm{GF}(q)} \tilde{\gamma}_j(d)$.  (6)

Go to Step 1.
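The exhaustive optimization in (1) can be sketched by brute force for a toy check node. The sketch below assumes GF(4) with the field addition realized as XOR on integer labels and illustrative, normalized V2C messages; it enumerates every configuration and is therefore the $O(q^{d_c})$ baseline, not the reduced-complexity method proposed later.

```python
import itertools

q = 4                                  # GF(4); field addition = XOR on labels
c = 0.75                               # scaling factor
alpha = [                              # normalized V2C messages (hypothetical)
    [0.0, 1.5, 2.0, 3.0],              # neighbor 0, ML symbol 0
    [1.0, 0.0, 2.5, 1.5],              # neighbor 1, ML symbol 1
    [2.0, 3.0, 0.0, 0.5],              # neighbor 2, ML symbol 2
]
dc = len(alpha)

def cn_update(j, d):
    """beta_{i,j}(d) per (1): scan every configuration [x_{j'}] whose XOR is 0
    with x_j = d; equivalently, the other entries must XOR to d."""
    others = [t for t in range(dc) if t != j]
    best = float("inf")
    for xs in itertools.product(range(q), repeat=len(others)):
        acc = 0
        for x in xs:
            acc ^= x
        if acc == d:                   # parity check satisfied
            best = min(best, sum(alpha[t][x] for t, x in zip(others, xs)))
    return c * best

beta0 = [cn_update(0, d) for d in range(q)]
print(beta0)   # the ML C2V symbol is 1 XOR 2 = 3, with reliability 0
```

Note that the minimum entry of the output lands at symbol 3, the XOR of the other neighbors' ML symbols, which is exactly the hard-message property proved in Theorem 1 below.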
III. A SIMPLIFIED MIN-SUM DECODING ALGORITHM

In this section we develop the simplified min-sum decoding algorithm. In the first part, we analyze the configurations and propose the approximation of the CN processing. Then, in the second part, a practical scheme is presented to achieve a tradeoff between complexity and performance.
A. Algorithm Derivation and Description

To begin, two differences between the SMSA and the EMSA are introduced. First, the SMSA utilizes $a_{i,j}$ ($b_{i,j}$) as the V2C (C2V) hard message, which indicates the ML symbol given by the V2C (C2V) message. Second, the reordering of soft message entries in the SMSA is defined as

$\bar{\alpha}_{i,j}(\delta) = \alpha_{i,j}(\delta \oplus a_{i,j})$,  (7)

$\bar{\beta}_{i,j}(\delta) = \beta_{i,j}(\delta \oplus b_{i,j})$,  (8)

for all $i, j$ with $h_{i,j} \neq 0$. While in the EMSA the entries are arranged by absolute value, the SMSA arranges the entries by their value relative to the hard message, expressed and denoted as the deviation $\delta$. Thus, before the CN processing of the SMSA, the messages are required to be transformed from the absolute space to the deviation space.
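As a small illustration of (7), over a characteristic-2 field the reordering is just an index XOR with the hard message; the message values below are hypothetical.

```python
q = 4
alpha = [1.0, 0.0, 2.5, 1.5]          # normalized V2C message (hypothetical)
a_ij = alpha.index(min(alpha))        # ML (hard) symbol, here 1
# (7): entry delta of the reordered message holds the reliability of the
# symbol that deviates from the ML symbol by delta (XOR = field addition)
alpha_bar = [alpha[delta ^ a_ij] for delta in range(q)]
print(alpha_bar)    # deviation 0 always carries the (zero) ML reliability
```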
Equation (1) performs a combinatorial optimization over all configurations. If we regard the sum of reliabilities $\sum_{j' \in N_i \setminus j} \alpha_{i,j'}(x_{i,j'})$ as the reliability of the configuration $[x_{i,j'}]_{j' \in N_i}$, this operation actually provides the most likely configuration and assigns its reliability to the result. However, the size of its search space is $O(q^{d_c})$, which leads to excessive complexity. Fortunately, in [11] it is observed that the optimization tends to choose configurations with more entries equal to the V2C hard messages. Therefore, if we define the order as the number of all $j' \in N_i \setminus j$ such that $x_{i,j'} \neq a_{i,j'}$, (1) can be reduced by utilizing the order-k subset, denoted $\mathcal{L}^{(k)}_i(x_{i,j} = d)$, which consists of the configurations of orders not higher than k. Limiting the size of the search space gives a reduced-search algorithm with some performance loss [11], so adjusting k can be used to trade off performance against complexity. We denote the order-k C2V soft message by $\beta^{(k)}$ (with the subscript i, j omitted for clarity), i.e.,

$\beta_{i,j}(d) \approx \beta^{(k)}(d) = \min_{\mathcal{L}^{(k)}_i(x_{i,j}=d)} \sum_{j' \in N_i \setminus j} \alpha_{i,j'}(x_{i,j'})$,  (9)

since $\mathcal{L}^{(k)}_i(x_{i,j} = d) \subseteq \mathcal{L}_i(x_{i,j} = d)$. In the following, we will show the computations for the hard message and the order-1 soft message. These messages will then be used to generate higher-order messages. The hard message is simply given by Theorem 1.
Theorem 1. The hard message $b_{i,j}$ is determined by

$b_{i,j} = \arg\min_{d \in \mathrm{GF}(q)} \beta_{i,j}(d) = \bigoplus_{j' \in N_i \setminus j} a_{i,j'}$.  (10)

Besides, for any order k, $\beta_{i,j}(b_{i,j}) = \bar{\beta}_{i,j}(0) = \beta^{(k)}(b_{i,j}) = 0$.
PROOF: From (9) the following inequality is obtained:

$\beta^{(k)}(d) \ge \sum_{j' \in N_i \setminus j} \min_{x_{i,j'} \in \mathrm{GF}(q)} \alpha_{i,j'}(x_{i,j'}) = \sum_{j' \in N_i \setminus j} \alpha_{i,j'}(a_{i,j'})$.  (11)

If $x_{i,j} = b_{i,j}$ and $x_{i,j'} = a_{i,j'}$ for all $j' \in N_i \setminus j$, we get an order-0 configuration, which is included in $\mathcal{L}^{(k)}_i(x_{i,j} = b_{i,j})$ for any k. Thus one can find that (11) holds with equality if $d = b_{i,j}$, and $\beta^{(k)}(b_{i,j})$ has the smallest reliability. It follows that for any k,

$\beta_{i,j}(b_{i,j}) = \beta^{(k)}(b_{i,j}) = \sum_{j' \in N_i \setminus j} \alpha_{i,j'}(a_{i,j'}) = 0$.  (12)
Based on Theorem 1, for any k we can define the order-k message $\bar{\beta}^{(k)}(\delta) = \beta^{(k)}(\delta \oplus b_{i,j})$ in the deviation space. For $\delta \neq 0$, the order-1 C2V message $\bar{\beta}^{(1)}(\delta)$ can be determined by Theorem 2, which performs a combinatorial optimization in the deviation space.
Theorem 2. With $\delta = b_{i,j} \oplus d$, the order-1 soft message is determined by

$\beta^{(1)}(d) = \bar{\beta}^{(1)}(\delta) = \min_{j' \in N_i \setminus j} \Big\{ \sum_{j'' \in N_i \setminus \{j, j'\}} \alpha_{i,j''}(a_{i,j''}) + \alpha_{i,j'}(\delta \oplus a_{i,j'}) \Big\} = \min_{j' \in N_i \setminus j} \bar{\alpha}_{i,j'}(\delta)$.  (13)
PROOF: According to the definition of the order, each configuration in $\mathcal{L}^{(1)}_i(x_{i,j} = d)$ has $x_{i,j'} = \delta \oplus a_{i,j'}$ for some $j' \in N_i \setminus j$ and $x_{i,j''} = a_{i,j''}$ for all $j'' \in N_i \setminus \{j, j'\}$, since $d \oplus (\delta \oplus a_{i,j'}) \oplus \bigoplus_{j'' \in N_i \setminus \{j, j'\}} a_{i,j''} = 0$. It follows that selecting $j' \in N_i \setminus j$ is equivalent to selecting an order-1 configuration in $\mathcal{L}^{(1)}_i(x_{i,j} = d)$. Correspondingly, minimizing $\bar{\alpha}_{i,j'}(\delta)$ over $j'$ in the deviation space is equivalent to minimizing $\alpha_{i,j'}(\delta \oplus a_{i,j'})$ over the configurations in the absolute space. Hence searching for $j'$ to minimize the bracketed sum in (13) yields $\beta^{(1)}(d)$.
Similarly to Theorem 2, in the absolute space an order-k configuration can be determined by assigning a deviation to each of k VNs selected from $N_i \setminus j$, i.e., $x_{i,j'} = \delta_{j'} \oplus a_{i,j'}$ with $\delta_{j'} \neq 0$ for the selected VNs and $\delta_{j'} = 0$ for all other VNs. Thus, in the deviation space, the order-k message can be computed as follows.
Theorem 3. With $\delta = b_{i,j} \oplus d$, choosing a combination of k symbols from $\mathrm{GF}(q)$ (denoted $\delta_1, \ldots, \delta_k$) and picking a permutation of k different VNs from the set $N_i \setminus j$ (denoted $j_1, j_2, \ldots, j_k$), the order-k soft message is given by

$\beta^{(k)}(d) = \bar{\beta}^{(k)}(\delta) = \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \; \min_{\substack{j_1, \ldots, j_k \in N_i \setminus j \\ j_1 \neq \cdots \neq j_k}} \; \sum_{\eta=1}^{k} \bar{\alpha}_{i,j_\eta}(\delta_\eta)$.  (14)
Theorem 3 shows that the configuration set can be analyzed as the Cartesian product of the set of symbol combinations and that of VN permutations. For Equation (14), the required set of combinations can be generated according to Theorem 4.

Theorem 4. The set of k-symbol combinations $\delta_1, \ldots, \delta_k$ for (14) can be obtained by choosing k symbols from $\mathrm{GF}(q)$ of which no subset has a sum equal to 0.
PROOF: Suppose that there exists a subset $\mathcal{R}$ of $\{1, \ldots, k\}$ such that $\bigoplus_{\eta \in \mathcal{R}} \delta_\eta = 0$. With a modified k-symbol combination such that $\delta'_\eta = 0$ for all $\eta \in \mathcal{R}$ and $\delta'_\eta = \delta_\eta$ for all $\eta \in \{1, \ldots, k\} \setminus \mathcal{R}$, we have

$\sum_{\eta=1}^{k} \bar{\alpha}_{i,j_\eta}(\delta'_\eta) = \sum_{\eta \in \{1,\ldots,k\} \setminus \mathcal{R}} \bar{\alpha}_{i,j_\eta}(\delta_\eta) \le \sum_{\eta=1}^{k} \bar{\alpha}_{i,j_\eta}(\delta_\eta)$,  (15)

where $\bigoplus_{\eta=1}^{k} \delta'_\eta = \bigoplus_{\eta=1}^{k} \delta_\eta = \delta$. Thus the original combination can be ignored.
Directly following from Theorem 4, Lemma 5 shows that $\bar{\beta}^{(k)}(\delta)$ of order $k > p$ is equal to $\bar{\beta}^{(p)}(\delta)$, since the combinations with more than p nonzero symbols can be ignored.

Lemma 5. With $q = 2^p$, for all $\delta \in \mathrm{GF}(q)$, we have

$\bar{\beta}^{(p)}(\delta) = \bar{\beta}^{(p+1)}(\delta) = \cdots = \bar{\beta}(\delta)$.  (16)
PROOF: $\bar{\beta}^{(k)}(\delta)$ is determined in (14) by searching for the optimal k-symbol combination with $\bigoplus_{\eta=1}^{k} \delta_\eta = \delta$. If some $\delta_\eta$ is 0, this combination is equivalent to a (k-1)-symbol combination and has already been considered for $\bar{\beta}^{(k-1)}(\delta)$. Otherwise, if all symbols are nonzero, with $k \ge p + 1$, we can consider the $p \times k$ binary matrix B whose $\eta$-th column is the binary vector of $\delta_\eta$. Since the rank of B is at most p, it can be proved that there must exist a subset $\mathcal{R}$ of $\{1, \ldots, k\}$ such that $\bigoplus_{\eta \in \mathcal{R}} \delta_\eta = 0$. Following Theorem 4, the k-symbol combination can be ignored, but the equivalent $(k - |\mathcal{R}|)$-symbol combination has already been considered for $\bar{\beta}^{(k-|\mathcal{R}|)}(\delta)$. Consequently, after ignoring every combination of more than p nonzero symbols, the search space for $\bar{\beta}^{(k)}(\delta)$ becomes equivalent to that for $\bar{\beta}^{(p)}(\delta)$. It implies that $\bar{\beta}^{(k)}(\delta)$ must be equal to $\bar{\beta}^{(p)}(\delta)$.
By the derivations given above, we have significantly reduced the search space in the deviation space, especially for larger check node degrees and smaller fields. Lemma 5 also yields the maximal configuration order required by (1), i.e., $\min(d_c - 1, p)$. Moreover, in (14), the k VNs are chosen from $N_i \setminus j$ without repetition. However, if the k VNs are allowed to be chosen with repetition, the search space expands so that (14) can be approximated by the lower bound

$\bar{\beta}^{(k)}(\delta) \ge \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \; \min_{j_1 \in N_i \setminus j} \cdots \min_{j_k \in N_i \setminus j} \; \sum_{\eta=1}^{k} \bar{\alpha}_{i,j_\eta}(\delta_\eta) = \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \; \sum_{\eta=1}^{k} \min_{j' \in N_i \setminus j} \bar{\alpha}_{i,j'}(\delta_\eta) = \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \; \sum_{\eta=1}^{k} \bar{\beta}^{(1)}(\delta_\eta)$,  (17)

where the last equality follows from (13). Therefore, the SMSA can be carried out as follows.
Algorithm 2. The Simplified Min-Sum Algorithm

Initialization: Set $z_j = \arg\min_{d \in \mathrm{GF}(q)} \gamma_j(d)$. For all $i, j$ with $h_{i,j} \neq 0$, set $a_{i,j} = h_{i,j} \otimes z_j$ and $\bar{\alpha}_{i,j}(h_{i,j} \otimes d) = \gamma_j(d \oplus z_j)$. Set $\ell = 0$.

Steps 1) and 2) The same as Steps 1 and 2 in the EMSA.

Step 3) CN processing (Steps 3.1-3.4):

Step 3.1) Compute the C2V hard messages:

$b_{i,j} = \bigoplus_{j' \in N_i \setminus j} a_{i,j'}$.  (18)

Step 3.2) Compute the step-1 soft messages:

$\bar{\beta}^{(1)}_{i,j}(\delta) = \min_{j' \in N_i \setminus j} \bar{\alpha}_{i,j'}(\delta)$.  (19)

Step 3.3) Compute the step-2 soft messages by selecting the combinations of k symbols according to Theorem 4:

$\bar{\beta}_{i,j}(\delta) = \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \; \sum_{\eta=1}^{k} \bar{\beta}^{(1)}_{i,j}(\delta_\eta)$.  (20)

Step 3.4) Scaling and reordering: With $0 < c \le 1$, $\bar{\beta}_{i,j}(\delta) \leftarrow c \cdot \bar{\beta}_{i,j}(\delta)$. For $d \neq b_{i,j}$, $\beta_{i,j}(d) = \bar{\beta}_{i,j}(b_{i,j} \oplus d)$; otherwise $\beta_{i,j}(b_{i,j}) = 0$.

Step 4) The same as Step 4 in the EMSA.

Step 5) Message normalization and reordering:

$a_{i,j} = \arg\min_{d \in \mathrm{GF}(q)} \tilde{\alpha}_{i,j}(d)$,  (21)

$\alpha_{i,j}(d) = \tilde{\alpha}_{i,j}(d) - \tilde{\alpha}_{i,j}(a_{i,j})$,  (22)

$\bar{\alpha}_{i,j}(\delta) = \alpha_{i,j}(\delta \oplus a_{i,j})$.  (23)

Step 6) The same as Step 6 in the EMSA.

Go to Step 1.
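Steps 3.1-3.4 can be sketched for a single check node as follows. The sketch assumes GF(4) with XOR as field addition, k = 2 in (20) implemented by a brute-force scan of all deviation pairs, and illustrative reordered V2C messages (entry 0 of each is the zero ML reliability).

```python
q = 4
a = [0, 1, 2]                         # V2C hard messages a_{i,j'} (hypothetical)
alpha_bar = [                         # reordered V2C messages, entry 0 = 0
    [0.0, 1.5, 2.0, 3.0],
    [0.0, 2.0, 1.5, 2.5],
    [0.0, 0.5, 3.0, 2.0],
]
dc, c = len(a), 0.75                  # CN degree and scaling factor

def smsa_cn(j):
    # Step 3.1 (18): C2V hard message = XOR of the other V2C hard messages
    b = 0
    for t in range(dc):
        if t != j:
            b ^= a[t]
    # Step 3.2 (19): order-1 messages, entrywise min over the other neighbors
    beta1 = [min(alpha_bar[t][d] for t in range(dc) if t != j)
             for d in range(q)]
    # Step 3.3 (20) with k = 2: best pair of deviations XOR-ing to each target
    beta_bar = [min(beta1[d1] + beta1[d1 ^ d] for d1 in range(q))
                for d in range(q)]
    # Step 3.4: scale and map back from the deviation space
    beta = [c * beta_bar[b ^ d] for d in range(q)]
    beta[b] = 0.0
    return b, beta

print(smsa_cn(0))
```

Note that the search touches only the q deviation values per target, independently of the check node degree, which is the source of the complexity saving over the exhaustive search in (1).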
As a result, the soft message generation is conducted in two steps (Steps 3.2 and 3.3). To compute the C2V messages $\bar{\beta}_{i,j}$, first in Step 3.2 we compute the minimal entry values $\min_{j'} \bar{\alpha}_{i,j'}(\delta)$ over all $j' \in N_i \setminus j$ for each $\delta \in \mathrm{GF}(q) \setminus 0$. Then, in Step 3.3, the minimal values are used to generate the approximation of $\bar{\beta}_{i,j}(\delta)$. Instead of the configurations of all $d_c$ VNs in $N_i$, (20) optimizes over the combinations of k symbols chosen from the field. Comparing Theorem 3 to (19) and (20), we can see that with our approximation method the optimization in the SMSA is performed over the VN set and the symbol combination set separately, and thus it has the advantage of a much smaller search space.
B. Practical Realization

Because of the complexity issue, the authors of [11] suggested using k = 4 for (1), as using k > 4 is reported to give unnoticeable performance improvement. Correspondingly, we only consider a small k for (20). But it is still costly to generate all combinations over a large finite field. For example, with a 64-ary code there are in total $\binom{q}{2} = 2016$ combinations for k = 2 and $\binom{q}{4} = 635376$ for k = 4. Even with Theorem 4 applied, the number of required combinations can be proved to be $O(q^k)$. For this reason, we consider a reduced-complexity realization rather than directly transforming the algorithm into an implementation. It can be shown that for $\delta_1 \oplus \delta_2 = \delta$ with $\delta_1 = \bigoplus_{\eta=1}^{h} \delta_\eta$, $\delta_2 = \bigoplus_{\eta=h+1}^{k} \delta_\eta$, and $1 \le h < k$, in the SMSA $\bar{\beta}(\delta)$ can also be approximated by

$\bar{\beta}(\delta) \approx \min_{\delta_1 \oplus \cdots \oplus \delta_k = \delta} \Big\{ \min_{\substack{j_1, \ldots, j_h \in N_i \setminus j \\ j_1 \neq \cdots \neq j_h}} \sum_{\eta=1}^{h} \bar{\alpha}_{i,j_\eta}(\delta_\eta) + \min_{\substack{j_{h+1}, \ldots, j_k \in N_i \setminus j \\ j_{h+1} \neq \cdots \neq j_k}} \sum_{\eta=h+1}^{k} \bar{\alpha}_{i,j_\eta}(\delta_\eta) \Big\}$
$= \min_{\delta_1 \oplus \delta_2 = \delta} \Big\{ \min_{\bigoplus_{\eta=1}^{h} \delta_\eta = \delta_1} \min_{\substack{j_1, \ldots, j_h \in N_i \setminus j \\ j_1 \neq \cdots \neq j_h}} \sum_{\eta=1}^{h} \bar{\alpha}_{i,j_\eta}(\delta_\eta) + \min_{\bigoplus_{\eta=h+1}^{k} \delta_\eta = \delta_2} \min_{\substack{j_{h+1}, \ldots, j_k \in N_i \setminus j \\ j_{h+1} \neq \cdots \neq j_k}} \sum_{\eta=h+1}^{k} \bar{\alpha}_{i,j_\eta}(\delta_\eta) \Big\}$
$\approx \min_{\delta_1 \oplus \delta_2 = \delta} \big\{ \tilde{\beta}(\delta_1) + \tilde{\beta}(\delta_2) \big\}$,  (24)

where $\tilde{\beta}(\delta)$ denotes the primitive message, that is, the soft message of any order lower than the required order k. Hence
we can successively combine two 2-symbol combinations into a 4-symbol one in two sub-steps with a look-up table (LUT) in which all 2-symbol combinations are listed. This method allows us to obtain k-symbol combinations using $\log_2 k$ sub-steps, with k equal to a power of 2. Based on this general technique, in the following we select k to meet the requirements on complexity and performance, and then practical realizations are provided specifically for different k.

The approximation loss with a small k results from the reduced search, whose search space size is $O(q^k)$. According to Lemma 5, the full-size search space consists of p-symbol combinations, with size $O(q^p)$. As the size ratio between the two spaces is $O(q^{p-k})$, the performance degradation is expected to be smaller for smaller fields. k = 1 was shown to
TABLE I
THE LOOK-UP TABLE D FOR GF(2^3)

δ \ f | 0     | 1     | 2     | 3
  1   | (0,1) | (2,3) | (4,5) | (6,7)
  2   | (0,2) | (1,3) | (4,6) | (5,7)
  3   | (0,3) | (1,2) | (4,7) | (5,6)
  4   | (0,4) | (1,5) | (2,6) | (3,7)
  5   | (0,5) | (1,4) | (2,7) | (3,6)
  6   | (0,6) | (1,7) | (2,4) | (3,5)
  7   | (0,7) | (1,6) | (2,5) | (3,4)
Algorithm 3. Generate the look-up table D for GF(q).
1: for δ1 = 0 ... q − 1 do
2:   for δ2 = δ1 + 1 ... q − 1 do
3:     δ = δ1 ⊕ δ2;
4:     D(δ).Add((δ1, δ2));
5:   end
6: end
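Algorithm 3 can be sketched in a few lines. The version below assumes the outer loop starts at 0 so that pairs containing the zero symbol (which appear in Table I) are included.

```python
def build_lut(q):
    # D maps each nonzero target delta to all pairs (d1, d2), d1 < d2,
    # with d1 XOR d2 = delta, mirroring the two loops of Algorithm 3
    D = {delta: [] for delta in range(1, q)}
    for d1 in range(q):
        for d2 in range(d1 + 1, q):
            D[d1 ^ d2].append((d1, d2))
    return D

D = build_lut(8)
print(D[1])   # [(0, 1), (2, 3), (4, 5), (6, 7)], i.e., row 1 of Table I
```

As expected, the table has q − 1 = 7 rows of q/2 = 4 pairs each, matching the dimensions discussed below.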
have huge performance loss for NB-LDPC codes [11]. By the simulation results in Section IV, setting k = 2 will be shown to have smaller loss with smaller fields when compared to the EMSA, and setting k = 4 will be shown to provide negligible loss, with field size q up to 256. Since we observed that using k > 4 gives little advantage, the two settings k = 2 and k = 4 will be further investigated in the following as two tradeoffs between complexity and performance.

Let us first look at the required LUT. As shown in Algorithm 3, the pseudocode generates the list of combinations $(\delta_1, \delta_2)$ without repetition for each target $\delta$ with $\delta_1 \oplus \delta_2 = \delta$. Since we have q/2 combinations for each of the q − 1 targets $\delta$, D can be depicted as a two-dimensional table with q − 1 rows and q/2 columns. For $1 \le \delta \le q - 1$ and $0 \le f \le q/2 - 1$, each cell $D_{\delta,f}$ in the table is a two-tuple containing two elements $D_{\delta,f}(0)$ and $D_{\delta,f}(1)$, which satisfy the addition rule $D_{\delta,f}(0) \oplus D_{\delta,f}(1) = \delta$. For example, the LUT for q = 8 is provided in Table I.
Step 3.3 and (20) can be realized by Steps 3.3.1 and 3.3.2 given below.

Step 3.3.1) With the LUT D, compute the step-1 messages by

$\tilde{\beta}_{i,j}(\delta) = \min_{f=0,\ldots,q/2-1} \big\{ \bar{\beta}^{(1)}_{i,j}(D_{\delta,f}(0)) + \bar{\beta}^{(1)}_{i,j}(D_{\delta,f}(1)) \big\}$.  (25)

Step 3.3.2) Compute the step-2 messages by

$\bar{\beta}_{i,j}(\delta) = \min_{f=0,\ldots,q/4-1} \big\{ \tilde{\beta}_{i,j}(D_{\delta,f}(0)) + \tilde{\beta}_{i,j}(D_{\delta,f}(1)) \big\}$.  (26)

By definition, we let $\bar{\beta}^{(1)}_{i,j}(0) = \tilde{\beta}_{i,j}(0) = 0$, so $\bar{\beta}^{(1)}_{i,j}(D_{\delta,0}(0)) + \bar{\beta}^{(1)}_{i,j}(D_{\delta,0}(1)) = \bar{\beta}^{(1)}_{i,j}(\delta)$.
The first sub-step combines the two symbols $D_{\delta,f}(0)$ and $D_{\delta,f}(1)$ for each $\delta$ and f, making a 2-symbol combination. The comparison is conducted over $f = 0, \ldots, q/2 - 1$ for each $\delta$. Assume that the index of the minimal value is $f^*(\delta)$. Then the second sub-step essentially combines the two two-tuples $D_{D_{\delta,f}(0), f^*(D_{\delta,f}(0))}$ and $D_{D_{\delta,f}(1), f^*(D_{\delta,f}(1))}$, making a 4-symbol combination. It can be proved that all 4-symbol combinations can be considered by combining the two-tuples $D_{\delta,f}$ of $f = 0, 1, \ldots, q/4 - 1$. So the second sub-step only operates on the left half of Table D. For instance, over GF(2^3) the left half of D is formed by f = 0, 1 in Table I.

[Fig. 1. BLER and BER comparison of the SMSA-1, SMSA-2, EMSA, MMA, and QSPA with the (1057,813) code over GF(2^4). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50.]
For k = 2 and k = 4 respectively, we define two versions of the SMSA: the one-step SMSA (denoted SMSA-1) and the two-step SMSA (denoted SMSA-2). The SMSA-1 is the same as the SMSA-2 except for the implementation of Step 3.3: the SMSA-1 only requires Step 3.3.1 and skips Step 3.3.2, while the SMSA-2 implements both steps. We present the performance and complexity results of the SMSA-1 and SMSA-2 in the following sections.
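The two-sub-step realization (25)-(26) can be sketched as follows for GF(8) and k = 4. The order-1 message values are hypothetical integers (as in a fixed-point realization), and the LUT of Algorithm 3 is rebuilt inline so the sketch is self-contained.

```python
q = 8
beta1 = [0, 2, 5, 4, 8, 10, 6, 3]      # hypothetical order-1 message (19)

# Rebuild the LUT of Algorithm 3: D[delta] lists pairs (d1, d2), d1 < d2,
# with d1 XOR d2 = delta
D = {delta: [] for delta in range(1, q)}
for d1 in range(q):
    for d2 in range(d1 + 1, q):
        D[d1 ^ d2].append((d1, d2))

# Sub-step 1 (25): best 2-symbol combination for each nonzero target
beta_t = [0] + [min(beta1[x] + beta1[y] for x, y in D[delta])
                for delta in range(1, q)]
# Sub-step 2 (26): combine two 2-symbol results; only the left half of the
# table (f = 0 ... q/4 - 1) is needed to reach all 4-symbol combinations
beta = [0] + [min(beta_t[x] + beta_t[y] for x, y in D[delta][: q // 4])
              for delta in range(1, q)]
print(beta_t)
print(beta)
```

For this particular data the second sub-step does not improve on the first, but the first sub-step does improve on the order-1 message: the target δ = 4 is reached more cheaply through the pair (3, 7) (cost 4 + 3 = 7) than directly (cost 8).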
IV. SIMULATION RESULTS

In this section, we use five examples to demonstrate the performance of the proposed SMSA for decoding NB-LDPC codes. The existing algorithms, including the QSPA, EMSA, and MMA, are used for performance comparison. The SMSA includes the one-step (SMSA-1) and two-step (SMSA-2) versions. In the first two examples, three codes over GF(2^4), GF(2^6), and GF(2^8) are considered. We show that the SMSA-2 has very good performance for different finite fields and modulations, and that the SMSA-1 has small performance loss compared to the SMSA-2 over GF(2^4) and GF(2^5). Binary phase-shift keying (BPSK) and quadrature amplitude modulation (QAM) are applied over the additive white Gaussian noise (AWGN) channel. In the third example, we study fixed-point realizations of the SMSA and find that it is exceptionally suitable for hardware implementation. The fourth example compares the performance of the SMSA, QSPA, EMSA, and MMA over the uncorrelated Rayleigh-fading channel; the SMSA-2 shows its reliability under higher channel randomness. In the last example, we study the convergence speed of the SMSA and show that it converges almost as fast as the EMSA.
[Fig. 2. BLER and BER comparison of the SMSA-1, SMSA-2, EMSA, MMA, and QSPA with the (495,433) code over GF(2^6). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50.]

[Fig. 3. BLER and BER comparison of the SMSA-1, SMSA-2, EMSA, MMA, and QSPA with the (273,191) code over GF(2^8). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50.]

Example 1. (BPSK-AWGN) Three codes constructed by computer search over different finite fields are used in this example. Four iterative decoding algorithms (SMSA, QSPA, EMSA, and MMA) are simulated with BPSK modulation over the binary-input AWGN channel for every code. The maximal iteration number $\ell_{\max}$ is set to 50 for all algorithms.
The bit error rate (BER) and block error rate (BLER) are obtained to characterize the error performance. The first code is a rate-0.769 (3,13)-regular (1057,813) code over GF(2^4), and its error performance is shown in Fig. 1. We use the optimal scaling factors c = 0.60, 0.75, and 0.73 for the SMSA-1, SMSA-2, and EMSA, respectively. The second code is a rate-0.875 (3,24)-regular (495,433) code over GF(2^6), and its error performance is shown in Fig. 2. We use the optimal scaling factors c = 0.50, 0.70, and 0.65 for the SMSA-1, SMSA-2, and EMSA, respectively. The third code is a rate-0.70 (3,10)-regular (273,191) code over GF(2^8), and its error performance is shown in Fig. 3. We use the optimal scaling factors c = 0.35, 0.575, and 0.60 for the SMSA-1, SMSA-2, and
EMSA, respectively. Taking the EMSA as a benchmark at a BLER of $10^{-5}$, we observe that the SMSA-2 has an SNR loss of less than 0.05 dB, while the MMA suffers about 0.1 dB of loss. The SMSA-1 has 0.06 dB of loss with GF(2^4) and almost 0.15 dB of loss with GF(2^6) and GF(2^8) against the EMSA. As discussed in Section III-B, the SMSA-1 performs better with smaller fields. Finally, the QSPA has an SNR gain of less than 0.05 dB and yet is viewed as undesirable for implementation.

[Fig. 4. BLER and BER comparison of the SMSA-1, SMSA-2, EMSA, MMA, and QSPA with the (495,433) code over GF(2^6). 64-QAM is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50.]

[Fig. 5. BLER comparison of the SMSA-1 and SMSA-2 (fixed-point and floating-point) and the QSPA, EMSA, and MMA (floating-point only) with the (620,310) code over GF(2^5). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50.]
Example 2. (QAM-AWGN) Fig. 4 shows the performance of the 64-ary (495,433) code, the second code in Example 1, with rectangular 64-QAM. Four decoding algorithms (SMSA, QSPA, EMSA, and MMA) are simulated with finite field symbols directly mapped to the Gray-coded constellation symbols over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 50 for all algorithms. The SMSA-1, SMSA-2, and EMSA have the optimal scaling factors c = 0.37, 0.60, and 0.50, respectively. We note that the SMSA-2 and EMSA achieve nearly the same BER and BLER, while the MMA and SMSA-1 have 0.11 and 0.14 dB of performance loss.
[Fig. 6. BLER comparison of the SMSA-1, SMSA-2, QSPA, EMSA, and MMA with the (620,310) code over GF(2^5). BPSK is used over the uncorrelated Rayleigh-fading channel. The maximal iteration number $\ell_{\max}$ is set to 50.]

[Fig. 7. BLER comparison of the SMSA-2 and EMSA with the (620,310) code over GF(2^5). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 4, 5, 7, and 10.]
Example 3. (Fixed-Point Analysis) To investigate the effectiveness of the SMSA, we evaluate the block error performance of the (620,310) code over GF(2^5) taken from [9]. The parity-check matrix of the code is a 10 × 20 array of 31 × 31 circulant permutation matrices and zero matrices. The floating-point QSPA, EMSA, MMA, SMSA-1, and SMSA-2 and the fixed-point SMSA-1 and SMSA-2 are simulated using BPSK modulation over the AWGN channel. The BLER results are shown in Fig. 5. The optimal scaling factors for the SMSA-1, SMSA-2, and EMSA are c = 0.6875, 0.6875, and 0.65, respectively. The maximal iteration number $\ell_{\max}$ is set to 50 for all algorithms. Let I and F denote the numbers of bits for the integer part and the fraction part of the quantization scheme. We observe that for the SMSA-1 and SMSA-2, five bits (I = 3, F = 2) are sufficient. In approximating the QSPA and EMSA, the SMSA-2 has an SNR loss of only 0.1 dB and 0.04 dB at a BLER of $10^{-4}$, respectively, and the SMSA-1 has an SNR loss of 0.14 dB and 0.08 dB, respectively.
[Fig. 8. The average number of iterations for the SMSA-2 and EMSA with the (620,310) code over GF(2^5). BPSK is used over the AWGN channel. The maximal iteration number $\ell_{\max}$ is set to 20, 50, and 100.]
Example 4. (Fading Channel) To test the reliability of the SMSA, we examine the error performance of the 32-ary (620,310) code given in Example 3 over the uncorrelated Rayleigh-fading channel with additive Gaussian noise. The channel information is assumed to be known to the receiver. The floating-point QSPA, EMSA, MMA, SMSA-1, and SMSA-2 are simulated using BPSK modulation, and the BLER results are shown in Fig. 6. Compared to the EMSA, the SNR loss of the SMSA-2 is within 0.1 dB, while the SMSA-1 and MMA have around 0.2 dB of loss. The QSPA has a performance gain in the low and medium SNR regions and no gain at high SNR.
Example 5. (Convergence Speed) Consider again the 32-ary (620,310) code given in Example 3. The block error performances for this code using the SMSA-2 and EMSA with 4, 5, 7, and 10 maximal iterations are shown in Fig. 7. At a BLER of 10^-3, the SNR gap between the SMSA-2 and EMSA is 0.04 dB for the various maximal iteration numbers. To further investigate the convergence speed, we summarize the average number of iterations for the EMSA and SMSA-2 with 20, 50, and 100 maximal iterations and show the results in Fig. 8. It should be noted that, as shown in Fig. 5, the SNR gap of BLER between the EMSA and SMSA-2 is about 0.04 dB at a BLER of 10^-3 and an SNR of about 2.2 dB. By examining the curves of Fig. 8 at an SNR of about 2.2 dB (in the partial enlargement), we observe that, for the same average iteration number, the difference in required SNR is also around 0.04 dB between the two algorithms. Since a decoding failure increases the average iteration number, the SNR gap in error performance can be seen as the main reason for the SNR gap in average iteration numbers. Therefore, as failures occur often at low SNR and rarely at high SNR, in Fig. 8 the iteration increase for the SMSA-2 at high SNR is negligible (< 5% at 2.2 dB), while at low and medium SNR the gap is larger (about 11% at 1.8 dB). Although the result is not shown, we observe that the SMSA-1 also has similar convergence properties, and its iteration increase compared with the EMSA at high SNR is around 6%.
V. COMPLEXITY ANALYSIS
In this section, we analyze the computational complexity of the SMSA and compare it with that of the EMSA. The comparison of average required iterations is provided in Example 5 of Section IV. At a fixed SNR, the SMSA requires a slightly larger (5-6%) average number of iterations than the EMSA in the medium and high SNR regions. As the two algorithms have a small (within 0.2 dB) performance difference, especially between the EMSA and SMSA-2, we think it is fair to compare the complexity of the SMSA and EMSA simply by the computations per iteration. Moreover, since the VN processing is similar for both algorithms, we only analyze the CN processing. The required operation counts per iteration for a CN with degree dc are adopted as the metric.
To further reduce the duplication of computations in the CN processing, we propose to transform Steps 3.1 and 3.2 of the SMSA as follows. Step 3.1 can be transformed into two sub-steps. We define

  A_i = sum_{j in N_i} a_{i,j},

where the sum is taken over the finite field. Then each b_{i,j} can be computed by

  b_{i,j} = a_{i,j} + A_i.

Thus, in total it takes 2dc - 1 finite field additions to compute this step for a CN.
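Since additions in GF(2^p) are bitwise XORs of the p-bit symbol representations, the two sub-steps above can be sketched as follows (a minimal illustration; the function name and the integer symbol encoding are our own assumptions, not the paper's):

```python
def cn_hard_messages(a):
    """Two-sub-step computation of the hard messages b[j]:
    first the full row sum A (dc - 1 XORs), then one XOR per edge.
    Because x ^ x = 0 in GF(2^p), a[j] ^ A equals the sum of all
    symbols except a[j], for 2*dc - 1 field additions in total."""
    A = 0
    for x in a:
        A ^= x
    return [x ^ A for x in a]

# dc = 4 hard messages over GF(2^5), symbols encoded as integers 0..31
b = cn_hard_messages([3, 17, 9, 24])
```

The naive alternative, XOR-ing dc - 1 symbols separately for each of the dc edges, would cost dc(dc - 1) additions instead.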
Similarly, the computation of Step 3.2 can be transformed into two sub-steps. For the i-th row of the parity-check matrix, we define a three-tuple (min1_i(b), min2_i(b), idx_i(b)), in which

  min1_i(b) = min_{j in N_i} Q_{i,j}(b),
  Q_{i,idx_i(b)}(b) = min1_i(b),
  min2_i(b) = min_{j in N_i \ idx_i(b)} Q_{i,j}(b),

where Q_{i,j}(b) denotes the incoming soft message on edge (i,j) for field element b. For each nonzero symbol b in GF(q), it takes at most 1 + 2(dc - 2) = 2dc - 3 min operations to compute the three-tuple (min1_i(b), min2_i(b), idx_i(b)), and each operation can be realized by a comparator and a multiplexor.
The remaining computations of Step 3.2 can be performed equivalently by

  Q^(1)_{i,j}(b) = min1_i(b) if j != idx_i(b);
  Q^(1)_{i,j}(b) = min2_i(b) if j = idx_i(b).

It takes dc comparisons and dc two-to-one selections to perform these operations. As there are q - 1 entries of Q^(1)_{i,j}, the overall computation of Step 3.2 per CN requires (3dc - 3)(q - 1) comparators and (3dc - 3)(q - 1) multiplexors.
For the SMSA-2, Step 3.3 is realized by Steps 3.3.1 and 3.3.2. To compute Step 3.3.1 for each symbol b, it takes (q - 2)/2 summations and (q - 2)/2 comparisons. To compute Step 3.3.2 for each b, it takes (q - 4)/4 summations and (q - 4)/4 comparisons. Therefore, in total it takes 3q/4 - 2 summations and the same number of min operations for each b. As there are q - 1 nonzero symbols in GF(q), overall it requires (3q/4 - 2)(q - 1)dc summations, comparisons, and two-to-one selections. For the SMSA-1, it requires (q/2 - 1)(q - 1)dc summations, comparisons, and two-to-one selections. Step 3.4 performs scaling and shifting and is thus ignored here, since its workload is negligible compared to the LLR calculations.
Next, let us analyze the CN processing of the EMSA for comparison. As in [14], [15], [17], the forward-backward scheme is usually used to reduce the implementation complexity. For a CN with degree dc, 3(dc - 2) stages are required, and each stage needs q^2 summations and (q - 1)q min operations. Overall, in the EMSA, each CN requires 3(dc - 2)q^2 summations, 3(dc - 2)q(q - 1) comparisons, and 3(dc - 2)q(q - 1) two-to-one selections. The results for the SMSA and EMSA are summarized in Table II. As the finite field additions required by the SMSA take only marginal area in implementation, we see that the SMSA requires far fewer computations than the EMSA.
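To see where the per-stage counts come from, note that each forward (or backward) stage combines two length-q message vectors; with GF(2^p) symbols in binary vector representation, the field addition of two symbols is an XOR of their integer indices. A sketch of one such stage (our own illustration, not the paper's code):

```python
def ems_stage(a, b):
    """Combine two length-q message vectors (q = 2^p) in one
    forward-backward stage: out[s] = min over t of a[t] + b[t ^ s],
    i.e., the best split of symbol s into two field summands.
    This costs q*q summations and q*(q - 1) pairwise min operations."""
    q = len(a)
    return [min(a[t] + b[t ^ s] for t in range(q)) for s in range(q)]

# toy q = 4 (p = 2) example
out = ems_stage([0, 1, 2, 3], [0, 1, 2, 3])
```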
Since the computational complexity of decoding NB-LDPC codes is very large, decoder implementations usually adopt partially-parallel architectures. Therefore, the CN-to-VN messages are usually stored in the decoder memory for future VN processing. As memory occupies a significant amount of silicon area in hardware implementations, optimizing the memory usage becomes an important research problem [14], [15], [17]. For Step 3.2 of the SMSA, the three-tuple (min1_i(b), min2_i(b), idx_i(b)) can be used to recover the messages Q^(1)_{i,j}(b) for all j in N_i. Assume that the bit width of each entry of the soft message is w in the CN processing. Then, for each b in GF(q), the SMSA needs to store 2w + ceil(log2 dc) bits for the three-tuple. It also needs to store the hard messages a_{i,j} of Step 3.1, which translate to p*dc bits of storage. To store the intermediate messages for the CN processing of each row, in total the SMSA requires (2w + ceil(log2 dc))(q - 1) + p*dc bits. In comparison, for the EMSA, there is no correlation among the CN-to-VN soft messages of different j in N_i in the i-th CN. Therefore, the EMSA requires storing the soft messages for all j in N_i, which translates to w*dc*q bits. We see that the SMSA requires much less memory storage than the EMSA.
Take as an example the (620,310) code over GF(2^5) used in Section IV. With w = 5 and dc = 6, the SMSA-1 requires 2790 summations, 3255 comparisons, and 433 memory bits for each CN per iteration, and the SMSA-2 requires 4092 summations, 4557 comparisons, and 433 memory bits. The EMSA requires 12288 summations, 11904 comparisons, and 960 memory bits. As a result, compared to the EMSA, the SMSA-1 saves 77% on summations and 73% on comparisons, and the SMSA-2 saves 67% and 62%, respectively. Both SMSA versions save 55% on memory bits. More hardware implementation results for the SMSA-2 are presented in [19], which shows exceptional savings in silicon area compared with existing NB-LDPC decoders.
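The counts quoted above follow directly from the closed-form expressions summarized in Table II; a small script reproduces them for this code (our own sanity check, assuming q = 32, dc = 6, w = 5, p = 5):

```python
from math import ceil, log2

q, dc, w, p = 32, 6, 5, 5  # GF(2^5) code, CN degree 6, 5-bit messages

smsa1_sum = (q // 2 - 1) * (q - 1) * dc            # summations, SMSA-1
smsa1_cmp = ((q // 2 + 2) * dc - 3) * (q - 1)      # comparisons, SMSA-1
smsa2_sum = (3 * q // 4 - 2) * (q - 1) * dc        # summations, SMSA-2
smsa2_cmp = ((3 * q // 4 + 1) * dc - 3) * (q - 1)  # comparisons, SMSA-2
smsa_mem = (2 * w + ceil(log2(dc))) * (q - 1) + p * dc  # memory bits, SMSA
emsa_sum = 3 * (dc - 2) * q * q                    # summations, EMSA
emsa_cmp = 3 * (dc - 2) * q * (q - 1)              # comparisons, EMSA
emsa_mem = w * dc * q                              # memory bits, EMSA

print(smsa1_sum, smsa1_cmp, smsa2_sum, smsa2_cmp, smsa_mem)  # 2790 3255 4092 4557 433
print(emsa_sum, emsa_cmp, emsa_mem)                          # 12288 11904 960
```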
TABLE II
THE REQUIRED OPERATIONS PER ITERATION AND MEMORY USAGE TO PERFORM THE CN PROCESSING OF A CN WITH DEGREE dc FOR A q-ARY CODE. THE BIT WIDTH PER SUB-MESSAGE IS w.

  Type                        | SMSA-1                          | SMSA-2                          | EMSA
  Finite field additions      | 2dc - 1                         | 2dc - 1                         | 0
  Summations                  | (q/2 - 1)(q - 1)dc              | (3q/4 - 2)(q - 1)dc             | 3(dc - 2)q^2
  Comparisons and selections  | ((q/2 + 2)dc - 3)(q - 1)        | ((3q/4 + 1)dc - 3)(q - 1)       | 3(dc - 2)q(q - 1)
  Memory usage (bits)         | (2w + ceil(log2 dc))(q-1) + pdc | (2w + ceil(log2 dc))(q-1) + pdc | wdcq

VI. CONCLUSIONS
In this paper, we have presented a hardware-efficient decoding algorithm, called the SMSA, to decode NB-LDPC codes. This algorithm is devised by significantly reducing the search space of the combinatorial optimization in the CN processing. Two practical realizations, the one-step and two-step SMSAs, are proposed for effective complexity-performance tradeoffs. Simulation results show that with field size up to 256, the two-step SMSA has negligible error performance loss compared to the EMSA over the AWGN and Rayleigh-fading channels. The one-step SMSA has 0.1 to 0.2 dB of loss depending on the field size. Also, the fixed-point study and the convergence speed analysis show that the SMSA is suitable for hardware implementation. Another important feature of the SMSA is its simplicity. Based on our analysis, the SMSA has much lower computational complexity and memory usage than other decoding algorithms for NB-LDPC codes. We believe that our work on this hardware-efficient algorithm will encourage researchers to explore the use of NB-LDPC codes in emerging applications.
REFERENCES
[1] R. Gallager, "Low-density parity-check codes," IEEE Trans. Inf. Theory, vol. 8, no. 1, pp. 21-28, 1962.
[2] D. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399-431, Mar. 1999.
[3] M. Fossorier, M. Mihaljevic, and H. Imai, "Reduced complexity iterative decoding of low-density parity check codes based on belief propagation," IEEE Trans. Commun., vol. 47, no. 5, pp. 673-680, May 1999.
[4] M. Davey and D. MacKay, "Low-density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, 2002.
[5] A. Bennatan and D. Burshtein, "Design and analysis of nonbinary LDPC codes for arbitrary discrete-memoryless channels," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 549-583, 2006.
[6] S. Song, L. Zeng, S. Lin, and K. Abdel-Ghaffar, "Algebraic constructions of nonbinary quasi-cyclic LDPC codes," in Proc. 2006 IEEE International Symp. Inf. Theory, pp. 83-87.
[7] L. Zeng, L. Lan, Y. Tai, B. Zhou, S. Lin, and K. Abdel-Ghaffar, "Construction of nonbinary cyclic, quasi-cyclic and regular LDPC codes: a finite geometry approach," IEEE Trans. Commun., vol. 56, no. 3, pp. 378-387, 2008.
[8] B. Zhou, J. Kang, Y. Tai, S. Lin, and Z. Ding, "High performance non-binary quasi-cyclic LDPC codes on Euclidean geometries," IEEE Trans. Commun., vol. 57, no. 5, pp. 1298-1311, 2009.
[9] B. Zhou, J. Kang, S. Song, S. Lin, K. Abdel-Ghaffar, and M. Xu, "Construction of non-binary quasi-cyclic LDPC codes by arrays and array dispersions," IEEE Trans. Commun., vol. 57, no. 6, pp. 1652-1662, 2009.
[10] R. Koetter and A. Vardy, "Algebraic soft-decision decoding of Reed-Solomon codes," IEEE Trans. Inf. Theory, vol. 49, no. 11, pp. 2809-2825, 2003.
[11] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Trans. Commun., vol. 55, no. 4, p. 633, 2007.
[12] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, "Architecture of a low-complexity non-binary LDPC decoder for high order fields," in Proc. 2007 IEEE International Symp. Commun. Inf. Technol., pp. 1201-1206.
[13] V. Savin, "Min-max decoding for non binary LDPC codes," in Proc. 2008 IEEE International Symp. Inf. Theory, pp. 960-964.
[14] J. Lin, J. Sha, Z. Wang, and L. Li, "Efficient decoder design for nonbinary quasicyclic LDPC codes," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 57, no. 5, pp. 1071-1082, 2010.
[15] X. Zhang and F. Cai, "Efficient partial-parallel decoder architecture for quasi-cyclic non-binary LDPC codes," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 58, no. 2, pp. 402-414, Feb. 2011.
[16] ——, "Reduced-complexity decoder architecture for non-binary LDPC codes," IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 19, pp. 1229-1238, July 2011.
[17] X. Chen, S. Lin, and V. Akella, "Efficient configurable decoder architecture for non-binary quasi-cyclic LDPC codes," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 59, no. 1, pp. 188-197, Jan. 2012.
[18] C. Chen, Q. Huang, C. Chao, and S. Lin, "Two low-complexity reliability-based message-passing algorithms for decoding non-binary LDPC codes," IEEE Trans. Commun., vol. 58, no. 11, pp. 3140-3147, 2010.
[19] X. Chen and C.-L. Wang, "High-throughput efficient non-binary LDPC decoder based on the simplified min-sum algorithm," IEEE Trans. Circuits Syst. I: Regular Papers, in publication.
Chung-Li Wang received his B.S. degree in elec-
trical engineering from National Tsing Hua Uni-
versity, Hsinchu, Taiwan, R.O.C., in 2004, and the
M.S. degree in communication engineering from
National Taiwan University, Taipei, Taiwan, R.O.C.,
in 2006. In 2010, he received the Ph.D. degree from
University of California, Davis, U.S.A. in electrical
and computer engineering. He is currently with
LSI Corporation as a staff engineer. His research
interests include coding theory, magnetic recording
channel, and solid state drives.
Xiaoheng Chen received his B.S. and M.S. degree
in electrical engineering from the Zhejiang Univer-
sity, Hangzhou, China, in 2005 and 2007, respec-
tively, and received his Ph.D. degree in electrical and
computer engineering from the University of Cali-
fornia, Davis in 2011. He is currently with SanDisk
in Milpitas, CA as a senior research engineer. His
research interests include computer architecture and
parallel computing, recongurable computing, error
control coding, NAND ash memory, and solid state
drives.
Zongwang Li received his Ph.D. degree in electrical engineering from Shanghai Jiaotong University,
Shanghai, China, in 2002. He is currently with LSI
in Milpitas, CA as a distinguished engineer. His
research interests include error control coding, sig-
nal processing, wireless communication, magnetic
recording channel, and solid state drives.
Shaohua Yang received his B.Eng. degree in Elec-
tronics Engineering from Tsinghua University, Bei-
jing, China, in 2000, and received his M.S. and
Ph.D. degree in Engineering Science from Harvard
University, Cambridge, USA in 2002 and 2004,
respectively. He is currently with LSI Corporation in
Milpitas, CA as a Senior Manager and Distinguished
Engineer for the read channel backend architecture
group. His research interests include signal pro-
cessing, data detection, error control coding, and
modulation coding for data storage products.