High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors

357 Accesses
2 Citations
Explore all metrics

Abstract

Low-Density Parity-Check (LDPC) codes are a well known Error Correction Code family used for instance, in wireless and satellite communication links. Error correction performance of LDPC codes was further enhanced by extending it to higher order Galois fields, giving rise hence to the so-called non-binary LDPC codes (NB-LDPC). Error correction performance improvement (CCSDS 2014) is the main reason behind the adoption by the Consultative Committee for Space Data Systems (CCSDS) of the NB-LDPC codes in the experimental specification for the future next generation uplinks (CCSDS 2014, 2015). The high error correction efficiency for short frames make NB-LDPC codes a good candidate for IoT applications. However, the performance gain comes at the expense of a high decoding computational complexity (CCSDS 2014; Conde-Canencia et al. 2009). In this paper, an x86 multicore NB-LDPC decoder implementation is provided. This decoder that implements the FFT-SPA algorithm provides a throughput improvement of about 1.3 × to 2.7 ×, a latency reduction of more than 95% and a power consumption halved in comparison with the most efficient works on GPU (Graphics Processing Unit) device. Indeed, an efficient memory mapping and computation optimizations on the x86 architecture enable to achieve a higher decoding throughput than the GPU-based in similar experimental setup. Consequently, the throughput efficiency, the low processing latency associated with a low power consumption makes this proposed multicore implementation practical and attractive for real time implementations of NB-LDPC decoders in future SDR or Cloud-RAN systems for CCSDS standard and IoT applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Design and FPGA Implementation of a Quasi-Cyclic LDPC Decoder

Non-binary LDPC Decoders Design for Maximizing Throughput of an FPGA Implementation

Article 12 January 2016

Multicore and Manycore Implementations of ADMM-based Decoders for LDPC Decoding

Article 23 September 2017

Notes

Note that optimized SPA and Min-Sum decoders were also implemented and evaluated even if they are not detailed in this paper. A subsection in the experimentation section provides performance comparisons.
Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub
It provides an optimistic estimation of the processing latency discarding memory access penalties from latency estimation. Nevertheless, useful information about instruction level parallelism and processing bottlenecks is given.
The $\mathbb {GF}(16)$ CCSDS codes are not currently available.
Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub such as previous works of the authors.
For multi-threaded experimentations, the C++ 11 thread feature is applied to allocate Q distinct threads that execute Q data and control independent decoders in parallel. System throughput is the sum of the Q decoder throughputs whereas decoding latency is the average of the latency average of each decoder.
Throughput reported in [10] is about 20 kbps whereas the proposed decoder throughput is about 3 Mbps for the same NB-LDPC code. However, no detail on experimentation setup is provided in [10] concerning the x86 OpenCL decoder implementation making a deeper comparison impossible.
It is due to low memory footprint of the decoders that is reached thanks to the intra-frame parallelization strategy.
$\mathcal {P}_2$ has an Haswell architecture contrary to other ones that have a Skylake architecture. Skylake architecture has improved features such as the floating point multiplication and division that consumes 4 and 11 clock cycles, respectively, 5 and 17 clock cycles are required for Haswell architecture.
Min-Sum and Min-Max decoder provides the same throughput and latency results because addition and maximum instructions requires the same number of clock cycles to execute.
Authors were contacted but no access to H matrices was granted, except the authors of [11, 17] who published their source codes under open-source licence.
Throughput under Normalized Decoding Cost [41].

References

Consultative Committee for Space Data Systems (CCSDS). (2014). CCSDS 231.1-O-1 - Green Book - Next Generation Uplink - Informational RePORT CCSDS 230.2-G-1. Washington, DC, USA.
Consultative Committee for Space Data Systems (CCSDS). (2015). CCSDS 231.1-O-1 - Orange Book - Short Block Length LDPC codes for TC Synchronization and Channel Coding Washington, DC, USA.
Conde-Canencia, L., Al-Ghouwayel, A., Boutillon, E. (2009). Complexity comparison of Non-Binary LDPC decoders. In Proceedings of the ICT mobile summit conference, pp. 1–8, Santander, Spain.
Gallager, R.G. (1962). Low density parity check codes. IRE Transactions on Information Theory, 8(1), 21–28.
Article MathSciNet Google Scholar
DVB Document A83-2. (2014). Digital video broadcasting (DVB) - part II: S2-extensions (DVB-s2x) march.
Consultative Committee for Space Data Systems (CCSDS). (2011). CCSDS 131.0-B-2 - TM synchronization and channel coding. Washington, DC, USA, blue book edition.
Davey, M.C., & MacKay, D.J.C. (1998). Low density parity check codes over GF(q). IEEE Communications Letters, 2(6), 165–167.
Article Google Scholar
Barnault, L., & Declercq, D. (2003). Fast decoding algorithm for LDPC over GF(2q). In Proceedings of the IEEE information theory workshop (pp. 70–73).
Dolecek, L., Divsalar, D., Sun, Y., Amiri, B. (2014). Non-binary protograph-based LDPC codes: enumerators, analysis, and designs. IEEE Transactions on Information Theory, 60(7), 3913–3941.
Article MathSciNet Google Scholar
Wang, G, Shen, H., Yin, B., Wu, M., Sun, Y., Cavallaro, J.R. (2012). Parallel nonbinary LDPC decoding on GPU. In Proceedings of the conference record of 46th Asilomar conference on signals, systems and computers (ASILOMAR) (pp. 1277–1281).
Andrade, J., Falcao, G., Silva, V., Kasai, K. (2013). FFT-SPA non-binary LDPC decoding on GPU. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP). Vancouver, BC (pp. 5099–5103).
Beermann, M., Monzo, E., Schmalen, L., Vary, P. (2013). High speed decoding of non-binary irregular LDPC codes using GPUs. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 36–41).
Andrade, J., Falcao, G., Silva, V. (2014). Optimized fast walsh-hadamard transform on GPUs, for non-binary LDPC decoding. Elsevier Journal of Parallel Computing, 40(9), 449–453.
Article MathSciNet Google Scholar
Thi, H.P., Ajaz, S., Lee, H. (2014). Efficient Min-Max nonbinary LDPC decoding on GPU. In Proceedings of the international SoC design conference (ISOCC) (pp. 266–267).
Beermann, M., Schmalen, L., Vary, P. (2015). GPU accelerated belief propagation decoding of non-binary LDPC codes with parallel and sequential scheduling. Springer, Journal of Signal Processing Systems, 78(1), 21–34.
Article Google Scholar
Pham, H.T., Ajaz, S., Lee, H. (2015). Parallel block-layered nonbinary QC-LDPC decoding on GPU. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).
Liu, Z., Liu, R., Hou, Y., Zhao, L. (2018). High-throughput multi-codeword decoder for non-binary LDPC codes on GPU. IEEE Communications Letters, 22, 486–489.
Article Google Scholar
Boutillon, E., Conde-Canencia, L., Ghouwayel, A.A. (2013). Design of a GF(64)-LDPC decoder based on the EMS algorithm. IEEE Transactions on Circuits and Systems (TCAS-I), 60(10), 2644–2656.
Article MathSciNet Google Scholar
Abassi, O, Conde-Canencia, L, Ghouwayel, A.A., Boutillon, E. (2017). A novel architecture for elementary check node processing in non-binary LDPC decoders. IEEE Transactions on Circuits and Systems II, (TCAS-II), 64 (2), 136–140.
Article Google Scholar
Chang, C.-C., Chang, Y.-L., Huang, M.-Y., Huang, B. (2011). Accelerating regular LDPC code decoders on GPUs. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 4(3), 653–659.
Article Google Scholar
Falcao, G., Silva, V., Sousa, L., Andrade, J. (2012). Portable LDPC decoding on multicores using OpenCL. IEEE Signal Processing Magazine, 29(4), 81–88.
Article Google Scholar
Wang, G., Wu, M., Yin, B., Cavallaro, J.R. (2013). High throughput low latency LDPC decoding on GPU for SDR systems. In Proceedings of the IEEE GlobalSIP conference (pp. 1258–1261).
Li, R., Dou, Y., Zou, D., Wang, S., Zhang, Y. (2013). Efficient graphics processing unit based layered decoders for quasicyclic low-density parity-check codes. Concurrency and Computation: Practice and Experience, 27 (1), 29–46.
Article Google Scholar
Lin, Y., & Niu, W. (2014). High throughput LDPC decoder on GPU. IEEE Communications Letters, 18(2), 344–347.
Article Google Scholar
Le Gal, B., & Jego, C. (2016). High-throughput multi-core LDPC decoders based on x86 processor. IEEE Transactions on Parallel and Distributed Systems (TPDS), 27(5), 1373–1386.
Article Google Scholar
Andrade, J., Falcao, G., Silva, V., Sousa, L. (2016). A survey on programmable LDPC decoders. IEEE Access, 4, 6704–6718.
Article Google Scholar
Le Gal, B., Leroux, C., Jego, C. (2014). Software polar decoder on an embedded processor. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).
Le Gal, B., Leroux, C., Jego, C. (2015). Multi-Gb/s, software decoding of polar codes. IEEE Transactions on Signal Processing, 63(2), 349–359.
Article MathSciNet Google Scholar
Giard, P., Sarkis, G., Leroux, C., Thibeault, C., Gross, W.J. (2018). Low-latency software polar decoders. Journal of Signal Processing Systems, Springer, 90, 761–775.
Article Google Scholar
Grayver, E. (2013). Implementing software defined radio. New York: Springer.
Book Google Scholar
Cassagne, A., Tonnellier, T., Leroux, C., Le Gal, B., Aumage, O., Barthou, D. (2016). Beyond Gbps turbo decoder on multi-core CPUs. In Proceedings of the 9th International Symposium on Turbo Codes & Iterative Information Processing (ISTC’16), Brest, France.
Carrasco, R.A., & Johnston, M. (2008). Non-Binary error control coding for wireless communication and data storage. Chichester: Wiley.
Book Google Scholar
Hocevar, D.E. (2004). A reduced complexity decoder architecture via layered decoding of LDPC codes. In Proceedings of the IEEE workshop on signal processing systems (SIPS) workshop (pp. 107–112).
Declercq, D., & Fossorier, M. (2007). Decoding algorithms for nonbinary LDPC codes over GF. IEEE Transactions on Communications, 55(4), 633–643.
Article Google Scholar
Voicila, A., Declercq, D., Verdier, F., Fossorier, M., Urard, P. (2007). Low complexity, low memory EMS algorithm for non-binary LDPC codes. In IEEE Internationnal Conference on Communications (ICC’07), Glasgow, England.
Le Gal, B., & Jego, C. (2015). High-throughput LDPC decoder on low-power embedded processors. IEEE Communications Letters, 19(11), 1861–1864.
Article Google Scholar
Gronroos, S., Nybom, K., Bjorkqvist, J. (2012). Efficient GPU and CPU-based LDPC decoders for long codewords. Journal of Analog Integrated Circuits and Signal Processing, 73, 583—595.
Article Google Scholar
Wymeersch, H., Steendam, H., Moeneclaey, M. (2004). Log-domain decoding of LDPC codes over GF(q). In IEEE international conference on communications (ICC’04), pp. 772–776, Paris, France.
Poulliat, C., Fossorier, M., Declercq, D. (2008). Design of regular (2, d _c)-LDPC codes over GF(q) using their binary images. IEEE Transactions on Communications, 56(10), 1626–1635.
Article Google Scholar
Helmling, M., & Scholl, S. Database of channel codes and ml simulation results. University of Kaiserslautern, https://www.uni-kl.de/channel-codes.
Ying, Y., You, K., Zhou, L., Quan, H., Zeng, X. (2012). A pure software LDPC decoder on a multi-core processor platform with reduced inter-processor communication cost. In Proceedings of the ISCAS Symposium (pp. 2609–2612).
Le Gal, B., Jego, C., Crenne, J. (2014). A high throughput efficient approach for decoding LDPC codes onto GPU devices. IEEE Embedded Systems Letters, 6(2), 29–32.
Article Google Scholar
Peng, H., Liu, R., Hou, Y., Zhao, L. (2016). A Gb/s parallel block-based Viterbi decoder for convolutional codes on GPU. In Proceedings of the 8th international conference on wireless communications & signal processing (WCSP) (p. 1-6).

Download references

Acknowledgements

The authors would also like to thank INTEL Inc., for providing access to vLab Academic Research Environment.

Author information

Authors and Affiliations

IMS Lab., Univ. Bordeaux, Bordeaux INP, CNRS UMR 5218, France
Bertrand Le Gal & Christophe Jego

Authors

Bertrand Le Gal
View author publications
You can also search for this author in PubMed Google Scholar
Christophe Jego
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bertrand Le Gal.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Le Gal, B., Jego, C. High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors. J Sign Process Syst 92, 37–53 (2020). https://doi.org/10.1007/s11265-019-01447-8

Download citation

Received: 03 May 2018
Revised: 06 February 2019
Accepted: 05 March 2019
Published: 27 March 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11265-019-01447-8

High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Design and FPGA Implementation of a Quasi-Cyclic LDPC Decoder

Non-binary LDPC Decoders Design for Maximizing Throughput of an FPGA Implementation

Multicore and Manycore Implementations of ADMM-based Decoders for LDPC Decoding

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Design and FPGA Implementation of a Quasi-Cyclic LDPC Decoder

Non-binary LDPC Decoders Design for Maximizing Throughput of an FPGA Implementation

Multicore and Manycore Implementations of ADMM-based Decoders for LDPC Decoding

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation