Nothing Special   »   [go: up one dir, main page]

Skip to main content

Advertisement

Log in

High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors

  • Published:
Journal of Signal Processing Systems Aims and scope Submit manuscript

Abstract

Low-Density Parity-Check (LDPC) codes are a well known Error Correction Code family used for instance, in wireless and satellite communication links. Error correction performance of LDPC codes was further enhanced by extending it to higher order Galois fields, giving rise hence to the so-called non-binary LDPC codes (NB-LDPC). Error correction performance improvement (CCSDS 2014) is the main reason behind the adoption by the Consultative Committee for Space Data Systems (CCSDS) of the NB-LDPC codes in the experimental specification for the future next generation uplinks (CCSDS 2014, 2015). The high error correction efficiency for short frames make NB-LDPC codes a good candidate for IoT applications. However, the performance gain comes at the expense of a high decoding computational complexity (CCSDS 2014; Conde-Canencia et al. 2009). In this paper, an x86 multicore NB-LDPC decoder implementation is provided. This decoder that implements the FFT-SPA algorithm provides a throughput improvement of about 1.3 × to 2.7 ×, a latency reduction of more than 95% and a power consumption halved in comparison with the most efficient works on GPU (Graphics Processing Unit) device. Indeed, an efficient memory mapping and computation optimizations on the x86 architecture enable to achieve a higher decoding throughput than the GPU-based in similar experimental setup. Consequently, the throughput efficiency, the low processing latency associated with a low power consumption makes this proposed multicore implementation practical and attractive for real time implementations of NB-LDPC decoders in future SDR or Cloud-RAN systems for CCSDS standard and IoT applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5

Similar content being viewed by others

Notes

  1. Note that optimized SPA and Min-Sum decoders were also implemented and evaluated even if they are not detailed in this paper. A subsection in the experimentation section provides performance comparisons.

  2. Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub

  3. It provides an optimistic estimation of the processing latency discarding memory access penalties from latency estimation. Nevertheless, useful information about instruction level parallelism and processing bottlenecks is given.

  4. The \(\mathbb {GF}(16)\) CCSDS codes are not currently available.

  5. Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub such as previous works of the authors.

  6. For multi-threaded experimentations, the C++ 11 thread feature is applied to allocate Q distinct threads that execute Q data and control independent decoders in parallel. System throughput is the sum of the Q decoder throughputs whereas decoding latency is the average of the latency average of each decoder.

  7. Throughput reported in [10] is about 20 kbps whereas the proposed decoder throughput is about 3 Mbps for the same NB-LDPC code. However, no detail on experimentation setup is provided in [10] concerning the x86 OpenCL decoder implementation making a deeper comparison impossible.

  8. It is due to low memory footprint of the decoders that is reached thanks to the intra-frame parallelization strategy.

  9. \(\mathcal {P}_2\) has an Haswell architecture contrary to other ones that have a Skylake architecture. Skylake architecture has improved features such as the floating point multiplication and division that consumes 4 and 11 clock cycles, respectively, 5 and 17 clock cycles are required for Haswell architecture.

  10. Min-Sum and Min-Max decoder provides the same throughput and latency results because addition and maximum instructions requires the same number of clock cycles to execute.

  11. Authors were contacted but no access to H matrices was granted, except the authors of [11, 17] who published their source codes under open-source licence.

  12. Throughput under Normalized Decoding Cost [41].

References

  1. Consultative Committee for Space Data Systems (CCSDS). (2014). CCSDS 231.1-O-1 - Green Book - Next Generation Uplink - Informational RePORT CCSDS 230.2-G-1. Washington, DC, USA.

  2. Consultative Committee for Space Data Systems (CCSDS). (2015). CCSDS 231.1-O-1 - Orange Book - Short Block Length LDPC codes for TC Synchronization and Channel Coding Washington, DC, USA.

  3. Conde-Canencia, L., Al-Ghouwayel, A., Boutillon, E. (2009). Complexity comparison of Non-Binary LDPC decoders. In Proceedings of the ICT mobile summit conference, pp. 1–8, Santander, Spain.

  4. Gallager, R.G. (1962). Low density parity check codes. IRE Transactions on Information Theory, 8(1), 21–28.

    Article  MathSciNet  Google Scholar 

  5. DVB Document A83-2. (2014). Digital video broadcasting (DVB) - part II: S2-extensions (DVB-s2x) march.

  6. Consultative Committee for Space Data Systems (CCSDS). (2011). CCSDS 131.0-B-2 - TM synchronization and channel coding. Washington, DC, USA, blue book edition.

  7. Davey, M.C., & MacKay, D.J.C. (1998). Low density parity check codes over GF(q). IEEE Communications Letters, 2(6), 165–167.

    Article  Google Scholar 

  8. Barnault, L., & Declercq, D. (2003). Fast decoding algorithm for LDPC over GF(2q). In Proceedings of the IEEE information theory workshop (pp. 70–73).

  9. Dolecek, L., Divsalar, D., Sun, Y., Amiri, B. (2014). Non-binary protograph-based LDPC codes: enumerators, analysis, and designs. IEEE Transactions on Information Theory, 60(7), 3913–3941.

    Article  MathSciNet  Google Scholar 

  10. Wang, G, Shen, H., Yin, B., Wu, M., Sun, Y., Cavallaro, J.R. (2012). Parallel nonbinary LDPC decoding on GPU. In Proceedings of the conference record of 46th Asilomar conference on signals, systems and computers (ASILOMAR) (pp. 1277–1281).

  11. Andrade, J., Falcao, G., Silva, V., Kasai, K. (2013). FFT-SPA non-binary LDPC decoding on GPU. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP). Vancouver, BC (pp. 5099–5103).

  12. Beermann, M., Monzo, E., Schmalen, L., Vary, P. (2013). High speed decoding of non-binary irregular LDPC codes using GPUs. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 36–41).

  13. Andrade, J., Falcao, G., Silva, V. (2014). Optimized fast walsh-hadamard transform on GPUs, for non-binary LDPC decoding. Elsevier Journal of Parallel Computing, 40(9), 449–453.

    Article  MathSciNet  Google Scholar 

  14. Thi, H.P., Ajaz, S., Lee, H. (2014). Efficient Min-Max nonbinary LDPC decoding on GPU. In Proceedings of the international SoC design conference (ISOCC) (pp. 266–267).

  15. Beermann, M., Schmalen, L., Vary, P. (2015). GPU accelerated belief propagation decoding of non-binary LDPC codes with parallel and sequential scheduling. Springer, Journal of Signal Processing Systems, 78(1), 21–34.

    Article  Google Scholar 

  16. Pham, H.T., Ajaz, S., Lee, H. (2015). Parallel block-layered nonbinary QC-LDPC decoding on GPU. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).

  17. Liu, Z., Liu, R., Hou, Y., Zhao, L. (2018). High-throughput multi-codeword decoder for non-binary LDPC codes on GPU. IEEE Communications Letters, 22, 486–489.

    Article  Google Scholar 

  18. Boutillon, E., Conde-Canencia, L., Ghouwayel, A.A. (2013). Design of a GF(64)-LDPC decoder based on the EMS algorithm. IEEE Transactions on Circuits and Systems (TCAS-I), 60(10), 2644–2656.

    Article  MathSciNet  Google Scholar 

  19. Abassi, O, Conde-Canencia, L, Ghouwayel, A.A., Boutillon, E. (2017). A novel architecture for elementary check node processing in non-binary LDPC decoders. IEEE Transactions on Circuits and Systems II, (TCAS-II), 64 (2), 136–140.

    Article  Google Scholar 

  20. Chang, C.-C., Chang, Y.-L., Huang, M.-Y., Huang, B. (2011). Accelerating regular LDPC code decoders on GPUs. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 4(3), 653–659.

    Article  Google Scholar 

  21. Falcao, G., Silva, V., Sousa, L., Andrade, J. (2012). Portable LDPC decoding on multicores using OpenCL. IEEE Signal Processing Magazine, 29(4), 81–88.

    Article  Google Scholar 

  22. Wang, G., Wu, M., Yin, B., Cavallaro, J.R. (2013). High throughput low latency LDPC decoding on GPU for SDR systems. In Proceedings of the IEEE GlobalSIP conference (pp. 1258–1261).

  23. Li, R., Dou, Y., Zou, D., Wang, S., Zhang, Y. (2013). Efficient graphics processing unit based layered decoders for quasicyclic low-density parity-check codes. Concurrency and Computation: Practice and Experience, 27 (1), 29–46.

    Article  Google Scholar 

  24. Lin, Y., & Niu, W. (2014). High throughput LDPC decoder on GPU. IEEE Communications Letters, 18(2), 344–347.

    Article  Google Scholar 

  25. Le Gal, B., & Jego, C. (2016). High-throughput multi-core LDPC decoders based on x86 processor. IEEE Transactions on Parallel and Distributed Systems (TPDS), 27(5), 1373–1386.

    Article  Google Scholar 

  26. Andrade, J., Falcao, G., Silva, V., Sousa, L. (2016). A survey on programmable LDPC decoders. IEEE Access, 4, 6704–6718.

    Article  Google Scholar 

  27. Le Gal, B., Leroux, C., Jego, C. (2014). Software polar decoder on an embedded processor. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).

  28. Le Gal, B., Leroux, C., Jego, C. (2015). Multi-Gb/s, software decoding of polar codes. IEEE Transactions on Signal Processing, 63(2), 349–359.

    Article  MathSciNet  Google Scholar 

  29. Giard, P., Sarkis, G., Leroux, C., Thibeault, C., Gross, W.J. (2018). Low-latency software polar decoders. Journal of Signal Processing Systems, Springer, 90, 761–775.

    Article  Google Scholar 

  30. Grayver, E. (2013). Implementing software defined radio. New York: Springer.

    Book  Google Scholar 

  31. Cassagne, A., Tonnellier, T., Leroux, C., Le Gal, B., Aumage, O., Barthou, D. (2016). Beyond Gbps turbo decoder on multi-core CPUs. In Proceedings of the 9th International Symposium on Turbo Codes & Iterative Information Processing (ISTC’16), Brest, France.

  32. Carrasco, R.A., & Johnston, M. (2008). Non-Binary error control coding for wireless communication and data storage. Chichester: Wiley.

    Book  Google Scholar 

  33. Hocevar, D.E. (2004). A reduced complexity decoder architecture via layered decoding of LDPC codes. In Proceedings of the IEEE workshop on signal processing systems (SIPS) workshop (pp. 107–112).

  34. Declercq, D., & Fossorier, M. (2007). Decoding algorithms for nonbinary LDPC codes over GF. IEEE Transactions on Communications, 55(4), 633–643.

    Article  Google Scholar 

  35. Voicila, A., Declercq, D., Verdier, F., Fossorier, M., Urard, P. (2007). Low complexity, low memory EMS algorithm for non-binary LDPC codes. In IEEE Internationnal Conference on Communications (ICC’07), Glasgow, England.

  36. Le Gal, B., & Jego, C. (2015). High-throughput LDPC decoder on low-power embedded processors. IEEE Communications Letters, 19(11), 1861–1864.

    Article  Google Scholar 

  37. Gronroos, S., Nybom, K., Bjorkqvist, J. (2012). Efficient GPU and CPU-based LDPC decoders for long codewords. Journal of Analog Integrated Circuits and Signal Processing, 73, 583—595.

    Article  Google Scholar 

  38. Wymeersch, H., Steendam, H., Moeneclaey, M. (2004). Log-domain decoding of LDPC codes over GF(q). In IEEE international conference on communications (ICC’04), pp. 772–776, Paris, France.

  39. Poulliat, C., Fossorier, M., Declercq, D. (2008). Design of regular (2, d c)-LDPC codes over GF(q) using their binary images. IEEE Transactions on Communications, 56(10), 1626–1635.

    Article  Google Scholar 

  40. Helmling, M., & Scholl, S. Database of channel codes and ml simulation results. University of Kaiserslautern, https://www.uni-kl.de/channel-codes.

  41. Ying, Y., You, K., Zhou, L., Quan, H., Zeng, X. (2012). A pure software LDPC decoder on a multi-core processor platform with reduced inter-processor communication cost. In Proceedings of the ISCAS Symposium (pp. 2609–2612).

  42. Le Gal, B., Jego, C., Crenne, J. (2014). A high throughput efficient approach for decoding LDPC codes onto GPU devices. IEEE Embedded Systems Letters, 6(2), 29–32.

    Article  Google Scholar 

  43. Peng, H., Liu, R., Hou, Y., Zhao, L. (2016). A Gb/s parallel block-based Viterbi decoder for convolutional codes on GPU. In Proceedings of the 8th international conference on wireless communications & signal processing (WCSP) (p. 1-6).

Download references

Acknowledgements

The authors would also like to thank INTEL Inc., for providing access to vLab Academic Research Environment.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bertrand Le Gal.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Le Gal, B., Jego, C. High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors. J Sign Process Syst 92, 37–53 (2020). https://doi.org/10.1007/s11265-019-01447-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11265-019-01447-8

Keywords

Navigation