Abstract
Low-Density Parity-Check (LDPC) codes are a well known Error Correction Code family used for instance, in wireless and satellite communication links. Error correction performance of LDPC codes was further enhanced by extending it to higher order Galois fields, giving rise hence to the so-called non-binary LDPC codes (NB-LDPC). Error correction performance improvement (CCSDS 2014) is the main reason behind the adoption by the Consultative Committee for Space Data Systems (CCSDS) of the NB-LDPC codes in the experimental specification for the future next generation uplinks (CCSDS 2014, 2015). The high error correction efficiency for short frames make NB-LDPC codes a good candidate for IoT applications. However, the performance gain comes at the expense of a high decoding computational complexity (CCSDS 2014; Conde-Canencia et al. 2009). In this paper, an x86 multicore NB-LDPC decoder implementation is provided. This decoder that implements the FFT-SPA algorithm provides a throughput improvement of about 1.3 × to 2.7 ×, a latency reduction of more than 95% and a power consumption halved in comparison with the most efficient works on GPU (Graphics Processing Unit) device. Indeed, an efficient memory mapping and computation optimizations on the x86 architecture enable to achieve a higher decoding throughput than the GPU-based in similar experimental setup. Consequently, the throughput efficiency, the low processing latency associated with a low power consumption makes this proposed multicore implementation practical and attractive for real time implementations of NB-LDPC decoders in future SDR or Cloud-RAN systems for CCSDS standard and IoT applications.
Similar content being viewed by others
Notes
Note that optimized SPA and Min-Sum decoders were also implemented and evaluated even if they are not detailed in this paper. A subsection in the experimentation section provides performance comparisons.
Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub
It provides an optimistic estimation of the processing latency discarding memory access penalties from latency estimation. Nevertheless, useful information about instruction level parallelism and processing bottlenecks is given.
The \(\mathbb {GF}(16)\) CCSDS codes are not currently available.
Source codes of the NB-LDPC decoder implementations will be provided under open-source licence on GitHub such as previous works of the authors.
For multi-threaded experimentations, the C++ 11 thread feature is applied to allocate Q distinct threads that execute Q data and control independent decoders in parallel. System throughput is the sum of the Q decoder throughputs whereas decoding latency is the average of the latency average of each decoder.
It is due to low memory footprint of the decoders that is reached thanks to the intra-frame parallelization strategy.
\(\mathcal {P}_2\) has an Haswell architecture contrary to other ones that have a Skylake architecture. Skylake architecture has improved features such as the floating point multiplication and division that consumes 4 and 11 clock cycles, respectively, 5 and 17 clock cycles are required for Haswell architecture.
Min-Sum and Min-Max decoder provides the same throughput and latency results because addition and maximum instructions requires the same number of clock cycles to execute.
Throughput under Normalized Decoding Cost [41].
References
Consultative Committee for Space Data Systems (CCSDS). (2014). CCSDS 231.1-O-1 - Green Book - Next Generation Uplink - Informational RePORT CCSDS 230.2-G-1. Washington, DC, USA.
Consultative Committee for Space Data Systems (CCSDS). (2015). CCSDS 231.1-O-1 - Orange Book - Short Block Length LDPC codes for TC Synchronization and Channel Coding Washington, DC, USA.
Conde-Canencia, L., Al-Ghouwayel, A., Boutillon, E. (2009). Complexity comparison of Non-Binary LDPC decoders. In Proceedings of the ICT mobile summit conference, pp. 1–8, Santander, Spain.
Gallager, R.G. (1962). Low density parity check codes. IRE Transactions on Information Theory, 8(1), 21–28.
DVB Document A83-2. (2014). Digital video broadcasting (DVB) - part II: S2-extensions (DVB-s2x) march.
Consultative Committee for Space Data Systems (CCSDS). (2011). CCSDS 131.0-B-2 - TM synchronization and channel coding. Washington, DC, USA, blue book edition.
Davey, M.C., & MacKay, D.J.C. (1998). Low density parity check codes over GF(q). IEEE Communications Letters, 2(6), 165–167.
Barnault, L., & Declercq, D. (2003). Fast decoding algorithm for LDPC over GF(2q). In Proceedings of the IEEE information theory workshop (pp. 70–73).
Dolecek, L., Divsalar, D., Sun, Y., Amiri, B. (2014). Non-binary protograph-based LDPC codes: enumerators, analysis, and designs. IEEE Transactions on Information Theory, 60(7), 3913–3941.
Wang, G, Shen, H., Yin, B., Wu, M., Sun, Y., Cavallaro, J.R. (2012). Parallel nonbinary LDPC decoding on GPU. In Proceedings of the conference record of 46th Asilomar conference on signals, systems and computers (ASILOMAR) (pp. 1277–1281).
Andrade, J., Falcao, G., Silva, V., Kasai, K. (2013). FFT-SPA non-binary LDPC decoding on GPU. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP). Vancouver, BC (pp. 5099–5103).
Beermann, M., Monzo, E., Schmalen, L., Vary, P. (2013). High speed decoding of non-binary irregular LDPC codes using GPUs. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 36–41).
Andrade, J., Falcao, G., Silva, V. (2014). Optimized fast walsh-hadamard transform on GPUs, for non-binary LDPC decoding. Elsevier Journal of Parallel Computing, 40(9), 449–453.
Thi, H.P., Ajaz, S., Lee, H. (2014). Efficient Min-Max nonbinary LDPC decoding on GPU. In Proceedings of the international SoC design conference (ISOCC) (pp. 266–267).
Beermann, M., Schmalen, L., Vary, P. (2015). GPU accelerated belief propagation decoding of non-binary LDPC codes with parallel and sequential scheduling. Springer, Journal of Signal Processing Systems, 78(1), 21–34.
Pham, H.T., Ajaz, S., Lee, H. (2015). Parallel block-layered nonbinary QC-LDPC decoding on GPU. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).
Liu, Z., Liu, R., Hou, Y., Zhao, L. (2018). High-throughput multi-codeword decoder for non-binary LDPC codes on GPU. IEEE Communications Letters, 22, 486–489.
Boutillon, E., Conde-Canencia, L., Ghouwayel, A.A. (2013). Design of a GF(64)-LDPC decoder based on the EMS algorithm. IEEE Transactions on Circuits and Systems (TCAS-I), 60(10), 2644–2656.
Abassi, O, Conde-Canencia, L, Ghouwayel, A.A., Boutillon, E. (2017). A novel architecture for elementary check node processing in non-binary LDPC decoders. IEEE Transactions on Circuits and Systems II, (TCAS-II), 64 (2), 136–140.
Chang, C.-C., Chang, Y.-L., Huang, M.-Y., Huang, B. (2011). Accelerating regular LDPC code decoders on GPUs. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 4(3), 653–659.
Falcao, G., Silva, V., Sousa, L., Andrade, J. (2012). Portable LDPC decoding on multicores using OpenCL. IEEE Signal Processing Magazine, 29(4), 81–88.
Wang, G., Wu, M., Yin, B., Cavallaro, J.R. (2013). High throughput low latency LDPC decoding on GPU for SDR systems. In Proceedings of the IEEE GlobalSIP conference (pp. 1258–1261).
Li, R., Dou, Y., Zou, D., Wang, S., Zhang, Y. (2013). Efficient graphics processing unit based layered decoders for quasicyclic low-density parity-check codes. Concurrency and Computation: Practice and Experience, 27 (1), 29–46.
Lin, Y., & Niu, W. (2014). High throughput LDPC decoder on GPU. IEEE Communications Letters, 18(2), 344–347.
Le Gal, B., & Jego, C. (2016). High-throughput multi-core LDPC decoders based on x86 processor. IEEE Transactions on Parallel and Distributed Systems (TPDS), 27(5), 1373–1386.
Andrade, J., Falcao, G., Silva, V., Sousa, L. (2016). A survey on programmable LDPC decoders. IEEE Access, 4, 6704–6718.
Le Gal, B., Leroux, C., Jego, C. (2014). Software polar decoder on an embedded processor. In Proceedings of the IEEE workshop on signal processing systems (SiPS) (pp. 1–6).
Le Gal, B., Leroux, C., Jego, C. (2015). Multi-Gb/s, software decoding of polar codes. IEEE Transactions on Signal Processing, 63(2), 349–359.
Giard, P., Sarkis, G., Leroux, C., Thibeault, C., Gross, W.J. (2018). Low-latency software polar decoders. Journal of Signal Processing Systems, Springer, 90, 761–775.
Grayver, E. (2013). Implementing software defined radio. New York: Springer.
Cassagne, A., Tonnellier, T., Leroux, C., Le Gal, B., Aumage, O., Barthou, D. (2016). Beyond Gbps turbo decoder on multi-core CPUs. In Proceedings of the 9th International Symposium on Turbo Codes & Iterative Information Processing (ISTC’16), Brest, France.
Carrasco, R.A., & Johnston, M. (2008). Non-Binary error control coding for wireless communication and data storage. Chichester: Wiley.
Hocevar, D.E. (2004). A reduced complexity decoder architecture via layered decoding of LDPC codes. In Proceedings of the IEEE workshop on signal processing systems (SIPS) workshop (pp. 107–112).
Declercq, D., & Fossorier, M. (2007). Decoding algorithms for nonbinary LDPC codes over GF. IEEE Transactions on Communications, 55(4), 633–643.
Voicila, A., Declercq, D., Verdier, F., Fossorier, M., Urard, P. (2007). Low complexity, low memory EMS algorithm for non-binary LDPC codes. In IEEE Internationnal Conference on Communications (ICC’07), Glasgow, England.
Le Gal, B., & Jego, C. (2015). High-throughput LDPC decoder on low-power embedded processors. IEEE Communications Letters, 19(11), 1861–1864.
Gronroos, S., Nybom, K., Bjorkqvist, J. (2012). Efficient GPU and CPU-based LDPC decoders for long codewords. Journal of Analog Integrated Circuits and Signal Processing, 73, 583—595.
Wymeersch, H., Steendam, H., Moeneclaey, M. (2004). Log-domain decoding of LDPC codes over GF(q). In IEEE international conference on communications (ICC’04), pp. 772–776, Paris, France.
Poulliat, C., Fossorier, M., Declercq, D. (2008). Design of regular (2, d c)-LDPC codes over GF(q) using their binary images. IEEE Transactions on Communications, 56(10), 1626–1635.
Helmling, M., & Scholl, S. Database of channel codes and ml simulation results. University of Kaiserslautern, https://www.uni-kl.de/channel-codes.
Ying, Y., You, K., Zhou, L., Quan, H., Zeng, X. (2012). A pure software LDPC decoder on a multi-core processor platform with reduced inter-processor communication cost. In Proceedings of the ISCAS Symposium (pp. 2609–2612).
Le Gal, B., Jego, C., Crenne, J. (2014). A high throughput efficient approach for decoding LDPC codes onto GPU devices. IEEE Embedded Systems Letters, 6(2), 29–32.
Peng, H., Liu, R., Hou, Y., Zhao, L. (2016). A Gb/s parallel block-based Viterbi decoder for convolutional codes on GPU. In Proceedings of the 8th international conference on wireless communications & signal processing (WCSP) (p. 1-6).
Acknowledgements
The authors would also like to thank INTEL Inc., for providing access to vLab Academic Research Environment.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Le Gal, B., Jego, C. High-Throughput FFT-SPA Decoder Implementation for Non-Binary LDPC Codes on x86 Multicore Processors. J Sign Process Syst 92, 37–53 (2020). https://doi.org/10.1007/s11265-019-01447-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-019-01447-8