Abstract
A many-core parallel implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, is presented for China's domestically developed many-core SW26010 CPU. In the proposed implementation, data access efficiency is improved by using data structures based on the structure-of-arrays (SoA) layout. Adaptive workload distribution strategies are adopted on different levels of the MLFMA tree to make full use of both the computing capability and the scratchpad memory. A double buffering scheme is designed to overlap communication with computation. The resulting Athread-based many-core MLFMA can solve real-life problems with over one million unknowns with substantial speedup. The capability and efficiency of the proposed method are demonstrated by computing scattering from spheres and from a practical aircraft. Numerical results show that the proposed parallel scheme achieves total speedups of 6.4 to 8.0 compared with the master core of the CPU.
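The abstract names two data-movement techniques: a structure-of-arrays (SoA) layout and double buffering that overlaps transfers with computation. The following is a minimal sketch of both in portable C, not the authors' code. It assumes that field samples are stored as separate contiguous arrays of real and imaginary parts, that a small fixed-size tile (`TILE`) stands in for the scratchpad memory, and that `dma_get` is a hypothetical stand-in for an asynchronous DMA request. Here `dma_get` is a synchronous `memcpy`, so this version shows only the buffer-rotation pattern, not real overlap. All names (`FieldSoA`, `dma_get`, `scale_field`, `N`, `TILE`) are hypothetical, and the per-sample operation (multiplying by a complex factor) is illustrative only.

```c
#include <assert.h>
#include <string.h>

#define N    1000   /* number of field samples (hypothetical) */
#define TILE 128    /* samples per scratchpad tile (hypothetical size) */

/* Structure of arrays: real and imaginary parts live in separate
   contiguous arrays, so loading a tile of either component is one
   unit-stride copy. */
typedef struct {
    double re[N];
    double im[N];
} FieldSoA;

/* Hypothetical stand-in for an asynchronous DMA get into local memory.
   This version is a synchronous memcpy, so no real overlap occurs. */
static void dma_get(double *dst, const double *src, size_t count) {
    assert(count <= TILE);  /* transfer must fit in one local buffer */
    memcpy(dst, src, count * sizeof(double));
}

/* Multiply every sample by the complex factor (fr + i*fi), tile by tile,
   alternating between two local buffers (double buffering). */
static void scale_field(const FieldSoA *in, FieldSoA *out, double fr, double fi) {
    double buf_re[2][TILE], buf_im[2][TILE];
    int cur = 0;
    size_t first = N < TILE ? N : TILE;

    /* Prefetch the first tile into buffer 0. */
    dma_get(buf_re[0], in->re, first);
    dma_get(buf_im[0], in->im, first);

    for (size_t start = 0; start < N; start += TILE) {
        size_t len = (N - start < TILE) ? N - start : TILE;
        size_t next = start + TILE;
        int nxt = 1 - cur;

        /* Request the next tile into the other buffer before computing
           on the current one; with real asynchronous DMA this transfer
           would proceed while the loop below runs. */
        if (next < N) {
            size_t nlen = (N - next < TILE) ? N - next : TILE;
            dma_get(buf_re[nxt], in->re + next, nlen);
            dma_get(buf_im[nxt], in->im + next, nlen);
        }

        /* Compute on the current buffer: (a + ib)(fr + i fi). */
        for (size_t k = 0; k < len; ++k) {
            double a = buf_re[cur][k], b = buf_im[cur][k];
            out->re[start + k] = a * fr - b * fi;
            out->im[start + k] = a * fi + b * fr;
        }
        cur = nxt;  /* swap the roles of the two buffers */
    }
}
```

For example, with `fr = 0` and `fi = 1` every sample is multiplied by i, so an input sample k + i becomes -1 + k i. In a port to real hardware, the request for tile t+1 would be issued asynchronously and waited on only just before tile t+1 is processed, which is what lets transfer time hide behind the arithmetic.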
Acknowledgements
This work was supported by the National Key R&D Program of China (Grant No. 2017YFB0202500), and the NSFC (Grant Nos. 61971034 and U1730102).
About this article
Cite this article
He, WJ., Yang, ML., Wang, W. et al. Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor. J Supercomput 77, 1502–1516 (2021). https://doi.org/10.1007/s11227-020-03308-9