Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

Wei-Jia He¹,
Ming-Lin Yang ORCID: orcid.org/0000-0002-0638-9526¹,
Wu Wang² &
…
Xin-Qing Sheng¹

298 Accesses
3 Citations
Explore all metrics

Abstract

A many-core parallel approach of the multilevel fast multipole algorithm (MLFMA) based on the Athread parallel programming model is presented on the homegrown many-core SW26010 CPU of China. In the proposed many-core implementation of MLFMA, the data access efficiency is improved by using data structures based on the structure of array. The adaptive workload distribution strategies are adopted on different MLFMA tree levels to ensure full utilization of computing capability and the scratchpad memory. A double buffering scheme is specially designed to make communication overlapped computation. The resulting Athread-based many-core implementation of the MLFMA is capable of solving real-life problems with over one million unknowns with a remarkable speedup. The capability and efficiency of the proposed method are analyzed through the examples of computing scattering by spheres and a practical aerocraft. Numerical results show that with the proposed parallel scheme, the total speedup ratios from 6.4 to 8.0 can be achieved, compared with the CPU master core.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Article 28 November 2023

swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture

Article 28 February 2023

Scalable Fast Multipole Method for Electromagnetic Simulations

References

Dongarra J, Sullivan F (2000) Guest Editors Introduction to the top 10 algorithms. Comput Sci Eng 2(1):22–23
Article Google Scholar
Song JM, Lu CC, Chew WC (1997) Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans Antennas Propag 45(10):1488–1493
Article Google Scholar
Sheng XQ, Jin JM, Song J et al (1998) Solution of combined-field integral equation using multilevel fast multipole algorithm for scattering by homogeneous bodies. IEEE Trans Antennas Propag 46(11):1718–1726
Article Google Scholar
Velamparambil S, Chew WC, Song JM (2003) 10 million unknowns: Is it that big? IEEE Antennas Propag Mag 45(2):43–58
Article Google Scholar
Pan XM, Sheng XQ (2008) A sophisticated parallel MLFMA for scattering by extremely large targets. IEEE Antennas Propag Mag 50(3):129–138
Article MathSciNet Google Scholar
Ergul O, Gurel L (2008) Hierarchical parallelization strategy for multilevel fast multipole algorithm in computational electromagnetics. Electron Lett 44(6):3–4
Article Google Scholar
Yang ML, Wu BY, Gao HW et al (2008) A ternary parallelization approach of MLFMA for solving electromagnetic scattering problems with over 10 billion unknowns. IEEE Trans Antennas Propag 67(11):6965–6978
Article Google Scholar
Hu FJ, Nie ZP, Hu J (2010) An efficient parallel multilevel fast multipole algorithm for large-scale scattering problems. Appl Comput Electromagn Soc J 25(4):381–387
Google Scholar
Zhao HP, Hu J, Nie ZP (2010) Parallelization of MLFMA with composite load partition criteria and asynchronous communication. Appl Comput Electromag Soc J 25(2):167–173
Google Scholar
Pan XM, Pi WC, Yang ML et al (2012) Solving problems with over one billion unknowns by the MLFMA. IEEE Trans Antennas Propag 60(5):2571–2574
Article MathSciNet Google Scholar
Donno DD, Esposito A, Tarricone LCL (2010) Introduction to GPU computing and CUDA programming: a case study on FDTD. IEEE Antennas Propag Mag 53(3):116–122
Article Google Scholar
Corp NVIDIA (2011) NVIDIA CUDA C Programming Guide. Santa Clara, CA, USA
Crimi G, Mantovani F, Pivanti M et al (2013) Early experience on porting and running a Lattice Boltzmann code on the Xeon-Phi co-processor. Proc Comput Sci 18:551–560
Article Google Scholar
Murano K, Shimobaba T, Sugiyama A et al (2014) Fast computation of computer-generated hologram using Xeon Phi coprocessor. Comput Phys Commun 185(10):2742–2757
Article Google Scholar
Teodoro G, Kurc T, Kong J et al (2014) Comparative performance analysis of Intel Xeon Phi, GPU, and CPU: a case study from microscopy image analysis. IEEE Trans Parallel Distrib Syst 2014:1063–1072
Google Scholar
Zheng F, Li HL, Lv H et al (2015) Cooperative computing techniques for a deeply fused and heterogeneous many-core processor architecture. J Comput Sci Technol 30(1):145–162
Article Google Scholar
Jiang L, Yang C, Ao Y et al (2017) Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor. In: 46th International Conference on Parallel Processing (ICPP), IEEE computer society
Xu K, Ding DZ, Fan ZH et al (2010) Multilevel fast multipole algorithm enhanced by GPU parallel technique for electromagnetic scattering problems. Microw Opt Technol Lett 52(3):502–507
Article Google Scholar
Guan J, Yan S, Jin JM (2013) An OpenMP-CUDA implementation of multilevel fast multipole algorithm for electromagnetic simulation on multi-GPU computing systems. IEEE Trans Antennas Propag 61(7):3607–3616
Article MathSciNet Google Scholar
Mu X, Zhou HX, Chen K et al (2014) Higher order method of moments with a parallel out-of-core LU solver on GPU/CPU platform. IEEE Trans Antennas Propag 62(11):5634–5646
Article MathSciNet Google Scholar
Tran N, Kilic O (2016) Parallel implementations of multilevel fast multipole algorithm on graphical processing unit cluster for large-scale electromagnetics objects. Appl Comput Electromag Soc J 1(4):145–148
Google Scholar
Phan T, Tran N, Kilic O (2018) Multi-level fast multipole algorithm for 3-D homogeneous dielectric objects using MPI-CUDA on GPU cluster. Appl Comput Electromag Soc J 33(3):335–338
Google Scholar
Rao S, Wilton D, Glisson A (1982) Electromagnetic scattering by surfaces of arbitrary shape. IEEE Trans Antennas Propag 30(3):409–418
Article Google Scholar
Fu H, Liao JF, Yang JZ et al (2016) The Sunway TaihuLight supercomputer: system and applications. Sci China Inf Sci 59(7):072001
Article Google Scholar
Dongarra J (2016) Sunway TaihuLight supercomputer makes its appearance. Natl Sci Rev 3(3):265–266
Article Google Scholar
Xu Z, Lin J, Matsuoka S (2017) Benchmarking SW26010 Many-Core processor. In: IEEE International parallel and distributed processing symposium workshops
OpenACC-Standard.org (2018) The OpenACC Application Programming Interface
National Supercomputing Center in Wuxi (2016) The Compiling System User Guide of Sunway TighthuLight

Download references

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2017YFB0202500), and the NSFC (Grant Nos. 61971034 and U1730102).

Author information

Authors and Affiliations

Center for Electromagnetic Simulation, Beijing Institute of Technology, Beijing, 100081, China
Wei-Jia He, Ming-Lin Yang & Xin-Qing Sheng
Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
Wu Wang

Authors

Wei-Jia He
View author publications
You can also search for this author in PubMed Google Scholar
Ming-Lin Yang
View author publications
You can also search for this author in PubMed Google Scholar
Wu Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xin-Qing Sheng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ming-Lin Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

He, WJ., Yang, ML., Wang, W. et al. Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor. J Supercomput 77, 1502–1516 (2021). https://doi.org/10.1007/s11227-020-03308-9

Download citation

Published: 19 May 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11227-020-03308-9

Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture

Scalable Fast Multipole Method for Electromagnetic Simulations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architecture

Scalable Fast Multipole Method for Electromagnetic Simulations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation