Abstract
The latest production version of the fusion particle simulation code, the Gyrokinetic Toroidal Code (GTC), has been ported to and optimized for the next-generation exascale GPU supercomputing platform. Heterogeneous programming using directives has been used to balance continuously implemented physical capabilities against rapidly evolving software and hardware systems. The original code has been refactored into a set of unified functions and calls to enable acceleration for all particle species. Extensive GPU optimization has been performed on GTC to boost the performance of the particle push and shift operations. To identify the hotspots, the code was first benchmarked on up to 8000 nodes of the Titan supercomputer, which showed an overall speedup of about 2–3 times when comparing NVIDIA M2050 GPUs to Intel Xeon X5670 CPUs. This Phase I optimization was followed by further optimizations in Phase II, in which single-node tests show an overall speedup of about 34 times on SummitDev and 7.9 times on Titan. Real physics tests on the Summit machine showed impressive scaling, reaching roughly 50% efficiency on 928 nodes. The speedup of GPU + CPU execution over CPU-only execution is over 20 times, leading to unprecedented simulation speed.
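The abstract quotes speedups as ratios of run times and a parallel efficiency of roughly 50% without stating which scaling metric is used. The sketch below shows the two standard definitions, assuming efficiency is measured against a smaller reference run; the function names and all numbers in the usage lines are hypothetical and are not GTC measurements.

```python
def speedup(t_baseline: float, t_accelerated: float) -> float:
    """Speedup of an accelerated run over a baseline run of the same problem."""
    return t_baseline / t_accelerated


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Weak scaling: work per node is fixed, so the ideal run time is unchanged."""
    return t_ref / t_n


def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Strong scaling: total work is fixed, so the ideal run time shrinks by n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical timings in seconds (not GTC measurements).
print(speedup(100.0, 5.0))                           # 20.0
print(weak_scaling_efficiency(10.0, 20.0))           # 0.5
print(strong_scaling_efficiency(80.0, 8, 20.0, 64))  # 0.5
```

Under either definition, 50% efficiency means the large run delivers half the throughput per node that ideal scaling from the reference run would predict.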
Notes
1. The timings for the Titan CPU w/PETSc case in Table 1 assume ideal scaling of OpenMP threads from 8 to 16; that is, the times presented in Table 1 for this case are those measured with 8 OpenMP threads, divided by 2. The motivation is to set a lower bound on the GPU speedup attainable on Titan.
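The note uses an ideal-scaling extrapolation so that the reported Titan GPU speedup is conservative. A minimal sketch of that arithmetic follows; the function names and example timings are hypothetical, and the lower-bound property assumes OpenMP thread scaling is never superlinear.

```python
def ideal_thread_time(t_measured: float, threads_measured: int, threads_target: int) -> float:
    """Extrapolate a run time to a new thread count assuming perfect linear scaling."""
    return t_measured * threads_measured / threads_target


def gpu_speedup_lower_bound(t_cpu_8_threads: float, t_gpu: float) -> float:
    """GPU speedup measured against an idealized 16-thread CPU time.

    If thread scaling is at best linear, the real 16-thread time is no shorter
    than the ideal estimate, so this ratio never overstates the GPU speedup.
    """
    t_cpu_16_ideal = ideal_thread_time(t_cpu_8_threads, 8, 16)
    return t_cpu_16_ideal / t_gpu


# Hypothetical timings in seconds (not values from Table 1).
print(ideal_thread_time(40.0, 8, 16))      # 20.0
print(gpu_speedup_lower_bound(40.0, 5.0))  # 4.0
```

With these made-up numbers, a 5 s GPU run is credited with a 4x speedup, although a real 16-thread CPU run would likely take longer than the idealized 20 s and so yield a larger ratio.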
Acknowledgments
The authors would like to thank Eduardo D’Azevedo for his many useful suggestions on the optimizations. This work was supported by the US Department of Energy (DOE) CAAR project, the DOE SciDAC ISEP center, the National MCF Energy R&D Program under Grant Nos. 2018YFE0304100 and 2017YFE0301300, the National Natural Science Foundation of China under Grant No. 11675257, and the External Cooperation Program of the Chinese Academy of Sciences under Grant No. 112111KYSB20160039. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, W. et al. (2019). Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code Using Directives. In: Chandrasekaran, S., Juckeland, G., Wienke, S. (eds) Accelerator Programming Using Directives. WACCPD 2018. Lecture Notes in Computer Science, vol 11381. Springer, Cham. https://doi.org/10.1007/978-3-030-12274-4_1
DOI: https://doi.org/10.1007/978-3-030-12274-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12273-7
Online ISBN: 978-3-030-12274-4
eBook Packages: Computer Science, Computer Science (R0)