Abstract
Simulations based on stencil computations (widely used in geosciences) have been dominated by the MPI+OpenMP programming model paradigm. Little effort has been devoted to experimenting with task-based parallelism in this context. We address this by introducing OpenMP task parallelism into the kernel of an industrial seismic modeling code, Minimod. We observe that even for these highly regular stencil computations, taskified kernels are competitive with traditional OpenMP-augmented loops, and in some experiments tasks even outperform loop parallelism.
This promising result sets the stage for more complex computational patterns. Simulations involve more than just the stencil calculation: a collection of kernels is often needed to accomplish the scientific objective (e.g., I/O, boundary conditions). These kernels can often be computed simultaneously; however, implementing this simultaneous computation with traditional programming models is not trivial. The presented approach will be extended to cover simultaneous execution of several kernels, where we expect to fully exploit the benefits of task-based programming.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Acun, B., et al.: Parallel programming with migratable objects: Charm++ in practice. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 647–658 (2014). https://doi.org/10.1109/SC.2014.58
Atkinson, P., McIntosh-Smith, S.: On the performance of parallel tasking runtimes for an irregular fast multipole method application. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 92–106. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_7
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exp. 23(2), 187–198 (2011). https://doi.org/10.1002/cpe.1631
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: SC 2012: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11, November 2012. https://doi.org/10.1109/SC.2012.71
Berenger, J.P.: A perfectly matched layer for the absorption of electromagnetic waves. J. Comput. Phys. 114(2), 185–200 (1994). https://doi.org/10.1006/jcph.1994.1159
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. SIGPLAN Not. 30(8), 207–216 (1995). https://doi.org/10.1145/209937.209958
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Herault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013). https://doi.org/10.1109/MCSE.2013.98
de la Cruz, R., Araya-Polo, M.: Algorithm 942: semi-stencil. ACM Trans. Math. Softw. 40(3) (2014). https://doi.org/10.1145/2591006
de la Cruz, R., Araya-Polo, M.: Towards a multi-level cache performance model for 3D stencil computation. Proc. Comput. Sci. 4, 2146 –2155 (2011). https://doi.org/10.1016/j.procs.2011.04.235. Proceedings of the International Conference on Computational Science, ICCS 2011
Delannoy, O., Petiton, S.: A peer to peer computing framework: design and performance evaluation of YML. In: Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, pp. 362–369 (2004). https://doi.org/10.1109/ISPDC.2004.7
Duran, A., et al.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011). https://doi.org/10.1142/S0129626411000151
Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 100–110. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79561-2_9
Ghosh, S., Liao, T., Calandra, H., Chapman, B.M.: Experiences with OpenMP, PGI, HMPP and OpenACC directives on ISO/TTI kernels. In: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 691–700, November 2012. https://doi.org/10.1109/SC.Companion.2012.95
Gurhem, J., Tsuji, M., Petiton, S.G., Sato, M.: Distributed and parallel programming paradigms on the K computer and a cluster. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2019, pp. 9–17. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3293320.3293330
Kaiser, H., Heller, T., Adelstein-Lelbach, B., Serio, A., Fey, D.: HPX: a task based programming model in a global address space. In: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, PGAS 2014. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2676870.2676883
Klinkenberg, J., Samfass, P., Bader, M., Terboven, C., Müller, M.S.: Chameleon: reactive load balancing for hybrid MPI + OpenMP task-parallel applications. J. Parallel Distrib. Comput. 138, 55–64 (2020). https://doi.org/10.1016/j.jpdc.2019.12.005
Klinkenberg, J., et al.: Assessing task-to-data affinity in the LLVM OpenMP runtime. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 236–251. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_16
Lee, J., Sato, M.: Implementation and performance evaluation of XcalableMP: a parallel programming language for distributed memory systems. In: 2010 39th International Conference on Parallel Processing Workshops, pp. 413–420 (2010). https://doi.org/10.1109/ICPPW.2010.62
Louboutin, M., et al.: Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration. Geosci. Model Dev. 12(3), 1165–1187 (2019). https://doi.org/10.5194/gmd-12-1165-2019
Mellor-Crummey, J., Fowler, R., Whalley, D.: Tools for application-oriented performance tuning. In: Proceedings of the 15th International Conference on Supercomputing, ICS 2001, pp. 154–165. Association for Computing Machinery, New York (2001). https://doi.org/10.1145/377792.377826
Meng, J., Atle, A., Calandra, H., Araya-Polo, M.: Minimod: a finite difference solver for seismic modeling. arXiv (2020). https://arxiv.org/abs/2007.06048
Moustafa, S., Kirschenmann, W., Dupros, F., Aochi, H.: Task-based programming on emerging parallel architectures for finite-differences seismic numerical kernel. In: Aldinucci, M., Padovani, L., Torquati, M. (eds.) Euro-Par 2018. LNCS, vol. 11014, pp. 764–777. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96983-1_54
NERSC: Cori. https://docs.nersc.gov/systems/cori/
Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: SC 2010: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13 (2010)
Oak Ridge Leadership Computing Facility: Summit. https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
OpenMP Architecture Review Board: OpenMP Application Programming Interface, November 2018. https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf. version 5.0
Planas, J., Badia, R.M., Ayguadé, E., Labarta, J.: Hierarchical task-based programming with StarSs. Int. J. High Perform. Comput. Appl. 23(3), 284–299 (2009). https://doi.org/10.1177/1094342009106195
Qawasmeh, A., Hugues, M.R., Calandra, H., Chapman, B.M.: Performance portability in reverse time migration and seismic modelling via OpenACC. Int. J. High Perform. Comput. Appl. 31(5), 422–440 (2017). https://doi.org/10.1177/1094342016675678
Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly Media, Beijing (2007)
Rico, A., Sánchez Barrera, I., Joao, J.A., Randall, J., Casas, M., Moretó, M.: On the benefits of tasking with OpenMP. In: Fan, X., de Supinski, B.R., Sinnen, O., Giacaman, N. (eds.) IWOMP 2019. LNCS, vol. 11718, pp. 217–230. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28596-8_15
Slaughter, E., Lee, W., Treichler, S., Bauer, M., Aiken, A.: Regent: a high-productivity programming language for HPC with logical regions. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015. Association for Computing Machinery, New York (2015). https://doi.org/10.1145/2807591.2807629
Thaler, F., et al.: Porting the COSMO weather model to manycore CPUs. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3324989.3325723
Vidal, R., et al.: Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads. In: Terboven, C., de Supinski, B.R., Reble, P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2015. LNCS, vol. 9342, pp. 60–72. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24595-9_5
Virouleau, P., et al.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11454-5_2
Acknowledgements
We would like to thank Total Exploration and Production Research and Technologies for their support of this work. We also thank Vivek Kale at Brookhaven National Laboratory for his help in guiding the experiments in this paper. We used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. We also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Furthermore, we would like to thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the SeaWulf computing system, which was made possible by a 1.4M National Science Foundation grant (#1531492).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Raut, E., Meng, J., Araya-Polo, M., Chapman, B. (2020). Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science(), vol 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-58144-2_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58143-5
Online ISBN: 978-3-030-58144-2
eBook Packages: Computer ScienceComputer Science (R0)