Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

Eric Raut ORCID: orcid.org/0000-0001-8091-2066¹²,
Jie Meng¹³,
Mauricio Araya-Polo¹³ &
…
Barbara Chapman¹²

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 12295))

Included in the following conference series:

International Workshop on OpenMP

681 Accesses
7 Citations

Abstract

Simulations based on stencil computations (widely used in geosciences) have been dominated by the MPI+OpenMP programming model paradigm. Little effort has been devoted to experimenting with task-based parallelism in this context. We address this by introducing OpenMP task parallelism into the kernel of an industrial seismic modeling code, Minimod. We observe that even for these highly regular stencil computations, taskified kernels are competitive with traditional OpenMP-augmented loops, and in some experiments tasks even outperform loop parallelism.

This promising result sets the stage for more complex computational patterns. Simulations involve more than just the stencil calculation: a collection of kernels is often needed to accomplish the scientific objective (e.g., I/O, boundary conditions). These kernels can often be computed simultaneously; however, implementing this simultaneous computation with traditional programming models is not trivial. The presented approach will be extended to cover simultaneous execution of several kernels, where we expect to fully exploit the benefits of task-based programming.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 84.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel

A Generic Strategy for Multi-stage Stencils

Enhancing a GPU-Based Wave Propagation Application Through Loop Tiling and Loop Fission Optimizations

References

Acun, B., et al.: Parallel programming with migratable objects: Charm++ in practice. In: SC 2014: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 647–658 (2014). https://doi.org/10.1109/SC.2014.58
Atkinson, P., McIntosh-Smith, S.: On the performance of parallel tasking runtimes for an irregular fast multipole method application. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 92–106. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_7
Chapter Google Scholar
Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exp. 23(2), 187–198 (2011). https://doi.org/10.1002/cpe.1631
Article Google Scholar
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: SC 2012: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11, November 2012. https://doi.org/10.1109/SC.2012.71
Berenger, J.P.: A perfectly matched layer for the absorption of electromagnetic waves. J. Comput. Phys. 114(2), 185–200 (1994). https://doi.org/10.1006/jcph.1994.1159
Article MathSciNet MATH Google Scholar
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. SIGPLAN Not. 30(8), 207–216 (1995). https://doi.org/10.1145/209937.209958
Article Google Scholar
Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Herault, T., Dongarra, J.J.: PaRSEC: exploiting heterogeneity to enhance scalability. Comput. Sci. Eng. 15(6), 36–45 (2013). https://doi.org/10.1109/MCSE.2013.98
Article Google Scholar
de la Cruz, R., Araya-Polo, M.: Algorithm 942: semi-stencil. ACM Trans. Math. Softw. 40(3) (2014). https://doi.org/10.1145/2591006
de la Cruz, R., Araya-Polo, M.: Towards a multi-level cache performance model for 3D stencil computation. Proc. Comput. Sci. 4, 2146 –2155 (2011). https://doi.org/10.1016/j.procs.2011.04.235. Proceedings of the International Conference on Computational Science, ICCS 2011
Delannoy, O., Petiton, S.: A peer to peer computing framework: design and performance evaluation of YML. In: Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, pp. 362–369 (2004). https://doi.org/10.1109/ISPDC.2004.7
Duran, A., et al.: OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process. Lett. 21(02), 173–193 (2011). https://doi.org/10.1142/S0129626411000151
Article MathSciNet Google Scholar
Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 100–110. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79561-2_9
Chapter Google Scholar
Ghosh, S., Liao, T., Calandra, H., Chapman, B.M.: Experiences with OpenMP, PGI, HMPP and OpenACC directives on ISO/TTI kernels. In: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 691–700, November 2012. https://doi.org/10.1109/SC.Companion.2012.95
Gurhem, J., Tsuji, M., Petiton, S.G., Sato, M.: Distributed and parallel programming paradigms on the K computer and a cluster. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2019, pp. 9–17. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3293320.3293330
Kaiser, H., Heller, T., Adelstein-Lelbach, B., Serio, A., Fey, D.: HPX: a task based programming model in a global address space. In: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, PGAS 2014. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2676870.2676883
Klinkenberg, J., Samfass, P., Bader, M., Terboven, C., Müller, M.S.: Chameleon: reactive load balancing for hybrid MPI + OpenMP task-parallel applications. J. Parallel Distrib. Comput. 138, 55–64 (2020). https://doi.org/10.1016/j.jpdc.2019.12.005
Klinkenberg, J., et al.: Assessing task-to-data affinity in the LLVM OpenMP runtime. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 236–251. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_16
Chapter Google Scholar
Lee, J., Sato, M.: Implementation and performance evaluation of XcalableMP: a parallel programming language for distributed memory systems. In: 2010 39th International Conference on Parallel Processing Workshops, pp. 413–420 (2010). https://doi.org/10.1109/ICPPW.2010.62
Louboutin, M., et al.: Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration. Geosci. Model Dev. 12(3), 1165–1187 (2019). https://doi.org/10.5194/gmd-12-1165-2019
Mellor-Crummey, J., Fowler, R., Whalley, D.: Tools for application-oriented performance tuning. In: Proceedings of the 15th International Conference on Supercomputing, ICS 2001, pp. 154–165. Association for Computing Machinery, New York (2001). https://doi.org/10.1145/377792.377826
Meng, J., Atle, A., Calandra, H., Araya-Polo, M.: Minimod: a finite difference solver for seismic modeling. arXiv (2020). https://arxiv.org/abs/2007.06048
Moustafa, S., Kirschenmann, W., Dupros, F., Aochi, H.: Task-based programming on emerging parallel architectures for finite-differences seismic numerical kernel. In: Aldinucci, M., Padovani, L., Torquati, M. (eds.) Euro-Par 2018. LNCS, vol. 11014, pp. 764–777. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96983-1_54
Chapter Google Scholar
NERSC: Cori. https://docs.nersc.gov/systems/cori/
Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: SC 2010: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13 (2010)
Google Scholar
Oak Ridge Leadership Computing Facility: Summit. https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
OpenMP Architecture Review Board: OpenMP Application Programming Interface, November 2018. https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf. version 5.0
Planas, J., Badia, R.M., Ayguadé, E., Labarta, J.: Hierarchical task-based programming with StarSs. Int. J. High Perform. Comput. Appl. 23(3), 284–299 (2009). https://doi.org/10.1177/1094342009106195
Qawasmeh, A., Hugues, M.R., Calandra, H., Chapman, B.M.: Performance portability in reverse time migration and seismic modelling via OpenACC. Int. J. High Perform. Comput. Appl. 31(5), 422–440 (2017). https://doi.org/10.1177/1094342016675678
Article Google Scholar
Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly Media, Beijing (2007)
Google Scholar
Rico, A., Sánchez Barrera, I., Joao, J.A., Randall, J., Casas, M., Moretó, M.: On the benefits of tasking with OpenMP. In: Fan, X., de Supinski, B.R., Sinnen, O., Giacaman, N. (eds.) IWOMP 2019. LNCS, vol. 11718, pp. 217–230. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28596-8_15
Chapter Google Scholar
Slaughter, E., Lee, W., Treichler, S., Bauer, M., Aiken, A.: Regent: a high-productivity programming language for HPC with logical regions. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015. Association for Computing Machinery, New York (2015). https://doi.org/10.1145/2807591.2807629
Thaler, F., et al.: Porting the COSMO weather model to manycore CPUs. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3324989.3325723
Vidal, R., et al.: Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads. In: Terboven, C., de Supinski, B.R., Reble, P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2015. LNCS, vol. 9342, pp. 60–72. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24595-9_5
Chapter Google Scholar
Virouleau, P., et al.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11454-5_2
Chapter Google Scholar

Download references

Acknowledgements

We would like to thank Total Exploration and Production Research and Technologies for their support of this work. We also thank Vivek Kale at Brookhaven National Laboratory for his help in guiding the experiments in this paper. We used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. We also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Furthermore, we would like to thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the SeaWulf computing system, which was made possible by a 1.4M National Science Foundation grant (#1531492).

Author information

Authors and Affiliations

Stony Brook University, Stony Brook, NY, 11794, USA
Eric Raut & Barbara Chapman
Total EP R&T, Houston, TX, 77002, USA
Jie Meng & Mauricio Araya-Polo

Authors

Eric Raut
View author publications
You can also search for this author in PubMed Google Scholar
Jie Meng
View author publications
You can also search for this author in PubMed Google Scholar
Mauricio Araya-Polo
View author publications
You can also search for this author in PubMed Google Scholar
Barbara Chapman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eric Raut .

Editor information

Editors and Affiliations

Texas Advanced Computing Center (TACC), Austin, TX, USA
Kent Milfeld
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
Texas Advanced Computing Center (TACC), Austin, TX, USA
Lars Koesterke
RWTH Aachen University, Aachen, Germany
Jannis Klinkenberg

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Raut, E., Meng, J., Araya-Polo, M., Chapman, B. (2020). Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science(), vol 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_5

Download citation

DOI: https://doi.org/10.1007/978-3-030-58144-2_5
Published: 01 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58143-5
Online ISBN: 978-3-030-58144-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel

A Generic Strategy for Multi-stage Stencils

Enhancing a GPU-Based Wave Propagation Application Through Loop Tiling and Loop Fission Optimizations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel

A Generic Strategy for Multi-stage Stencils

Enhancing a GPU-Based Wave Propagation Application Through Loop Tiling and Loop Fission Optimizations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation