Abstract
Parallel, explicit finite element analysis is based almost exclusively on point-to-point interprocessor communication. However, point-to-point communication on multicore architectures results in large performance variability because of shared caches and sockets. The interprocessor communication required during the solution phase must be designed to achieve a high degree of scalability and performance for explicit time integration operators. An analysis of point-to-point communication on different hardware platforms, communication library implementations, and message sizes demonstrates the need for a flexible software design that allows for optimization. Autotuning modules and preliminary performance tests are necessary to identify the optimal combination of calls. Performance differences of point-to-point messaging on multicore machines are illustrated with a test that uses combinations of MPI communication calls. The differences are apparent when cache and sockets are shared among the cores and for message sizes up to 1.5 MB. Alternative communication schemes are shown to perform faster depending on the architecture and message size. Nearly linear scalability results for explicit time integration are demonstrated using the design techniques.
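The autotuning approach described above — running preliminary performance tests over candidate communication schemes and message sizes, then selecting the fastest combination before the time-integration loop starts — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scheme names and stub functions are hypothetical stand-ins for real alternatives such as blocking send/recv, nonblocking Isend/Irecv, or persistent-request exchanges, and the timing loop here only touches a local buffer rather than issuing actual MPI calls.

```python
import time

# Hypothetical stand-ins for alternative point-to-point exchange schemes.
# A real module would issue the corresponding MPI calls; each stub just
# copies the message buffer so the timing loop has work to measure.
def scheme_blocking(msg):
    return bytes(msg)

def scheme_nonblocking(msg):
    return bytes(memoryview(msg))

def scheme_persistent(msg):
    return msg[:]

SCHEMES = {
    "blocking": scheme_blocking,
    "nonblocking": scheme_nonblocking,
    "persistent": scheme_persistent,
}

def autotune(schemes, sizes, trials=5):
    """For each message size, time every candidate scheme and record the
    fastest. Mirrors a preliminary-performance-test phase run once at
    startup, before the explicit time-integration loop begins."""
    best = {}
    for size in sizes:
        msg = bytes(size)
        timings = {}
        for name, fn in schemes.items():
            t0 = time.perf_counter()
            for _ in range(trials):
                fn(msg)
            timings[name] = time.perf_counter() - t0
        best[size] = min(timings, key=timings.get)
    return best

if __name__ == "__main__":
    # Sweep message sizes up to ~1.5 MB, the range in which the abstract
    # reports the largest scheme-to-scheme differences.
    table = autotune(SCHEMES, sizes=[1024, 64 * 1024, 1536 * 1024])
    for size, name in table.items():
        print(f"{size:>8} bytes -> {name}")
```

In a production setting the selected scheme per message size would be cached and consulted by the solver's exchange routine at every time step, so the tuning cost is paid only once.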
Acknowledgments
This research was supported by the National Science Foundation under grants EEC-0121989 and OCI-0749227. The simulations were performed under an allocation approved by the Cyberinfrastructure Partnership for TeraGrid resources under award ECS080001. The awards and grants are greatly appreciated.
Cite this article
Petropoulos, G., Fenves, G.L. Interprocessor communication for high performance, explicit time integration. Engineering with Computers 26, 149–157 (2010). https://doi.org/10.1007/s00366-010-0174-x