Abstract
Parallel, explicit finite element analysis is based almost exclusively on point-to-point interprocessor communication. However, point-to-point communication on multicore architectures results in large performance variability because of shared caches and sockets. The interprocessor communication required during the solution phase must be designed to achieve a high degree of scalability and performance for explicit time integration operators. An analysis of point-to-point communication on different hardware platforms, communication library implementations, and message sizes demonstrates the need for a flexible software design that allows for optimization. Autotuning modules and preliminary performance tests are necessary to identify the optimal combination of calls. Performance differences of point-to-point messaging on multicore machines are illustrated with a test that uses combinations of MPI communication calls. The differences are apparent when cache and sockets are shared among the cores and for message sizes up to 1.5 MB. Alternative communication schemes are shown to perform faster depending on the architecture and message size. Nearly linear scalability results for explicit time integration are demonstrated using the design techniques.
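The autotuning approach described above — running preliminary performance tests over candidate communication schemes and message sizes, then selecting the fastest combination before the time-integration loop starts — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scheme names and stub functions are hypothetical stand-ins for real alternatives such as blocking send/recv, nonblocking Isend/Irecv, or persistent-request exchanges, and the timing loop here only touches a local buffer rather than issuing actual MPI calls.

```python
import time

# Hypothetical stand-ins for alternative point-to-point exchange schemes.
# A real module would issue the corresponding MPI calls; each stub just
# copies the message buffer so the timing loop has work to measure.
def scheme_blocking(msg):
    return bytes(msg)

def scheme_nonblocking(msg):
    return bytes(memoryview(msg))

def scheme_persistent(msg):
    return msg[:]

SCHEMES = {
    "blocking": scheme_blocking,
    "nonblocking": scheme_nonblocking,
    "persistent": scheme_persistent,
}

def autotune(schemes, sizes, trials=5):
    """For each message size, time every candidate scheme and record the
    fastest. Mirrors a preliminary-performance-test phase run once at
    startup, before the explicit time-integration loop begins."""
    best = {}
    for size in sizes:
        msg = bytes(size)
        timings = {}
        for name, fn in schemes.items():
            t0 = time.perf_counter()
            for _ in range(trials):
                fn(msg)
            timings[name] = time.perf_counter() - t0
        best[size] = min(timings, key=timings.get)
    return best

if __name__ == "__main__":
    # Sweep message sizes up to ~1.5 MB, the range in which the abstract
    # reports the largest scheme-to-scheme differences.
    table = autotune(SCHEMES, sizes=[1024, 64 * 1024, 1536 * 1024])
    for size, name in table.items():
        print(f"{size:>8} bytes -> {name}")
```

In a production setting the selected scheme per message size would be cached and consulted by the solver's exchange routine at every time step, so the tuning cost is paid only once.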
Acknowledgments
This research was supported by the National Science Foundation under grants EEC-0121989 and OCI-0749227. The simulations were performed under an allocation approved by the Cyberinfrastructure Partnership for TeraGrid resources under award ECS080001. The awards and grants are greatly appreciated.
Cite this article
Petropoulos, G., Fenves, G.L. Interprocessor communication for high performance, explicit time integration. Engineering with Computers 26, 149–157 (2010). https://doi.org/10.1007/s00366-010-0174-x