Optimizing the synchronization operations in message passing interface one-sided communication
The International Journal of High Performance Computing …, 2005•journals.sagepub.com
One-sided communication in Message Passing Interface (MPI) requires the use of one of three different synchronization mechanisms, which indicate when the one-sided operation can be started and when the operation is completed. Efficient implementation of the synchronization mechanisms is critical to achieving good performance with one-sided communication. However, our performance measurements indicate that in many MPI implementations, the synchronization functions add significant overhead, resulting in one-sided communication performing much worse than point-to-point communication for short- and medium-sized messages. In this paper, we describe our efforts to minimize the overhead of synchronization in our implementation of one-sided communication in MPICH2. We describe our optimizations for all three synchronization mechanisms defined in MPI: fence, post-start-complete-wait, and lock-unlock. Our performance results demonstrate that, for short messages, MPICH2 performs six times faster than LAM for fence synchronization and 50% faster for post-start-complete-wait synchronization, and it performs more than twice as fast as Sun MPI for all three synchronization methods.
Sage Journals