- Article, September 2010
Measuring execution times of collective communications in an empirical optimization framework
An essential part of an empirical optimization library is the set of timing procedures with which the performance of different codelets is determined. In this paper, we present four different timing methods to optimize collective MPI communications and ...
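As background for the timing procedures the abstract mentions, here is a minimal Python sketch, not taken from the paper: the function name `time_codelet` and the minimum-of-samples policy are illustrative assumptions. It runs a codelet repeatedly and reports the smallest sample, a common way to suppress system noise.

```python
import time

def time_codelet(codelet, reps=10):
    """Time a communication codelet (illustrative sketch).

    Runs `codelet` `reps` times and returns the minimum per-call wall
    time, since the minimum is least affected by transient system noise.
    """
    samples = []
    for _ in range(reps):
        t0 = time.perf_counter()
        codelet()
        samples.append(time.perf_counter() - t0)
    return min(samples)
```

In a real MPI setting, each repetition would typically be preceded by a barrier so that all ranks enter the collective together before the clock starts.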
- Article, September 2010
Adaptive MPI multirail tuning for non-uniform input/output access
Multicore processors have not only reintroduced Non-Uniform Memory Access (NUMA) architectures in today's parallel computers, but they are also responsible for non-uniform access times with respect to Input/Output devices (NUIOA). In clusters of ...
- Article, September 2010
Load balancing for regular meshes on SMPs with MPI
Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors (now cores). However, these strategies often do not consider the inherent system ...
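As a point of reference for the traditional approach this abstract mentions, here is a minimal Python sketch of an exact block partition of mesh rows among cores. The function name and return convention are illustrative assumptions, not from the paper.

```python
def block_partition(n_rows, n_ranks):
    """Split n_rows of a regular mesh as evenly as possible among n_ranks.

    Returns a list of (start, end) half-open row ranges, one per rank.
    The first (n_rows % n_ranks) ranks each get one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 10 rows over 4 ranks: block sizes 3, 3, 2, 2
print(block_partition(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Such an exact split equalizes cell counts but, as the abstract notes, ignores system effects (OS noise, shared caches) that can still leave cores unevenly loaded.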
- Article, September 2010
Transparent redundant computing with MPI
Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of ...
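The core idea of redundant computation can be sketched in a few lines of Python; this is a toy model under stated assumptions (sequential "replicas" standing in for redundant MPI processes, exact equality as the agreement check), not the paper's implementation.

```python
def run_redundant(task, replicas=2):
    """Run `task` on `replicas` independent workers and cross-check.

    If all replicas agree, return the common result; otherwise raise,
    signalling that at least one replica failed silently.
    """
    results = [task() for _ in range(replicas)]
    if all(r == results[0] for r in results):
        return results[0]
    raise RuntimeError("replica results diverge: %r" % (results,))

# A deterministic task passes the cross-check.
print(run_redundant(lambda: sum(range(100))))  # 4950
```

The resiliency benefit is bought with `replicas` times the compute cost, which is exactly the trade-off the abstract weighs.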
- Article, September 2010
Communication target selection for replicated MPI processes
VolpexMPI is an MPI library designed for volunteer computing environments. In order to cope with the fundamental unreliability of these environments, VolpexMPI deploys two or more replicas of each MPI process. A receiver-driven communication scheme is ...
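One way a receiver-driven scheme can pick among replicas is by preferring the most responsive one. The sketch below is a hypothetical Python illustration of that general idea, not VolpexMPI's actual selection policy; the function name and latency-map input are assumptions.

```python
def select_target(replica_latencies):
    """Pick a communication target among replicas of one logical process.

    `replica_latencies` maps replica id -> most recently observed
    response latency in seconds, or None if the replica is unresponsive.
    Returns the id of the fastest live replica.
    """
    live = {rid: lat for rid, lat in replica_latencies.items()
            if lat is not None}
    if not live:
        raise RuntimeError("no live replica for this logical process")
    return min(live, key=live.get)

print(select_target({"A0": 0.004, "A1": None, "A2": 0.0015}))  # A2
```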
- Article, September 2010
Dodging the cost of unavoidable memory copies in message logging protocols
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault ...
- Article, September 2010
Characteristics of the unexpected message queue of MPI applications
High Performance Computing systems are used on a regular basis to run a myriad of application codes, yet a surprising dearth of information exists with respect to communications characteristics. Even less information is available on the low-level ...
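For readers unfamiliar with the mechanism this paper measures: a message that arrives before a matching receive has been posted is buffered in the unexpected message queue (UMQ), and each later receive searches that queue before waiting. A toy Python model of this matching logic (class and method names are illustrative, not any MPI library's API):

```python
from collections import deque

class UnexpectedQueue:
    """Toy model of an MPI unexpected message queue (UMQ).

    Messages arriving before a matching receive is posted are buffered
    here; each new receive searches the queue in arrival order.
    """

    def __init__(self):
        self._queue = deque()

    def deliver(self, source, tag, payload):
        """Buffer an arriving message with no matching posted receive."""
        self._queue.append((source, tag, payload))

    def post_receive(self, source, tag):
        """Return the payload of the first queued message matching
        (source, tag), or None if the receive must wait for arrival."""
        for i, (s, t, payload) in enumerate(self._queue):
            if s == source and t == tag:
                del self._queue[i]
                return payload
        return None

umq = UnexpectedQueue()
umq.deliver(source=1, tag=7, payload="early")
print(umq.post_receive(source=1, tag=7))  # early
```

Because every receive may scan the whole queue, long UMQs make matching expensive, which is one reason the queue's characteristics matter for performance.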
- Article, September 2010
Implementing MPI on Windows: comparison with common approaches on Unix
Commercial HPC applications are often run on clusters that use the Microsoft Windows operating system and need an MPI implementation that runs efficiently in the Windows environment. The MPI developer community, however, is more familiar with the issues ...
- Article, September 2010
Network offloaded hierarchical collectives using ConnectX-2's CORE-Direct capabilities
As the scale of High Performance Computing (HPC) systems continues to increase, demanding that we extract even more parallelism from applications, the need to move communication management away from the Central Processing Unit (CPU) becomes even ...
- Article, September 2010
Design of kernel-level asynchronous collective communication
Overlapping computation and communication, not only point-to-point but also collective communications, is an important technique to improve the performance of parallel programs. Since the current non-blocking collective communications have been mostly ...
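The overlap pattern this abstract targets (post a non-blocking collective, do local work, then wait for completion) can be sketched in Python, with a thread standing in for the kernel-level progress engine; the function names are illustrative assumptions, not the paper's interface.

```python
from concurrent.futures import ThreadPoolExecutor

def overlapped(compute, communicate):
    """Overlap local computation with a (simulated) non-blocking collective.

    Posts `communicate` asynchronously (like MPI_Iallreduce: post and
    return), runs `compute` while the transfer proceeds, then waits for
    the communication result (like MPI_Wait).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(communicate)  # post the collective
        local = compute()               # useful work during the transfer
        remote = fut.result()           # complete the collective
    return local, remote

print(overlapped(lambda: sum(range(10)), lambda: "reduced"))  # (45, 'reduced')
```

The paper's point is where the progress happens: if the collective only advances when the application re-enters the MPI library, the overlap is illusory, which motivates driving it asynchronously at kernel level.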