Abstract
The use of general-purpose GPUs (GPGPUs) in HPC systems is increasing because of their unparalleled performance advantage and their ability to meet the ever-growing demand for floating-point operations. While the GPU can offload many of an application's parallel computations, the architecture of a GPU-CPU-InfiniBand server still requires the CPU to initiate and manage memory transfers between remote GPUs over the high-speed InfiniBand network. In this paper we introduce GPUDirect, a new technology that enables Tesla GPUs to transfer data over InfiniBand without CPU involvement or intermediate buffer copies, dramatically reducing GPU communication time and increasing overall system performance and efficiency. We also present the first exploration of the performance benefits of GPUDirect using the Amber and LAMMPS applications.
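To make the data-path difference concrete, the following minimal sketch (not taken from the paper) contrasts the conventional staged host copy with a GPUDirect-style transfer through a CUDA-aware MPI library. The function name send_gpu_buffer and the STAGED_COPY compile-time switch are illustrative assumptions; passing a device pointer to MPI_Isend assumes a CUDA-aware MPI build.

```c
/*
 * Illustrative sketch only: conventional staged copy vs. a
 * GPUDirect-style transfer via a CUDA-aware MPI implementation.
 */
#include <mpi.h>
#include <cuda_runtime.h>

void send_gpu_buffer(const float *d_buf, size_t n, int dest, MPI_Comm comm)
{
    MPI_Request req;

#ifdef STAGED_COPY
    /* Conventional path: the CPU stages the data through host memory
     * before handing it to the InfiniBand stack. */
    float *h_buf = NULL;
    cudaMallocHost((void **)&h_buf, n * sizeof(float));
    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    MPI_Isend(h_buf, (int)n, MPI_FLOAT, dest, 0, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    cudaFreeHost(h_buf);
#else
    /* GPUDirect-style path: the device pointer is handed to the
     * communication library directly, avoiding the extra host copy. */
    MPI_Isend((void *)d_buf, (int)n, MPI_FLOAT, dest, 0, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
#endif
}
```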
Cite this article
Shainer, G., Ayoub, A., Lui, P. et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand—a new model for GPU to GPU communications. Comput Sci Res Dev 26, 267–273 (2011). https://doi.org/10.1007/s00450-011-0157-1