Local and Remote GPUs Perform Similar with EDR 100G InfiniBand

Authors: Federico Silla, Scot Schultz

Middleware Industry '15: Proceedings of the Industrial Track of the 16th International Middleware Conference

Article No.: 4, Pages 1 - 7

https://doi.org/10.1145/2830013.2830015

Published: 07 December 2015

Abstract

Using graphics processing units (GPUs) to accelerate portions of applications is now widespread. To avoid the usual drawbacks of these accelerators (high acquisition cost, high energy consumption, and low utilization), one possible solution is to share them among the nodes of a cluster. Remote GPU virtualization middleware systems, which implement this solution, appeared several years ago. Although these systems addressed the drawbacks above, their performance was usually limited by the low bandwidth of the underlying network. Recent advances in InfiniBand fabrics, however, have changed this situation. In this paper we analyze how the high bandwidth of the new EDR 100G InfiniBand fabric allows remote GPU virtualization middleware systems not only to perform very similarly to local GPUs but also to improve overall performance for some applications.
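The bandwidth argument above can be made concrete with a simple cost model. The following Python sketch assumes that each transfer costs a fixed latency plus size divided by bandwidth, and that using a remote GPU adds one network hop in front of the GPU's own PCIe link. The `Link` class, the `copy_time_us` and `remote_slowdown` functions, and every bandwidth and latency value in it are hypothetical illustrations, not figures or code from the paper or from any remote GPU virtualization middleware.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    """One hop on the path from application memory to GPU memory (hypothetical)."""
    name: str
    bandwidth_gbs: float  # assumed effective bandwidth, GB/s (1 GB = 1e9 bytes)
    latency_us: float     # assumed fixed per-transfer latency, microseconds


def copy_time_us(size_bytes: int, path: Sequence[Link], pipelined: bool = True) -> float:
    """Estimate transfer time in microseconds as latency + size / bandwidth.

    pipelined=True: chunks stream through all hops at once, so the slowest
    hop bounds throughput while the per-hop latencies still add up.
    pipelined=False: store-and-forward, each hop finishes before the next starts.
    """
    latency = sum(hop.latency_us for hop in path)
    # 1 GB/s == 1e3 bytes per microsecond
    if pipelined:
        return latency + size_bytes / (min(hop.bandwidth_gbs for hop in path) * 1e3)
    return latency + sum(size_bytes / (hop.bandwidth_gbs * 1e3) for hop in path)


# Assumed figures for illustration only; not taken from the paper.
GPU_PCIE = Link("server PCIe link to GPU", bandwidth_gbs=12.0, latency_us=1.0)


def remote_slowdown(size_bytes: int, net_gbs: float, net_latency_us: float = 2.0) -> float:
    """Ratio of remote to local copy time for one host-to-GPU transfer."""
    local = copy_time_us(size_bytes, [GPU_PCIE])
    network = Link("network fabric", bandwidth_gbs=net_gbs, latency_us=net_latency_us)
    remote = copy_time_us(size_bytes, [network, GPU_PCIE])
    return remote / local


if __name__ == "__main__":
    size = 64 * 2**20  # a 64 MiB buffer
    for label, gbs in [("slower network (assumed 3 GB/s)", 3.0),
                       ("EDR-class network (assumed 11 GB/s)", 11.0)]:
        print(f"{label}: remote/local = {remote_slowdown(size, gbs):.2f}x")
```

With these assumed numbers, a 64 MiB copy over the faster fabric costs only slightly more than a local copy, whereas the slower fabric makes it several times slower. The model deliberately ignores client-side PCIe hops, protocol overhead, and overlap with computation, so it only bounds the copy overhead and says nothing about the application-level results the paper reports.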




Information & Contributors

Information

Published In


Middleware Industry '15: Proceedings of the Industrial Track of the 16th International Middleware Conference

December 2015

32 pages

ISBN:9781450337274

DOI:10.1145/2830013

Program Chairs:
K. R. Jayaram, IBM T. J. Watson Research Center, USA
Michael A. Kozuch, Intel Labs, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery
USENIX Assoc: USENIX Assoc
IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 December 2015


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Generalitat Valenciana

Conference

Middleware '15

Sponsor:

ACM
USENIX Assoc
IFIP

Middleware '15: 16th International Middleware Conference

December 7 - 11, 2015

Vancouver, BC, Canada

Acceptance Rates

Middleware Industry '15 Paper Acceptance Rate 4 of 20 submissions, 20%;

Overall Acceptance Rate 203 of 948 submissions, 21%


Bibliometrics & Citations

Article Metrics

Total Citations: 40
Total Downloads: 305
Downloads (last 12 months): 12
Downloads (last 6 weeks): 2

Download counts reflect downloads up to 03 Mar 2025.

Cited By

Rajput S, Widmayer T, Shang Z, Kechagia M, Sarro F, Sharma T (2024). Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement. ACM Transactions on Software Engineering and Methodology 33:8 (1-34). Online publication date: 26-Jul-2024.
https://dl.acm.org/doi/10.1145/3680470
Natori K, Otani I, Harasawa H, Saito S, Fujimoto K (2024). Power Saving for Hardware Accelerated Applications With Dynamical Processor Switching. IEEE Access 12 (118109-118121). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3448432
Shan B, Araya-Polo M, Malik A, Chapman B, Chen Q, Huang Z, Si M (2023). MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores (50-59). Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3582514.3582519
Naranjo Delgado D, Contreras M, Moltó G, Risco S, Blanquer I, Prades J, Silla F, Vieira M, Cardellini V, Di Marco A, Tuma P (2023). On the Acceleration of FaaS Using Remote GPU Virtualization. Companion of the 2023 ACM/SPEC International Conference on Performance Engineering (157-164). Online publication date: 15-Apr-2023.
https://dl.acm.org/doi/10.1145/3578245.3584933
Saito S, Fujimoto K, Shiraga A (2023). Low-latency remote-offloading system for accelerator offloading. 2023 26th Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN) (58-65). Online publication date: 6-Mar-2023.
https://doi.org/10.1109/ICIN56760.2023.10073506
Masouros D, Pinto C, Gazzetti M, Xydis S, Soudris D (2023). Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (855-869). Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10070939
Saito S, Fujimoto K, Shiraga A (2023). Low-latency remote-offloading system for accelerator. Annals of Telecommunications. Online publication date: 3-Nov-2023.
https://doi.org/10.1007/s12243-023-00994-3
Shen W, Liu Z, Tan Y, Luo Z, Lei Z (2022). KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud. The Journal of Supercomputing 79:1 (591-625). Online publication date: 14-Jul-2022.
https://doi.org/10.1007/s11227-022-04682-2
Saito S, Fujimoto K, Kaneko M, Shiraga A (2022). Fast Offloading of Accelerator Task over Network with Hardware Assistance. Edge Computing – EDGE 2022 (1-17). Online publication date: 16-Dec-2022.
https://doi.org/10.1007/978-3-031-23470-5_1
Lu W, Shan B, Raut E, Meng J, Araya-Polo M, Doerfert J, Malik A, Chapman B (2022). Towards Efficient Remote OpenMP Offloading. OpenMP in a Modern World: From Multi-device Support to Meta Programming (17-31). Online publication date: 20-Sep-2022.
https://doi.org/10.1007/978-3-031-15922-0_2