DOI: 10.1145/2830013.2830015
research-article

Local and Remote GPUs Perform Similar with EDR 100G InfiniBand

Published: 07 December 2015

Abstract

The use of graphics processing units (GPUs) to accelerate portions of applications is now widespread. To avoid the usual drawbacks of these accelerators (high acquisition cost, high energy consumption, and low utilization), one possible solution is to share them among several nodes in the cluster. Remote GPU virtualization middleware systems appeared several years ago to implement this solution. Although these systems addressed the aforementioned drawbacks, their performance was usually impaired by the low bandwidth of the underlying network. However, recent advances in InfiniBand fabrics have changed this situation. In this paper we analyze how the high bandwidth provided by the new EDR 100G InfiniBand fabric allows remote GPU virtualization middleware systems not only to perform very similarly to local GPUs, but also to improve overall performance for some applications.
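The bandwidth argument above can be illustrated with a back-of-the-envelope model. This sketch is not taken from the paper. It assumes that a remote copy is pipelined, so its steady-state rate is capped by the slowest link on the path (the network or the GPU server's PCIe 3.0 x16 bus). It also uses theoretical per-direction data rates after line encoding and ignores latency and protocol overhead. The names `PCIE3_X16`, `IB_QDR_4X`, `IB_FDR_4X`, `IB_EDR_4X` and `copy_time` are hypothetical.

```python
"""Idealized host-to-GPU copy time, local vs. remote (illustrative only).

Assumptions (not measurements from the paper):
- Rates are theoretical per-direction data rates after line encoding.
- A remote copy is pipelined, so throughput is bounded by the slowest link.
- Latency and protocol/software overheads are ignored.
"""

GB = 1e9  # decimal gigabyte, in bytes

# Theoretical per-direction data rates, in bytes per second.
PCIE3_X16 = 16 * 8e9 * (128 / 130) / 8       # 16 lanes, 8 GT/s, 128b/130b
IB_QDR_4X = 4 * 10e9 * (8 / 10) / 8          # 4 lanes, 10 Gb/s, 8b/10b
IB_FDR_4X = 4 * 14.0625e9 * (64 / 66) / 8    # 4 lanes, 14.0625 Gb/s, 64b/66b
IB_EDR_4X = 4 * 25.78125e9 * (64 / 66) / 8   # 4 lanes, 25.78125 Gb/s -> 100 Gb/s


def copy_time(nbytes, *link_rates):
    """Seconds to stream nbytes through a pipeline of links (bottleneck model)."""
    return nbytes / min(link_rates)


size = 1 * GB
local = copy_time(size, PCIE3_X16)  # local GPU: PCIe bus only
for name, net in [("QDR", IB_QDR_4X), ("FDR", IB_FDR_4X), ("EDR", IB_EDR_4X)]:
    remote = copy_time(size, net, PCIE3_X16)  # network, then remote PCIe bus
    print(f"{name}: remote/local copy-time ratio = {remote / local:.2f}")
# QDR: remote/local copy-time ratio = 3.94
# FDR: remote/local copy-time ratio = 2.31
# EDR: remote/local copy-time ratio = 1.26
```

Under these idealized assumptions, EDR brings the network much closer to the PCIe bus than earlier InfiniBand generations did. Because data copies are only part of an application's run time, the end-to-end gap would be smaller still for compute-bound workloads, while real middleware also pays latency and software costs that this model omits.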

Cited By

  • (2024) Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement. ACM Transactions on Software Engineering and Methodology 33:8, pp. 1-34. DOI: 10.1145/3680470. Online publication date: 26-Jul-2024.
  • (2024) Power Saving for Hardware Accelerated Applications With Dynamical Processor Switching. IEEE Access 12, pp. 118109-118121. DOI: 10.1109/ACCESS.2024.3448432. Online publication date: 2024.
  • (2023) MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 50-59. DOI: 10.1145/3582514.3582519. Online publication date: 25-Feb-2023.

Information

Published In

ACM Conferences
Middleware Industry '15: Proceedings of the Industrial Track of the 16th International Middleware Conference
December 2015
32 pages
ISBN:9781450337274
DOI:10.1145/2830013
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPU virtualization
  3. GPUs
  4. InfiniBand
  5. performance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '15
Sponsor:
  • ACM
  • USENIX Assoc
  • IFIP
Middleware '15: 16th International Middleware Conference
December 7 - 11, 2015
Vancouver, BC, Canada

Acceptance Rates

Middleware Industry '15 Paper Acceptance Rate 4 of 20 submissions, 20%;
Overall Acceptance Rate 203 of 948 submissions, 21%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 01 Dec 2024

Citations

  • (2023) On the Acceleration of FaaS Using Remote GPU Virtualization. Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, pp. 157-164. DOI: 10.1145/3578245.3584933. Online publication date: 15-Apr-2023.
  • (2023) Low-latency remote-offloading system for accelerator offloading. 2023 26th Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), pp. 58-65. DOI: 10.1109/ICIN56760.2023.10073506. Online publication date: 6-Mar-2023.
  • (2023) Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 855-869. DOI: 10.1109/HPCA56546.2023.10070939. Online publication date: Feb-2023.
  • (2023) Low-latency remote-offloading system for accelerator. Annals of Telecommunications. DOI: 10.1007/s12243-023-00994-3. Online publication date: 3-Nov-2023.
  • (2022) KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud. The Journal of Supercomputing 79:1, pp. 591-625. DOI: 10.1007/s11227-022-04682-2. Online publication date: 14-Jul-2022.
  • (2022) Fast Offloading of Accelerator Task over Network with Hardware Assistance. Edge Computing – EDGE 2022, pp. 1-17. DOI: 10.1007/978-3-031-23470-5_1. Online publication date: 16-Dec-2022.
  • (2022) Towards Efficient Remote OpenMP Offloading. OpenMP in a Modern World: From Multi-device Support to Meta Programming, pp. 17-31. DOI: 10.1007/978-3-031-15922-0_2. Online publication date: 20-Sep-2022.
