GPUrpc: Exploring Transparent Access to Remote GPUs

Published: 13 October 2016

Abstract

Graphics processing units (GPUs) are increasingly used for high-performance computing. Programming frameworks for general-purpose computing on GPUs (GPGPU), such as CUDA and OpenCL, are also maturing. Driving this trend is the recent proliferation of mobile devices such as smartphones and wearable computers. These devices increasingly run computationally intensive applications that involve some form of environmental recognition, such as augmented reality (AR) or voice recognition. However, devices with low computational power cannot satisfy such demanding computing requirements. The CPU load of these devices could be reduced by offloading computation onto GPUs in the cloud. This paper presents GPUrpc, a remote procedure call (RPC) extension to Gdev, a rich set of runtime libraries and device drivers for first-class GPU resource management. GPUrpc allows developers to use CUDA for GPGPU development. Existing research uses RPCs based on the CUDA application programming interfaces (APIs); hence, all CUDA API calls require communication. To reduce communication overhead, we base our RPC on an API at a lower level than the CUDA API and avoid communication for API calls that do not require it. Our evaluation, conducted on Linux and NVIDIA GPUs, shows that the basic performance of our prototype implementation is reliable in comparison with the existing method. An evaluation using the Rodinia benchmark suite, which is designed for research in heterogeneous parallel computing, showed that GPUrpc is effective for applications such as image processing and data mining. GPUrpc can also reduce power consumption to approximately 1/6 that of CPU processing when performing 512 × 512 matrix multiplication.
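The communication argument in the abstract can be illustrated with a small Python sketch. It is not the GPUrpc implementation and uses no real GPU or network API: the class names `FakeTransport` and `RemoteGpuStub`, the call names, and the choice of which calls are served locally are all hypothetical. It only counts round trips for the same call sequence when every call is forwarded and when calls that need no remote work are answered on the client.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTransport:
    """Stands in for a network link; counts round trips instead of sending."""
    round_trips: int = 0

    def call(self, name, *args):
        self.round_trips += 1
        # Hypothetical server behaviour: echo the request back as the reply.
        return (name, args)


@dataclass
class RemoteGpuStub:
    """Client-side stub; `local_calls` names calls served without the network."""
    transport: FakeTransport
    local_calls: frozenset = frozenset()
    cache: dict = field(default_factory=dict)

    def invoke(self, name, *args):
        if name in self.local_calls:
            # Answered from client-side state, so no message is sent.
            return self.cache.get(name)
        return self.transport.call(name, *args)


def run_workload(stub):
    """Hypothetical call sequence around a single kernel launch."""
    stub.invoke("get_device_count")
    stub.invoke("get_device_properties", 0)
    stub.invoke("mem_alloc", 512 * 512 * 4)
    stub.invoke("launch", "matmul")
    stub.invoke("get_last_error")


# Case 1: every call is forwarded to the remote side.
forward_all = RemoteGpuStub(FakeTransport())
run_workload(forward_all)

# Case 2: calls assumed (for illustration only) to need no remote work stay local.
selective = RemoteGpuStub(
    FakeTransport(),
    local_calls=frozenset(
        {"get_device_count", "get_device_properties", "get_last_error"}
    ),
)
run_workload(selective)

print(forward_all.transport.round_trips, selective.transport.round_trips)
```

In this toy sequence only the allocation and the launch cross the link in the second case. Which real calls can be handled without communication depends on the API level chosen and is not specified in the abstract.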



Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems  Volume 16, Issue 1
Special Issue on VIPES, Special Issue on ICESS2015 and Regular Papers
February 2017
602 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3008024

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2016
Accepted: 01 May 2016
Revised: 01 May 2016
Received: 01 October 2015
Published in TECS Volume 16, Issue 1


Author Tags

  1. GPU
  2. cloud computing
  3. distributed computing
  4. high performance computing
  5. parallel computing

Qualifiers

  • Announcement
  • Research
  • Refereed
