GPU Virtualization and Scheduling Methods: A Comprehensive Survey

Authors:

Dimitrios S. Nikolopoulos

ACM Computing Surveys (CSUR), Volume 50, Issue 3

Article No.: 35, Pages 1 - 37

https://doi.org/10.1145/3068281

Published: 29 June 2017

Abstract

The integration of graphics processing units (GPUs) on high-end compute nodes has established a new accelerator-based heterogeneous computing model, which now permeates high-performance computing. The same paradigm nevertheless has limited adoption in cloud computing or other large-scale distributed computing paradigms. Heterogeneous computing with GPUs can benefit the Cloud by reducing operational costs and improving resource and energy efficiency. However, such a paradigm shift would require effective methods for virtualizing GPUs, as well as other accelerators. In this survey article, we present an extensive and in-depth survey of GPU virtualization techniques and their scheduling methods. We review a wide range of virtualization techniques implemented at the GPU library, driver, and hardware levels. Furthermore, we review GPU scheduling methods that address performance and fairness issues between multiple virtual machines sharing GPUs. We believe that our survey delivers a perspective on the challenges and opportunities for virtualization of heterogeneous computing environments.

References

[1]

Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel virtualization technology for directed I/O. Intel Technol. J. 10, 3 (2006).

[2]

EC Amazon. 2010. Amazon elastic compute cloud (Amazon EC2). https://aws.amazon.com/ec2/.

[3]

AMD. 2009. R6xx_3D_Registers.pdf. Retrieved from http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_3D_Registers.pdf. (2009).

[4]

Joshua Anderson, Aaron Keys, Carolyn Phillips, Trung Dac Nguyen, and Sharon Glotzer. 2010. HOOMD-blue, general-purpose many-body dynamics on the GPU. In APS Meeting Abstracts, Vol. 1. 18008.

[5]

Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and others. 2006. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department Technical Report UCB/EECS-2006-183. University of California, Berkeley.

[6]

Andreu Badal and Aldo Badano. 2009. Accelerating monte carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit. Med. Phys. 36, 11 (2009), 4878--4880.

[7]

Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. ACM SIGOPS Operat. Syst. Rev. 37, 5 (2003), 164--177.

Digital Library

[8]

Andreas Athanasopoulos, Anastasios Dimou, Vasileios Mezaris, and Ioannis Kompatsiaris. 2011. GPU acceleration for support vector machines. In 12th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS’11). TU Delft; EWI; MM; PRB, Delft, The Netherlands.

[9]

Can Basaran and Kyoung-Don Kang. 2012. Supporting preemptive task executions and memory copies in gpgpus. In Proceedings of the 2012 24th Euromicro Conference on Real-Time Systems. IEEE, 287--296.

Digital Library

[10]

Michela Becchi, Kittisak Sajjapongse, Ian Graves, Adam Procter, Vignesh Ravi, and Srimat Chakradhar. 2012. A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing. ACM, 97--108.

Digital Library

[11]

Brahim Bensaou, Danny H. K. Tsang, and King Tung Chan. 2001. Credit-based fair queueing (CBFQ): A simple service-scheduling algorithm for packet-switched networks. IEEE/ACM Trans. Network. 9, 5 (2001), 591--604.

Digital Library

[12]

David Blythe. 2006. The direct3d 10 system. In ACM Transactions on Graphics, Vol. 25. ACM, 724--734.

Digital Library

[13]

Robert A. Bridges, Neena Imam, and Tiffany M Mintz. 2016. Understanding GPU power: A survey of profiling, modeling, and simulation methods. ACM Comput. Surv. 49, 3 (2016), 41.

Digital Library

[14]

Anton Burtsev, Kiran Srinivasan, Prashanth Radhakrishnan, Kaladhar Voruganti, and Garth R. Goodson. 2009. Fido: Fast inter-virtual-machine communication for enterprise appliances. In Proceedings of the USENIX Annual Technical Conference.

Digital Library

[15]

Adrián Castelló, Antonio J. Peña, Rafael Mayo, Pavan Balaji, and Enrique S. Quintana-Ortí. 2015. Exploring the suitability of remote GPGPU virtualization for the OpenACC programming model using rCUDA. In Proceedings of the 2015 IEEE International Conference on Cluster Computing. IEEE, 92--95.

Digital Library

[16]

Ethan Cerami. 2002. Web Services Essentials: Distributed Applications with XML-RPC, SOAP, UDDI 8 WSDL. O’Reilly Media, Inc.

Digital Library

[17]

Charu Chaubal. 2008. The architecture of vmware esxi. VMware White Pap. 1, 7 (2008).

[18]

Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the IEEE International Symposium on Workload Characterization, 2009 (IISWC’09). IEEE, 44--54.

Digital Library

[19]

Hao Chen, Lin Shi, and Jianhua Sun. 2010. VMRPC: A high efficiency and light weight RPC system for virtual machines. In Proceedings of the 2010 18th International Workshop on Quality of Service (IWQoS’10). IEEE, 1--9.

[20]

Yun Chan Cho and Jae Wook Jeon. 2007. Sharing data between processes running on different domains in para-virtualized xen. In Proceedings of the International Conference on Control, Automation and Systems, 2007 (ICCAS’07). IEEE, 1255--1260.

[21]

Steve Crago, Kyle Dunn, Patrick Eads, Lorin Hochstein, Dong-In Kang, Mikyung Kang, Devendra Modium, Karandeep Singh, Jinwoo Suh, and John Paul Walters. 2011. Heterogeneous cloud computing. In Proceedings of the 2011 IEEE International Conference on Cluster Computing. IEEE, 378--385.

Digital Library

[22]

Chris I. Dalton, David Plaquin, Wolfgang Weidner, Dirk Kuhlmann, Boris Balacheff, and Richard Brown. 2009. Trusted virtual platforms: A key enabler for converged client devices. ACM SIGOPS Operat. Syst. Rev. 43, 1 (2009), 36--43.

Digital Library

[23]

Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, and Jeffrey S. Vetter. 2010. The scalable heterogeneous computing (SHOC) benchmark suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, 63--74.

Digital Library

[24]

A. Demers, S. Keshav, and S. Shenker. 1989. Design and analysis of a fair queuing algorithm. In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM’89), Vol. 89.

Digital Library

[25]

Roberto Di Lauro, Flora Giannone, Luigia Ambrosio, and Raffaele Montella. 2012. Virtualizing general purpose GPUs for high performance cloud computing: An application to a fluid simulator. In Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA’12). IEEE, 863--864.

Digital Library

[26]

Matthew Dixon, Sabbir Ahmed Khan, and Mohammad Zubair. 2014. Accelerating option risk analytics in R using GPUs. In Proceedings of the High Performance Computing Symposium. Society for Computer Simulation International, 24.

Digital Library

[27]

Yaozu Dong, Mochi Xue, Xiao Zheng, Jiajun Wang, Zhengwei Qi, and Haibing Guan. 2015. Boosting GPU virtualization performance with hybrid shadow page tables. In Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC’15). 517--528.

Digital Library

[28]

Yaozu Dong, Xiaowei Yang, Jianhui Li, Guangdeng Liao, Kun Tian, and Haibing Guan. 2012. High performance network virtualization with SR-IOV. J. Parallel Distrib. Comput. 72, 11 (2012), 1471--1480.

Digital Library

[29]

Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. 2003. The LINPACK benchmark: Past, present and future. Concurr. Comput.: Pract. Exper. 15, 9 (2003), 803--820.

[30]

Micah Dowty and Jeremy Sugerman. 2009. GPU virtualization on VMware’s hosted I/O architecture. ACM SIGOPS Operat. Syst. Rev. 43, 3 (2009), 73--82.

Digital Library

[31]

José Duato, Francisco D. Igual, Rafael Mayo, Antonio J. Peña, Enrique S. Quintana-Ortí, and Federico Silla. 2009. An efficient implementation of GPU virtualization in high performance clusters. In European Conference on Parallel Processing. Springer, 385--394.

Digital Library

[32]

José Duato, Antonio J. Peña, Federico Silla, Juan C. Fernandez, Rafael Mayo, and Enrique S. Quintana-Ortí. 2011. Enabling CUDA acceleration within virtual machines using rCUDA. In Proceedings of the 2011 18th International Conference on High Performance Computing (HiPC’11). IEEE, 1--10.

Digital Library

[33]

José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S. Quintana-Ortı. 2010a. Modeling the CUDA remoting virtualization behaviour in high performance networks. In Proceedings of the 1st Workshop on Language, Compiler, and Architecture Support for GPGPU.

[34]

José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S. Quintana-Ortí. 2010b. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In Proceedings of the 2010 International Conference on High Performance Computing and Simulation (HPCS’10). IEEE, 224--231.

[35]

José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S Quintana-Ortí. 2011. Performance of CUDA virtualized remote GPUs in high performance clusters. In Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11). IEEE, 365--374.

Digital Library

[36]

Ashok Dwarakinath. 2008. A Fair-Share Scheduler for the Graphics Processing Unit. Ph.D. Dissertation. Citeseer.

[37]

Roberto R. Expósito, Guillermo L. Taboada, Sabela Ramos, Juan Touriño, and Ramón Doallo. 2013. General-purpose computation on GPUs for high performance cloud computing. Concurr. Comput.: Pract. Exper. 25, 12 (2013), 1628--1642.

[38]

Naila Farooqui, Rajkishore Barik, Brian T. Lewis, Tatiana Shpeisman, and Karsten Schwan. 2016. Affinity-aware work-stealing for integrated CPU-GPU processors. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 30.

Digital Library

[39]

Denis Foley. 2014. NVLink, pascal and stacked memory: Feeding the appetite for big data. Retrieved from Nvidia.com (2014).

[40]

Futuremark. 1998. 3DMark Benchmarks—See the Current Range of this Popular PC Graphics Card Test. Retrieved from http://www.futuremark.com/benchmarks/3dmark/all?_ga=1.168926249.987441096.1470653002. (1998).

[41]

Tal Garfinkel and Mendel Rosenblum. 2005. When virtual is harder than real: Security challenges in virtual machine based computing environments. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS’05).

Digital Library

[42]

Carl Gebhardt and Allan Tomlinson. 2010. Challenges for Inter Virtual Machine Communication. Technical Report. Citeseer.

[43]

Francisco Giunta, Raffaele Montella, Giuliano Laccetti, Florin Isaila, and F. Blas. 2011. A GPU accelerated high performance cloud computing infrastructure for grid computing based virtual environmental laboratory. Adv. Grid Comput. Lecture Notes in Computer Science. Vol. 6271. Springer, Berlin, Heidelberg, 35--43.

[44]

Giulio Giunta, Raffaele Montella, Giuseppe Agrillo, and Giuseppe Coviello. 2010. A GPGPU transparent virtualization component for high performance computing clouds. In Euro-Par 2010-Parallel Processing. Springer, 379--391.

Digital Library

[45]

Jens Glaser, Trung Dac Nguyen, Joshua A. Anderson, Pak Lui, Filippo Spiga, Jaime A. Millan, David C. Morse, and Sharon C. Glotzer. 2015. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 192 (2015), 97--107.

[46]

Robert P. Goldberg. 1974. Survey of virtual machine research. Computer 7, 6 (1974), 34--45.

Digital Library

[47]

Mathias Gottschlag, Martin Hillenbrand, Jens Kehne, Jan Stoess, and Frank Bellosa. 2013. LoGV: Low-overhead GPGPU virtualization. In Proceedings of the 2013 IEEE 10th International Conference on High Performance Computing and Communications 8 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC’13). IEEE, 1721--1726.

[48]

Simon Green. 2010. Particle simulation using cuda. NVIDIA Whitepaper 6 (2010), 121--128.

[49]

William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. 1996. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 22, 6 (1996), 789--828.

Digital Library

[50]

Khronos OpenCL Working Group et al. 2008. The opencl specification. Version 1, 29 (2008), 8.

[51]

Vishakha Gupta, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2009. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing. ACM, 17--24.

Digital Library

[52]

Haibing Guan, Jianguo Yao, Zhengwei Qi, and Runze Wang. 2015. Energy-efficient SLA guarantees for virtualized GPU in cloud gaming. IEEE Trans.actions on Parallel Distrib. Syst. 26, 9 (2015), 2434--2443.

Digital Library

[53]

Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2011. Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). 31.

Digital Library

[54]

Per Hammarlund, Alberto J. Martinez, Atiq A. Bajwa, David L. Hill, Erik Hallnor, Hong Jiang, Martin Dixon, Michael Derr, Mikal Hunsaker, Rajesh Kumar, et al. 2014. Haswell: The fourth-generation intel core processor. IEEE Micro 34, 2 (2014), 6--20.

[55]

Jacob Gorm Hansen. 2007. Blink: Advanced display multiplexing for virtualized applications. In Proceedings of the SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV’07).

[56]

Nadav Har’El, Abel Gordon, Alex Landau, Muli Ben-Yehuda, Avishay Traeger, and Razya Ladelsky. 2013. Efficient and scalable paravirtual I/O system. In Proceedings of the USENIX Annual Technical Conference. 231--242.

Digital Library

[57]

Nicholas Haydel, Sandra Gesing, Ian Taylor, Gregory Madey, Abdul Dakkak, Simon Garcia De Gonzalo, and Wen-Mei W. Hwu. 2015. Enhancing the usability and utilization of accelerated architectures via docker. In Proceedings of the 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC’15). IEEE, 361--367.

[58]

Alex Herrera. 2014. NVIDIA GRID: Graphics accelerated VDI with the visual performance of a workstation. Nvidia Corp (2014). http://www.nvidia.com/content/grid/vdi-whitepaper.pdf.

[59]

Hua-Jun Hong, Tao-Ya Fan-Chiang, Che-Run Lee, Kuan-Ta Chen, Chun-Ying Huang, and Cheng-Hsin Hsu. 2014. GPU consolidation for cloud games: Are we there yet?. In Proceedings of the 13th Annual Workshop on Network and Systems Support for Games. IEEE Press, 3.

Digital Library

[60]

Yu-Ju Huang, Hsuan-Heng Wu, Yeh-Ching Chung, and Wei-Chung Hsu. 2016. Building a KVM-based hypervisor for a heterogeneous system architecture compliant system. In Proceedings of the12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 3--15.

Digital Library

[61]

Greg Humphreys, Mike Houston, Ren Ng, Randall Frank, Sean Ahern, Peter D. Kirchner, and James T. Klosowski. 2002. Chromium: A stream-processing framework for interactive rendering on clusters. In ACM Transactions on Graphics, Vol. 21. ACM, 693--702.

Digital Library

[62]

Su Min Jang, Won Hyuk Choi, and Won Young Kim. 2013. Client rendering method for desktop virtualization services. ETRI J. 35, 2 (2013), 348--351.

[63]

Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, and Nacho Navarro. 2009. Predictive runtime code scheduling for heterogeneous architectures. In High Performance Embedded Architectures and Compilers. Springer, 19--33.

Digital Library

[64]

Heeseung Jo, Jinkyu Jeong, Myoungho Lee, and Dong Hoon Choi. 2013a. Exploiting GPUs in virtual machine for biocloud. BioMed Res. Int. 2013 (2013).

[65]

Hee Seung Jo, Myung Ho Lee, and Dong Hoon Choi. 2013b. GPU virtualization using PCI direct pass-through. In Applied Mechanics and Materials, Vol. 311. Trans Tech Publ, 15--19.

[66]

David Kanter. 2010. Intels sandy bridge microarchitecture. http://www.realworldtech.com/sandy-bridge/.

[67]

Ian Karlin, Jeff Keasler, and Rob Neely. 2013. Lulesh 2.0 updates and changes. Livermore, CA (2013). https://codesign.llnl.gov/lulesh.php.

[68]

Shinpei Kato, Scott Brandt, Yutaka Ishikawa, and R Rajkumar. 2011a. Operating systems challenges for GPU resource management. In Proceedings of the International Workshop on Operating Systems Platforms for Embedded Real-Time Applications. 23--32.

[69]

Shinpei Kato, Karthik Lakshmanan, Yutaka Ishikawa, and Ragunathan Rajkumar. 2011b. Resource sharing in GPU-accelerated windowing systems. In Proceedings of the 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’11). IEEE, 191--200.

Digital Library

[70]

Shinpei Kato, Karthik Lakshmanan, Aman Kumar, Mihir Kelkar, Yutaka Ishikawa, and Ragunathan Rajkumar. 2011c. RGEM: A responsive GPGPU execution model for runtime engines. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium (RTSS’11). IEEE, 57--66.

Digital Library

[71]

Shinpei Kato, Karthik Lakshmanan, Raj Rajkumar, and Yutaka Ishikawa. 2011d. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). 17.

Digital Library

[72]

Shinpei Kato, Michael McThrow, Carlos Maltzahn, and Scott A. Brandt. 2012. Gdev: First-class GPU resource management in the operating system. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’11). 401--412.

Digital Library

[73]

Se Won Kim, Chiyoung Lee, MooWoong Jeon, Hae Young Kwon, Hyun Woo Lee, and Chuck Yoo. 2013. Secure device access for automotive software. In Proceedings of the 2013 International Conference on Connected Vehicles and Expo (ICCVE’13). IEEE, 177--181.

[74]

David B. Kirk and W. Hwu Wen-mei. 2012. Programming Massively Parallel Processors: A Hands-on Approach. Newnes.

Digital Library

[75]

Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. 2007. kvm: The Linux virtual machine monitor. In Proceedings of the Linux Symposium, Vol. 1. 225--230.

[76]

Nasser A. Kurd, Subramani Bhamidipati, Christopher Mozak, Jeffrey L. Miller, Timothy M. Wilson, Mahadev Nemani, and Muntaquim Chowdhury. 2010. Westmere: A family of 32nm IA processors. In Proceedings of the 2010 IEEE International Solid-State Circuits Conference (ISSCC’10).

[77]

Maxim A. Kuzkin and Alexander G. Tormasov. 2011. Method and system for remote device access in virtual environment. (issued date: July 5 2011). Patent No. 7,975,017. Filed date: Feb 25, 2009.

[78]

George Kyriazis. 2012. Heterogeneous system architecture: A technical review. In Proceedings of the AMD Fusion Developer Summit (2012).

[79]

Giuliano Laccetti, Raffaele Montella, Carlo Palmieri, and Valentina Pelliccia. 2013. The high performance internet of things: Using GVirtuS to share high-end GPUs with ARM based cluster computing nodes. In International Conference on Parallel Processing and Applied Mathematics. Springer, 734--744.

[80]

H. Andrés Lagar-Cavilla, Niraj Tolia, Mahadev Satyanarayanan, and Eyal De Lara. 2007. VMM-independent graphics acceleration. In Proceedings of the 3rd International Conference on Virtual Execution Environments. ACM, 33--43.

Digital Library

[81]

Palden Lama, Yan Li, Ashwin M. Aji, Pavan Balaji, James Dinan, Shucai Xiao, Yunquan Zhang, Wu-chun Feng, Rajeev Thakur, and Xiaobo Zhou. 2013. pVOCL: Power-aware dynamic placement and migration in virtualized GPU environments. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems (ICDCS’13). IEEE, 145--154.

Digital Library

[82]

Michael Larabel and M. Tippett. 2011. Phoronix test suite. https://www.phoronix-test-suite.com.

[83]

Chiyoung Lee, Se-Won Kim, and Chuck Yoo. 2016. VADI: GPU virtualization for an automotive platform. IEEE Trans. Industr. Inf. 12, 1 (2016), 277--290.

[84]

Gunho Lee and Randy H. Katz. 2011. Heterogeneity-aware resource allocation and scheduling in the cloud. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud’11).

Digital Library

[85]

Teng Li, Vikram K. Narayana, Esam El-Araby, and Tarek El-Ghazawi. 2011. GPU resource sharing and virtualization on high performance computing systems. In Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11). IEEE, 733--742.

Digital Library

[86]

Teng Li, Vikram K. Narayana, and Tarek El-Ghazawi. 2012. Accelerated high-performance computing through efficient multi-process GPU resource sharing. In Proceedings of the 9th Conference on Computing Frontiers. ACM, 269--272.

Digital Library

[87]

Wenqiang Li, Guanghao Jin, Xuewen Cui, and Simon See. 2015. An evaluation of unified memory technology on nvidia gpus. In Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15). IEEE, 1092--1098.

Digital Library

[88]

Tyng-Yeu Liang and Yu-Wei Chang. 2011. GridCuda: A grid-enabled CUDA programming toolkit. In Proceedings of the 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA’11). IEEE, 141--146.

Digital Library

[89]

Christos Margiolas and Michael F. P. O’Boyle. 2016. Portable and transparent software managed scheduling on accelerators for fair resource sharing. In Proceedings of the 2016 International Symposium on Code Generation and Optimization. ACM, 82--93.

Digital Library

[90]

Konstantinos Menychtas, Kai Shen, and Michael L. Scott. 2013. Enabling OS research by inferring interactions in the black-box GPU stack. In Proceedings of the 2013 USENIX Annual Technical Conference (USENIX ATC’13). 291--296.

Digital Library

[91]

Konstantinos Menychtas, Kai Shen, and Michael L. Scott. 2014. Disengaged scheduling for fair, protected access to fast computational accelerators. In ACM SIGPLAN Notices, Vol. 49. ACM, 301--316.

Digital Library

[92]

Alexander M. Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, and Karsten Schwan. 2011. Shadowfax: Scaling in heterogeneous cluster systems via GPGPU assemblies. In Proceedings of the 5th International Workshop on Virtualization Technologies in Distributed Computing. ACM, 3--10.

Digital Library

[93]

Sparsh Mittal and Jeffrey S. Vetter. 2015. A survey of methods for analyzing and improving GPU energy efficiency. ACM Comput. Surv. 47, 2 (2015), 19.

Digital Library

[94]

Raffaele Montella, Giuseppe Coviello, Giulio Giunta, Giuliano Laccetti, Florin Isaila, and Javier Garcia Blas. 2011. A general-purpose virtualization service for HPC on cloud computing: An application to GPUs. In International Conference on Parallel Processing and Applied Mathematics. Springer, 740--749.

Digital Library

[95]

Raffaele Montella, Giulio Giunta, and Giuliano Laccetti. 2014. Virtualizing high-end GPGPUs on ARM clusters for the next generation of high performance cloud computing. Cluster Comput. 17, 1 (2014), 139--152.

Digital Library

[96]

Raffaele Montella, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, and Valentina Pelliccia. 2016a. Virtualizing CUDA enabled GPGPUs on ARM clusters. In Parallel Processing and Applied Mathematics. Springer, 3--14.

[97]

Raffaele Montella, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, Valentina Pelliccia, Cheol-Ho Hong, Ivor Spence, and Dimitrios S. Nikolopoulos. 2016b. On the virtualization of CUDA based GPU remoting on ARM and X86 machines in the GVirtuS framework. Int. J. Parallel Program. (2016), 1--22.

[98]

Christopher Niederauer, Mike Houston, Maneesh Agrawala, and Greg Humphreys. 2003. Non-invasive interactive visualization of dynamic architectural environments. In Proceedings of the 2003 Symposium on Interactive 3D Graphics. ACM, 55--58.

Digital Library

[99]

Nvidia. 2007a. CUDA Code Samples—NVIDIA Developer. Retrieved from https://developer.nvidia.com/cuda-code-samples.

[100]

NVIDIA. 2012. HyperQ Example. Retrieved from http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf.

[101]

NVIDIA. 2016a. GP100 Pascal Whitepaper. Retrieved from https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf.

[102]

NVIDIA. 2016b. GPU Cloud Computing Service Providers—NVIDIA. Retrieved from http://www.nvidia.com/object/gpu-cloud-computing-services.html.

[103]

CUDA Nvidia. 2007b. Compute Unified Device Architecture Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.

[104]

Katsuhiko Ogata. 1995. Discrete-Time Control Systems. Vol. 2. Prentice Hall, Englewood Cliffs, NJ.

Digital Library

[105]

Masahiro Oikawa, Atsushi Kawai, Keigo Nomura, Koichi Yasuoka, Kenichi Yoshikawa, and Tetsu Narumi. 2012. DS-CUDA: A middleware to use many GPUs in the cloud environment. In Proceedings of the 2012 SC Companion to High Performance Computing, Networking, Storage and Analysis (SCC). IEEE, 1207--1214.

Digital Library

[106]

Zhonghong Ou, Hao Zhuang, Jukka K. Nurminen, Antti Ylä-Jääski, and Pan Hui. 2012. Exploiting hardware heterogeneity within the same instance type of Amazon EC2. Presented in the 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).

Digital Library

[107]

Sankaralingam Panneerselvam and Michael M Swift. 2012. Operating systems should manage accelerators. In Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism.

Digital Library

[108]

Stan Park and Kai Shen. 2012. FIOS: A fair, efficient flash I/O scheduler. In Proceedings of the 10th USENEX Conference on File and Storage Technologies (FAST’12). 13.

Digital Library

[109]

PathScale. 2012. pathscale/pscnv. Retrieved from https://github.com/pathscale/pscnv.

[110]

Sagar Patni, Jobin George, Pratik Lahoti, and Jibi Abraham. 2015. A zero-copy fast channel for inter-guest and guest-host communication using VirtIO-serial. In Proceedings of the 2015 1st International Conference on Next Generation Computing Technologies (NGCT’15). IEEE, 6--9.

[111]

David Patterson. 2009. The top 10 innovations in the new NVIDIA fermi architecture, and the top 3 next challenges. NVIDIA Whitepaper 47 (2009).

[112]

Antonio J. Peña, Carlos Reaño, Federico Silla, Rafael Mayo, Enrique S. Quintana-Ortí, and José Duato. 2014. A complete and efficient CUDA-sharing solution for HPC clusters. Parallel Comput. 40, 10 (2014), 574--588.

Digital Library

[113]

Ferran Pérez, Carlos Reaño, and Federico Silla. 2016. Providing CUDA acceleration to KVM virtual machines in InfiniBand Clusters with rCUDA. In Distributed Applications and Interoperable Systems. Springer, 82--95.

Digital Library

[114]

Antoine Petitet. 2004. HPL-A portable implementation of the high-performance Linpack benchmark for distributed-memory computers. Retrieved from http://www.netlib-.org/-benchmark/hpl/.

[115]

James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26, 16 (2005), 1781--1802.

[116]

Steve Plimpton, Paul Crozier, and Aidan Thompson. 2007. LAMMPS-large-scale atomic/molecular massively parallel simulator. Sandia National Laboratories 18 (2007). http://lammps.sandia.gov.

[117]

Javier Prades, Carlos Reaño, and Federico Silla. 2016. CUDA acceleration for Xen virtual machines in infiniband clusters with rCUDA. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 35.

Digital Library

[118]

Zhengwei Qi, Jianguo Yao, Chao Zhang, Miao Yu, Zhizhou Yang, and Haibing Guan. 2014. VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming. ACM Trans. Arch. Code Optimiz. 11, 2 (2014), 17.

Digital Library

[119]

Adit Ranadive and Bhavesh Davda. 2012. Toward a paravirtual vRDMA device for VMware ESXi guests. VMware Techn. J. 2012 1, 2 (2012).

[120]

Vignesh T. Ravi, Michela Becchi, Gagan Agrawal, and Srimat Chakradhar. 2011. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In Proceedings of the 20th International Symposium on High Performance Distributed Computing. ACM, 217--228.

Digital Library

[121]

Carlos Reaño, Rafael Mayo, Enrique S. Quintana-Ortí, Federico Silla, José Duato, and Antonio J. Peña. 2013. Influence of InfiniBand FDR on the performance of remote GPU virtualization. In Proceedings of the 2013 IEEE International Conference on Cluster Computing (CLUSTER’13). IEEE, 1--8.

[122]

Carlos Reaño, A. J. Pea, Federico Silla, José Duato, Rafael Mayo, and Enrique S. Quintana-Ortí. 2012. Cu2rcu: Towards the complete rcuda remote gpu virtualization and sharing solution. In Proceedings of the 2012 19th International Conference on High Performance Computing (HiPC’12). IEEE, 1--10.

[123]

Carlos Reaño and Federico Silla. 2015. A performance comparison of CUDA remote GPU virtualization frameworks. In Proceedings of the 2015 IEEE International Conference on Cluster Computing. IEEE, 488--489.

Digital Library

[124]

Carlos Reaño, Federico Silla, Adrián Castelló, Antonio J . Peña, Rafael Mayo, Enrique S Quintana-Ortí, and José Duato. 2015a. Improving the user experience of the rCUDA remote GPU virtualization framework. Concurr. Comput.: Pract. Exper. 27, 14 (2015), 3746--3770.

Digital Library

[125]

Carlos Reaño, Federico Silla, Gilad Shainer, and Scot Schultz. 2015b. Local and remote GPUs perform similar with EDR 100G InfiniBand. In Proceedings of the Industrial Track of the 16th International Middleware Conference. ACM, 4.

Digital Library

[126]

Christopher J. Rossbach, Jon Currey, Mark Silberstein, Baishakhi Ray, and Emmett Witchel. 2011. PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles. ACM, 233--248.

Digital Library

[127]

Eric E. Schadt, Michael D. Linderman, Jon Sorenson, Lawrence Lee, and Garry P. Nolan. 2011. Cloud and heterogeneous computing solutions exist today for the emerging big data problems in biology. Nat. Rev. Genet. 12, 3 (2011), 224--224.

[128]

Dipanjan Sengupta, Raghavendra Belapure, and Karsten Schwan. 2013. Multi-tenancy on GPGPU-based servers. In Proceedings of the 7th International Workshop on Virtualization Technologies in Distributed Computing. ACM, 3--10.

Digital Library

[129]

Dipanjan Sengupta, Anshuman Goswami, Karsten Schwan, and Krishna Pallavi. 2014. Scheduling multi-tenant cloud workloads on accelerator-based systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 513--524.

Digital Library

[130]

Gilad Shainer, Ali Ayoub, Pak Lui, Tong Liu, Michael Kagan, Christian R. Trott, Greg Scantlen, and Paul S. Crozier. 2011. The development of Mellanox/NVIDIA GPUDirect over InfiniBanda new model for GPU to GPU communications. Comput. Sci. Res. Dev. 26, 3-4 (2011), 267--273.

Digital Library

[131]

Haitao Shan, Kevin Tian, Eddie Dong, and David Cowperthwaite. 2013. XenGT: A software based intel graphics virtualization solution. Proceedings of the Xen Project Developer Summit.

[132]

Ryan Shea and Jiangchuan Liu. 2013. On GPU pass-through performance for cloud gaming: Experiments and analysis. In Proceedings of the 2013 12th Annual Workshop on Network and Systems Support for Games (NetGames’13). IEEE, 1--6.

[133]

Lin Shi, Hao Chen, and Jianhua Sun. 2009. vCUDA: GPU accelerated high performance computing in virtual machines. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing, 2009 (IPDPS’09). IEEE, 1--11.

[134]

Lin Shi, Hao Chen, Jianhua Sun, and Kenli Li. 2012. vCUDA: GPU-accelerated high-performance computing in virtual machines. IEEE Trans. Comput. 61, 6 (2012), 804--816.

[135]

Weidong Shi, Yang Lu, Zhu Li, and Jonathan Engelsma. 2011. SHARC: A scalable 3D graphics virtual appliance delivery framework in cloud. J. Netw. Comput. Appl. 34, 4 (2011), 1078--1087.

[136]

Madhavapeddi Shreedhar and George Varghese. 1996. Efficient fair queuing using deficit round-robin. IEEE/ACM Trans. Netw. 4, 3 (1996), 375--385.

[137]

Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. 1998. Operating System Concepts. Vol. 4. Addison-Wesley, Reading, MA.

[138]

Jike Song, Zhiyuan Lv, and Kevin Tian. 2014. KVMGT: A full GPU virtualization solution. In KVM Forum 2014. http://www.linux-kvm.org/page/KVM_Forum_2014.

[139]

John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W. Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127 (2012).

[140]

Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why not virtualizing GPUs at the hypervisor? In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC’14). 109--120.

[141]

Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2016. GPUvm: GPU virtualization at the hypervisor. IEEE Trans. Comput. 65, 9 (2016), 2752--2766.

[142]

Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, and Mateo Valero. 2014. Enabling preemptive multiprogramming on GPUs. In ACM SIGARCH Computer Architecture News, Vol. 42. IEEE Press, 193--204.

[143]

Kun Tian, Yaozu Dong, and David Cowperthwaite. 2014. A full GPU virtualization solution with mediated pass-through. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC’14).

[144]

Tsan-Rong Tien and Yi-Ping You. 2014. Enabling OpenCL support for GPGPU in Kernel-based Virtual Machine. Softw.: Pract. Exper. 44, 5 (2014), 483--510.

[145]

Top500. 2016. TOP500 Supercomputer Sites. Retrieved from https://www.top500.org/list/2016/06/.

[146]

Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L. Santoni, Fernando C. M. Martins, Andrew V. Anderson, Steven M. Bennett, Alain Kagi, Felix H. Leung, and Larry Smith. 2005. Intel virtualization technology. Computer 38, 5 (2005), 48--56.

[147]

Leendert Van Doorn. 2006. Hardware virtualization trends. In Proceedings of the 2nd International ACM/Usenix Conference on Virtual Execution Environments, Vol. 14. 45--45.

[148]

Stephen J. Vaughan-Nichols. 2006. New approach to virtualization is a lightweight. Computer 39, 11 (2006).

[149]

Anthony Velte and Toby Velte. 2009. Microsoft Virtualization with Hyper-V. McGraw-Hill, Inc.

[150]

M. S. Vinaya, Naga Vydyanathan, and Mrugesh Gajjar. 2012. An evaluation of CUDA-enabled virtualization solutions. In Proceedings of the 2012 2nd IEEE International Conference on Parallel Distributed and Grid Computing (PDGC’12). IEEE, 621--626.

[151]

Lan Vu, Hari Sivaraman, and Rishi Bidarkar. 2014. GPU virtualization for high performance general purpose computing on the ESX hypervisor. In Proceedings of the High Performance Computing Symposium. Society for Computer Simulation International, 2.

[152]

John Paul Walters, Andrew J. Younge, Dong In Kang, Ke Thia Yao, Mikyung Kang, Stephen P. Crago, and Geoffrey C. Fox. 2014. GPU passthrough performance: A comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL applications. In Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing (CLOUD’14). IEEE, 636--643.

[153]

Bin Wang, Ruhui Ma, Zhengwei Qi, Jianguo Yao, and Haibing Guan. 2016. A user mode CPU--GPU scheduling framework for hybrid workloads. Future Gener. Comput. Syst. 63 (2016), 25--36.

[154]

Jian Wang, Kwame-Lante Wright, and Kartik Gopalan. 2008. XenLoop: A transparent high performance inter-VM network loopback. In Proceedings of the 17th International Symposium on High Performance Distributed Computing. ACM, 109--118.

[155]

Johannes Winter. 2008. Trusted computing building blocks for embedded Linux-based ARM TrustZone platforms. In Proceedings of the 3rd ACM Workshop on Scalable Trusted Computing. ACM, 21--30.

[156]

Craig M. Wittenbrink, Emmett Kilgariff, and Arjun Prabhu. 2011. Fermi GF100 GPU architecture. IEEE Micro 2 (2011), 50--59.

[157]

Mason Woo, Jackie Neider, Tom Davis, and Dave Shreiner. 1999. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2. Addison-Wesley Longman Publishing Co., Inc.

[158]

Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya. 2011. SLA-based resource allocation for software as a service provider (SaaS) in cloud computing environments. In Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’11). IEEE, 195--204.

[159]

Xenproject. 2016. Xen Project Release Features. Retrieved from https://wiki.xenproject.org/wiki/Xen_Project_Release_Features.

[160]

Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, and Wu-chun Feng. 2012. VOCL: An optimized environment for transparent virtualization of graphics processing units. In Proceedings of the Innovative Parallel Computing (InPar’12). IEEE, 1--12.

[161]

X.Org Foundation. 2011. Nouveau: Accelerated Open Source driver for nVidia cards. Retrieved from https://nouveau.freedesktop.org/wiki/.

[162]

Mochi Xue, Kun Tian, Yaozu Dong, Jiajun Wang, Zhengwei Qi, Bingsheng He, and Haibing Guan. 2016. gScale: Scaling up GPU virtualization with dynamic sharing of graphics memory space. In Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC’16).

[163]

Chao-Tung Yang, Jung-Chun Liu, Hsien-Yi Wang, and Ching-Hsien Hsu. 2014. Implementation of GPU virtualization using PCI pass-through mechanism. J. Supercomput. 68, 1 (2014), 183--213.

[164]

Chao-Tung Yang, Hsien-Yi Wang, and Yu-Tso Liu. 2012a. Using PCI pass-through for GPU virtualization with CUDA. In Network and Parallel Computing. Springer, 445--452.

[165]

Chao-Tung Yang, Hsien-Yi Wang, Wei-Shen Ou, Yu-Tso Liu, and Ching-Hsien Hsu. 2012b. On implementation of GPU virtualization using PCI pass-through. In Proceedings of the 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom’12). IEEE, 711--716.

[166]

Chih-Yuan Yeh, Chung-Yao Kao, Wei-Shu Hung, Ching-Chi Lin, Pangfeng Liu, Jan-Jan Wu, and Kuang-Chih Liu. 2013. GPU virtualization support in cloud system. In International Conference on Grid and Pervasive Computing. Springer, 423--432.

[167]

Yi-Ping You, Hen-Jung Wu, Yeh-Ning Tsai, and Yen-Ting Chao. 2015. VirtCL: A framework for OpenCL device abstraction and management. In ACM SIGPLAN Notices, Vol. 50. ACM, 161--172.

[168]

Andrew J. Younge and Geoffrey C. Fox. 2014. Advanced virtualization techniques for high performance cloud cyberinfrastructure. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’14). IEEE, 583--586.

[169]

Andrew J. Younge, John Paul Walters, Stephen Crago, and Geoffrey C. Fox. 2014. Evaluating GPU passthrough in Xen for high performance cloud computing. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW’14). IEEE, 852--859.

[170]

Andrew J. Younge, John Paul Walters, Stephen P. Crago, and Geoffrey C. Fox. 2015. Supporting high performance molecular dynamics in virtualized clusters using IOMMU, SR-IOV, and GPUDirect. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 31--38.

[171]

Chao Zhang, Jianguo Yao, Zhengwei Qi, Miao Yu, and Haibing Guan. 2014. vGASA: Adaptive scheduling algorithm of virtualized GPU resource in cloud gaming. IEEE Trans. Parallel Distrib. Syst. 25, 11 (2014), 3036--3045.

[172]

Youhui Zhang, Peng Qu, Jiang Cihang, and Weimin Zheng. 2016. A cloud gaming system based on user-level virtualization and its resource scheduling. IEEE Trans. Parallel Distrib. Syst. 27, 5 (2016), 1239--1252.

[173]

Husheng Zhou, Guangmo Tong, and Cong Liu. 2015. GPES: A preemptive execution system for GPGPU computing. In Proceedings of the 21st IEEE Real-Time and Embedded Technology and Applications Symposium. IEEE, 87--97.

Published In

ACM Computing Surveys Volume 50, Issue 3

May 2018

550 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3101309

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2017

Accepted: 01 March 2017

Revised: 01 February 2017

Received: 01 October 2016

Published in CSUR Volume 50, Issue 3

Qualifiers

Survey
Research
Refereed

Funding Sources

European Commission under the Horizon 2020 program RAPID
