

GPU Virtualization and Scheduling Methods: A Comprehensive Survey

Published: 29 June 2017

Abstract

The integration of graphics processing units (GPUs) on high-end compute nodes has established a new accelerator-based heterogeneous computing model, which now permeates high-performance computing. The same model has nevertheless seen limited adoption in cloud computing and other large-scale distributed computing paradigms. Heterogeneous computing with GPUs can benefit the cloud by reducing operational costs and improving resource and energy efficiency. However, such a shift would require effective methods for virtualizing GPUs, as well as other accelerators. In this article, we present an extensive, in-depth survey of GPU virtualization techniques and their scheduling methods. We review a wide range of virtualization techniques implemented at the GPU library, driver, and hardware levels. Furthermore, we review GPU scheduling methods that address performance and fairness issues among multiple virtual machines sharing GPUs. We believe that our survey offers a perspective on the challenges and opportunities for virtualization of heterogeneous computing environments.

References

[1]
Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel virtualization technology for directed I/O. Intel Technol. J. 10, 3 (2006).
[2]
EC Amazon. 2010. Amazon elastic compute cloud (Amazon EC2). https://aws.amazon.com/ec2/.
[3]
AMD. 2009. R6xx_3D_Registers.pdf. Retrieved from http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/10/R6xx_3D_Registers.pdf. (2009).
[4]
Joshua Anderson, Aaron Keys, Carolyn Phillips, Trung Dac Nguyen, and Sharon Glotzer. 2010. HOOMD-blue, general-purpose many-body dynamics on the GPU. In APS Meeting Abstracts, Vol. 1. 18008.
[5]
Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and others. 2006. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department Technical Report UCB/EECS-2006-183. University of California, Berkeley.
[6]
Andreu Badal and Aldo Badano. 2009. Accelerating monte carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit. Med. Phys. 36, 11 (2009), 4878--4880.
[7]
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. ACM SIGOPS Operat. Syst. Rev. 37, 5 (2003), 164--177.
[8]
Andreas Athanasopoulos, Anastasios Dimou, Vasileios Mezaris, and Ioannis Kompatsiaris. 2011. GPU acceleration for support vector machines. In 12th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS’11). TU Delft; EWI; MM; PRB, Delft, The Netherlands.
[9]
Can Basaran and Kyoung-Don Kang. 2012. Supporting preemptive task executions and memory copies in gpgpus. In Proceedings of the 2012 24th Euromicro Conference on Real-Time Systems. IEEE, 287--296.
[10]
Michela Becchi, Kittisak Sajjapongse, Ian Graves, Adam Procter, Vignesh Ravi, and Srimat Chakradhar. 2012. A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing. ACM, 97--108.
[11]
Brahim Bensaou, Danny H. K. Tsang, and King Tung Chan. 2001. Credit-based fair queueing (CBFQ): A simple service-scheduling algorithm for packet-switched networks. IEEE/ACM Trans. Network. 9, 5 (2001), 591--604.
[12]
David Blythe. 2006. The direct3d 10 system. In ACM Transactions on Graphics, Vol. 25. ACM, 724--734.
[13]
Robert A. Bridges, Neena Imam, and Tiffany M Mintz. 2016. Understanding GPU power: A survey of profiling, modeling, and simulation methods. ACM Comput. Surv. 49, 3 (2016), 41.
[14]
Anton Burtsev, Kiran Srinivasan, Prashanth Radhakrishnan, Kaladhar Voruganti, and Garth R. Goodson. 2009. Fido: Fast inter-virtual-machine communication for enterprise appliances. In Proceedings of the USENIX Annual Technical Conference.
[15]
Adrián Castelló, Antonio J. Peña, Rafael Mayo, Pavan Balaji, and Enrique S. Quintana-Ortí. 2015. Exploring the suitability of remote GPGPU virtualization for the OpenACC programming model using rCUDA. In Proceedings of the 2015 IEEE International Conference on Cluster Computing. IEEE, 92--95.
[16]
Ethan Cerami. 2002. Web Services Essentials: Distributed Applications with XML-RPC, SOAP, UDDI 8 WSDL. O’Reilly Media, Inc.
[17]
Charu Chaubal. 2008. The architecture of vmware esxi. VMware White Pap. 1, 7 (2008).
[18]
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the IEEE International Symposium on Workload Characterization, 2009 (IISWC’09). IEEE, 44--54.
[19]
Hao Chen, Lin Shi, and Jianhua Sun. 2010. VMRPC: A high efficiency and light weight RPC system for virtual machines. In Proceedings of the 2010 18th International Workshop on Quality of Service (IWQoS’10). IEEE, 1--9.
[20]
Yun Chan Cho and Jae Wook Jeon. 2007. Sharing data between processes running on different domains in para-virtualized xen. In Proceedings of the International Conference on Control, Automation and Systems, 2007 (ICCAS’07). IEEE, 1255--1260.
[21]
Steve Crago, Kyle Dunn, Patrick Eads, Lorin Hochstein, Dong-In Kang, Mikyung Kang, Devendra Modium, Karandeep Singh, Jinwoo Suh, and John Paul Walters. 2011. Heterogeneous cloud computing. In Proceedings of the 2011 IEEE International Conference on Cluster Computing. IEEE, 378--385.
[22]
Chris I. Dalton, David Plaquin, Wolfgang Weidner, Dirk Kuhlmann, Boris Balacheff, and Richard Brown. 2009. Trusted virtual platforms: A key enabler for converged client devices. ACM SIGOPS Operat. Syst. Rev. 43, 1 (2009), 36--43.
[23]
Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, and Jeffrey S. Vetter. 2010. The scalable heterogeneous computing (SHOC) benchmark suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, 63--74.
[24]
A. Demers, S. Keshav, and S. Shenker. 1989. Design and analysis of a fair queuing algorithm. In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM’89), Vol. 89.
[25]
Roberto Di Lauro, Flora Giannone, Luigia Ambrosio, and Raffaele Montella. 2012. Virtualizing general purpose GPUs for high performance cloud computing: An application to a fluid simulator. In Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA’12). IEEE, 863--864.
[26]
Matthew Dixon, Sabbir Ahmed Khan, and Mohammad Zubair. 2014. Accelerating option risk analytics in R using GPUs. In Proceedings of the High Performance Computing Symposium. Society for Computer Simulation International, 24.
[27]
Yaozu Dong, Mochi Xue, Xiao Zheng, Jiajun Wang, Zhengwei Qi, and Haibing Guan. 2015. Boosting GPU virtualization performance with hybrid shadow page tables. In Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC’15). 517--528.
[28]
Yaozu Dong, Xiaowei Yang, Jianhui Li, Guangdeng Liao, Kun Tian, and Haibing Guan. 2012. High performance network virtualization with SR-IOV. J. Parallel Distrib. Comput. 72, 11 (2012), 1471--1480.
[29]
Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. 2003. The LINPACK benchmark: Past, present and future. Concurr. Comput.: Pract. Exper. 15, 9 (2003), 803--820.
[30]
Micah Dowty and Jeremy Sugerman. 2009. GPU virtualization on VMware’s hosted I/O architecture. ACM SIGOPS Operat. Syst. Rev. 43, 3 (2009), 73--82.
[31]
José Duato, Francisco D. Igual, Rafael Mayo, Antonio J. Peña, Enrique S. Quintana-Ortí, and Federico Silla. 2009. An efficient implementation of GPU virtualization in high performance clusters. In European Conference on Parallel Processing. Springer, 385--394.
[32]
José Duato, Antonio J. Peña, Federico Silla, Juan C. Fernandez, Rafael Mayo, and Enrique S. Quintana-Ortí. 2011. Enabling CUDA acceleration within virtual machines using rCUDA. In Proceedings of the 2011 18th International Conference on High Performance Computing (HiPC’11). IEEE, 1--10.
[33]
José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S. Quintana-Ortı. 2010a. Modeling the CUDA remoting virtualization behaviour in high performance networks. In Proceedings of the 1st Workshop on Language, Compiler, and Architecture Support for GPGPU.
[34]
José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S. Quintana-Ortí. 2010b. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In Proceedings of the 2010 International Conference on High Performance Computing and Simulation (HPCS’10). IEEE, 224--231.
[35]
José Duato, Antonio J. Peña, Federico Silla, Rafael Mayo, and Enrique S Quintana-Ortí. 2011. Performance of CUDA virtualized remote GPUs in high performance clusters. In Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11). IEEE, 365--374.
[36]
Ashok Dwarakinath. 2008. A Fair-Share Scheduler for the Graphics Processing Unit. Ph.D. Dissertation. Citeseer.
[37]
Roberto R. Expósito, Guillermo L. Taboada, Sabela Ramos, Juan Touriño, and Ramón Doallo. 2013. General-purpose computation on GPUs for high performance cloud computing. Concurr. Comput.: Pract. Exper. 25, 12 (2013), 1628--1642.
[38]
Naila Farooqui, Rajkishore Barik, Brian T. Lewis, Tatiana Shpeisman, and Karsten Schwan. 2016. Affinity-aware work-stealing for integrated CPU-GPU processors. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 30.
[39]
Denis Foley. 2014. NVLink, pascal and stacked memory: Feeding the appetite for big data. Retrieved from Nvidia.com (2014).
[40]
Futuremark. 1998. 3DMark Benchmarks—See the Current Range of this Popular PC Graphics Card Test. Retrieved from http://www.futuremark.com/benchmarks/3dmark/all?_ga=1.168926249.987441096.1470653002. (1998).
[41]
Tal Garfinkel and Mendel Rosenblum. 2005. When virtual is harder than real: Security challenges in virtual machine based computing environments. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS’05).
[42]
Carl Gebhardt and Allan Tomlinson. 2010. Challenges for Inter Virtual Machine Communication. Technical Report. Citeseer.
[43]
Francisco Giunta, Raffaele Montella, Giuliano Laccetti, Florin Isaila, and F. Blas. 2011. A GPU accelerated high performance cloud computing infrastructure for grid computing based virtual environmental laboratory. Adv. Grid Comput. Lecture Notes in Computer Science. Vol. 6271. Springer, Berlin, Heidelberg, 35--43.
[44]
Giulio Giunta, Raffaele Montella, Giuseppe Agrillo, and Giuseppe Coviello. 2010. A GPGPU transparent virtualization component for high performance computing clouds. In Euro-Par 2010-Parallel Processing. Springer, 379--391.
[45]
Jens Glaser, Trung Dac Nguyen, Joshua A. Anderson, Pak Lui, Filippo Spiga, Jaime A. Millan, David C. Morse, and Sharon C. Glotzer. 2015. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 192 (2015), 97--107.
[46]
Robert P. Goldberg. 1974. Survey of virtual machine research. Computer 7, 6 (1974), 34--45.
[47]
Mathias Gottschlag, Martin Hillenbrand, Jens Kehne, Jan Stoess, and Frank Bellosa. 2013. LoGV: Low-overhead GPGPU virtualization. In Proceedings of the 2013 IEEE 10th International Conference on High Performance Computing and Communications 8 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC’13). IEEE, 1721--1726.
[48]
Simon Green. 2010. Particle simulation using cuda. NVIDIA Whitepaper 6 (2010), 121--128.
[49]
William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. 1996. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 22, 6 (1996), 789--828.
[50]
Khronos OpenCL Working Group et al. 2008. The opencl specification. Version 1, 29 (2008), 8.
[51]
Vishakha Gupta, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2009. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing. ACM, 17--24.
[52]
Haibing Guan, Jianguo Yao, Zhengwei Qi, and Runze Wang. 2015. Energy-efficient SLA guarantees for virtualized GPU in cloud gaming. IEEE Trans.actions on Parallel Distrib. Syst. 26, 9 (2015), 2434--2443.
[53]
Vishakha Gupta, Karsten Schwan, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2011. Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). 31.
[54]
Per Hammarlund, Alberto J. Martinez, Atiq A. Bajwa, David L. Hill, Erik Hallnor, Hong Jiang, Martin Dixon, Michael Derr, Mikal Hunsaker, Rajesh Kumar, et al. 2014. Haswell: The fourth-generation intel core processor. IEEE Micro 34, 2 (2014), 6--20.
[55]
Jacob Gorm Hansen. 2007. Blink: Advanced display multiplexing for virtualized applications. In Proceedings of the SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV’07).
[56]
Nadav Har’El, Abel Gordon, Alex Landau, Muli Ben-Yehuda, Avishay Traeger, and Razya Ladelsky. 2013. Efficient and scalable paravirtual I/O system. In Proceedings of the USENIX Annual Technical Conference. 231--242.
[57]
Nicholas Haydel, Sandra Gesing, Ian Taylor, Gregory Madey, Abdul Dakkak, Simon Garcia De Gonzalo, and Wen-Mei W. Hwu. 2015. Enhancing the usability and utilization of accelerated architectures via docker. In Proceedings of the 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC’15). IEEE, 361--367.
[58]
Alex Herrera. 2014. NVIDIA GRID: Graphics accelerated VDI with the visual performance of a workstation. Nvidia Corp (2014). http://www.nvidia.com/content/grid/vdi-whitepaper.pdf.
[59]
Hua-Jun Hong, Tao-Ya Fan-Chiang, Che-Run Lee, Kuan-Ta Chen, Chun-Ying Huang, and Cheng-Hsin Hsu. 2014. GPU consolidation for cloud games: Are we there yet?. In Proceedings of the 13th Annual Workshop on Network and Systems Support for Games. IEEE Press, 3.
[60]
Yu-Ju Huang, Hsuan-Heng Wu, Yeh-Ching Chung, and Wei-Chung Hsu. 2016. Building a KVM-based hypervisor for a heterogeneous system architecture compliant system. In Proceedings of the12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 3--15.
[61]
Greg Humphreys, Mike Houston, Ren Ng, Randall Frank, Sean Ahern, Peter D. Kirchner, and James T. Klosowski. 2002. Chromium: A stream-processing framework for interactive rendering on clusters. In ACM Transactions on Graphics, Vol. 21. ACM, 693--702.
[62]
Su Min Jang, Won Hyuk Choi, and Won Young Kim. 2013. Client rendering method for desktop virtualization services. ETRI J. 35, 2 (2013), 348--351.
[63]
Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, and Nacho Navarro. 2009. Predictive runtime code scheduling for heterogeneous architectures. In High Performance Embedded Architectures and Compilers. Springer, 19--33.
[64]
Heeseung Jo, Jinkyu Jeong, Myoungho Lee, and Dong Hoon Choi. 2013a. Exploiting GPUs in virtual machine for biocloud. BioMed Res. Int. 2013 (2013).
[65]
Hee Seung Jo, Myung Ho Lee, and Dong Hoon Choi. 2013b. GPU virtualization using PCI direct pass-through. In Applied Mechanics and Materials, Vol. 311. Trans Tech Publ, 15--19.
[66]
David Kanter. 2010. Intels sandy bridge microarchitecture. http://www.realworldtech.com/sandy-bridge/.
[67]
Ian Karlin, Jeff Keasler, and Rob Neely. 2013. Lulesh 2.0 updates and changes. Livermore, CA (2013). https://codesign.llnl.gov/lulesh.php.
[68]
Shinpei Kato, Scott Brandt, Yutaka Ishikawa, and R Rajkumar. 2011a. Operating systems challenges for GPU resource management. In Proceedings of the International Workshop on Operating Systems Platforms for Embedded Real-Time Applications. 23--32.
[69]
Shinpei Kato, Karthik Lakshmanan, Yutaka Ishikawa, and Ragunathan Rajkumar. 2011b. Resource sharing in GPU-accelerated windowing systems. In Proceedings of the 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’11). IEEE, 191--200.
[70]
Shinpei Kato, Karthik Lakshmanan, Aman Kumar, Mihir Kelkar, Yutaka Ishikawa, and Ragunathan Rajkumar. 2011c. RGEM: A responsive GPGPU execution model for runtime engines. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium (RTSS’11). IEEE, 57--66.
[71]
Shinpei Kato, Karthik Lakshmanan, Raj Rajkumar, and Yutaka Ishikawa. 2011d. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). 17.
[72]
Shinpei Kato, Michael McThrow, Carlos Maltzahn, and Scott A. Brandt. 2012. Gdev: First-class GPU resource management in the operating system. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’11). 401--412.
[73]
Se Won Kim, Chiyoung Lee, MooWoong Jeon, Hae Young Kwon, Hyun Woo Lee, and Chuck Yoo. 2013. Secure device access for automotive software. In Proceedings of the 2013 International Conference on Connected Vehicles and Expo (ICCVE’13). IEEE, 177--181.
[74]
David B. Kirk and W. Hwu Wen-mei. 2012. Programming Massively Parallel Processors: A Hands-on Approach. Newnes.
[75]
Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. 2007. kvm: The Linux virtual machine monitor. In Proceedings of the Linux Symposium, Vol. 1. 225--230.
[76]
Nasser A. Kurd, Subramani Bhamidipati, Christopher Mozak, Jeffrey L. Miller, Timothy M. Wilson, Mahadev Nemani, and Muntaquim Chowdhury. 2010. Westmere: A family of 32nm IA processors. In Proceedings of the 2010 IEEE International Solid-State Circuits Conference (ISSCC’10).
[77]
Maxim A. Kuzkin and Alexander G. Tormasov. 2011. Method and system for remote device access in virtual environment. (issued date: July 5 2011). Patent No. 7,975,017. Filed date: Feb 25, 2009.
[78]
George Kyriazis. 2012. Heterogeneous system architecture: A technical review. In Proceedings of the AMD Fusion Developer Summit (2012).
[79]
Giuliano Laccetti, Raffaele Montella, Carlo Palmieri, and Valentina Pelliccia. 2013. The high performance internet of things: Using GVirtuS to share high-end GPUs with ARM based cluster computing nodes. In International Conference on Parallel Processing and Applied Mathematics. Springer, 734--744.
[80]
H. Andrés Lagar-Cavilla, Niraj Tolia, Mahadev Satyanarayanan, and Eyal De Lara. 2007. VMM-independent graphics acceleration. In Proceedings of the 3rd International Conference on Virtual Execution Environments. ACM, 33--43.
[81]
Palden Lama, Yan Li, Ashwin M. Aji, Pavan Balaji, James Dinan, Shucai Xiao, Yunquan Zhang, Wu-chun Feng, Rajeev Thakur, and Xiaobo Zhou. 2013. pVOCL: Power-aware dynamic placement and migration in virtualized GPU environments. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems (ICDCS’13). IEEE, 145--154.
[82]
Michael Larabel and M. Tippett. 2011. Phoronix test suite. https://www.phoronix-test-suite.com.
[83]
Chiyoung Lee, Se-Won Kim, and Chuck Yoo. 2016. VADI: GPU virtualization for an automotive platform. IEEE Trans. Industr. Inf. 12, 1 (2016), 277--290.
[84]
Gunho Lee and Randy H. Katz. 2011. Heterogeneity-aware resource allocation and scheduling in the cloud. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud’11).
[85]
Teng Li, Vikram K. Narayana, Esam El-Araby, and Tarek El-Ghazawi. 2011. GPU resource sharing and virtualization on high performance computing systems. In Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11). IEEE, 733--742.
[86]
Teng Li, Vikram K. Narayana, and Tarek El-Ghazawi. 2012. Accelerated high-performance computing through efficient multi-process GPU resource sharing. In Proceedings of the 9th Conference on Computing Frontiers. ACM, 269--272.
[87]
Wenqiang Li, Guanghao Jin, Xuewen Cui, and Simon See. 2015. An evaluation of unified memory technology on nvidia gpus. In Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15). IEEE, 1092--1098.
[88]
Tyng-Yeu Liang and Yu-Wei Chang. 2011. GridCuda: A grid-enabled CUDA programming toolkit. In Proceedings of the 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA’11). IEEE, 141--146.
[89]
Christos Margiolas and Michael F. P. O’Boyle. 2016. Portable and transparent software managed scheduling on accelerators for fair resource sharing. In Proceedings of the 2016 International Symposium on Code Generation and Optimization. ACM, 82--93.
[90]
Konstantinos Menychtas, Kai Shen, and Michael L. Scott. 2013. Enabling OS research by inferring interactions in the black-box GPU stack. In Proceedings of the 2013 USENIX Annual Technical Conference (USENIX ATC’13). 291--296.
[91]
Konstantinos Menychtas, Kai Shen, and Michael L. Scott. 2014. Disengaged scheduling for fair, protected access to fast computational accelerators. In ACM SIGPLAN Notices, Vol. 49. ACM, 301--316.
[92]
Alexander M. Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, and Karsten Schwan. 2011. Shadowfax: Scaling in heterogeneous cluster systems via GPGPU assemblies. In Proceedings of the 5th International Workshop on Virtualization Technologies in Distributed Computing. ACM, 3--10.
[93]
Sparsh Mittal and Jeffrey S. Vetter. 2015. A survey of methods for analyzing and improving GPU energy efficiency. ACM Comput. Surv. 47, 2 (2015), 19.
[94]
Raffaele Montella, Giuseppe Coviello, Giulio Giunta, Giuliano Laccetti, Florin Isaila, and Javier Garcia Blas. 2011. A general-purpose virtualization service for HPC on cloud computing: An application to GPUs. In International Conference on Parallel Processing and Applied Mathematics. Springer, 740--749.
[95]
Raffaele Montella, Giulio Giunta, and Giuliano Laccetti. 2014. Virtualizing high-end GPGPUs on ARM clusters for the next generation of high performance cloud computing. Cluster Comput. 17, 1 (2014), 139--152.
[96]
Raffaele Montella, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, and Valentina Pelliccia. 2016a. Virtualizing CUDA enabled GPGPUs on ARM clusters. In Parallel Processing and Applied Mathematics. Springer, 3--14.
[97]
Raffaele Montella, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, Valentina Pelliccia, Cheol-Ho Hong, Ivor Spence, and Dimitrios S. Nikolopoulos. 2016b. On the virtualization of CUDA based GPU remoting on ARM and X86 machines in the GVirtuS framework. Int. J. Parallel Program. (2016), 1--22.
[98]
Christopher Niederauer, Mike Houston, Maneesh Agrawala, and Greg Humphreys. 2003. Non-invasive interactive visualization of dynamic architectural environments. In Proceedings of the 2003 Symposium on Interactive 3D Graphics. ACM, 55--58.
[99]
Nvidia. 2007a. CUDA Code Samples—NVIDIA Developer. Retrieved from https://developer.nvidia.com/cuda-code-samples.
[100]
NVIDIA. 2012. HyperQ Example. Retrieved from http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf.
[101]
NVIDIA. 2016a. GP100 Pascal Whitepaper. Retrieved from https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf.
[102]
NVIDIA. 2016b. GPU Cloud Computing Service Providers—NVIDIA. Retrieved from http://www.nvidia.com/object/gpu-cloud-computing-services.html.
[103]
CUDA Nvidia. 2007b. Compute Unified Device Architecture Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
[104]
Katsuhiko Ogata. 1995. Discrete-Time Control Systems. Vol. 2. Prentice Hall, Englewood Cliffs, NJ.
[105]
Masahiro Oikawa, Atsushi Kawai, Keigo Nomura, Koichi Yasuoka, Kenichi Yoshikawa, and Tetsu Narumi. 2012. DS-CUDA: A middleware to use many GPUs in the cloud environment. In Proceedings of the 2012 SC Companion to High Performance Computing, Networking, Storage and Analysis (SCC). IEEE, 1207--1214.
[106]
Zhonghong Ou, Hao Zhuang, Jukka K. Nurminen, Antti Ylä-Jääski, and Pan Hui. 2012. Exploiting hardware heterogeneity within the same instance type of Amazon EC2. Presented in the 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).
[107]
Sankaralingam Panneerselvam and Michael M Swift. 2012. Operating systems should manage accelerators. In Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism.
[108]
Stan Park and Kai Shen. 2012. FIOS: A fair, efficient flash I/O scheduler. In Proceedings of the 10th USENEX Conference on File and Storage Technologies (FAST’12). 13.
[109]
PathScale. 2012. pathscale/pscnv. Retrieved from https://github.com/pathscale/pscnv.
[110]
Sagar Patni, Jobin George, Pratik Lahoti, and Jibi Abraham. 2015. A zero-copy fast channel for inter-guest and guest-host communication using VirtIO-serial. In Proceedings of the 2015 1st International Conference on Next Generation Computing Technologies (NGCT’15). IEEE, 6--9.
[111]
David Patterson. 2009. The top 10 innovations in the new NVIDIA fermi architecture, and the top 3 next challenges. NVIDIA Whitepaper 47 (2009).
[112]
Antonio J. Peña, Carlos Reaño, Federico Silla, Rafael Mayo, Enrique S. Quintana-Ortí, and José Duato. 2014. A complete and efficient CUDA-sharing solution for HPC clusters. Parallel Comput. 40, 10 (2014), 574--588.
[113]
Ferran Pérez, Carlos Reaño, and Federico Silla. 2016. Providing CUDA acceleration to KVM virtual machines in InfiniBand Clusters with rCUDA. In Distributed Applications and Interoperable Systems. Springer, 82--95.
[114]
Antoine Petitet. 2004. HPL-A portable implementation of the high-performance Linpack benchmark for distributed-memory computers. Retrieved from http://www.netlib-.org/-benchmark/hpl/.
[115]
James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26, 16 (2005), 1781--1802.
[116]
Steve Plimpton, Paul Crozier, and Aidan Thompson. 2007. LAMMPS-large-scale atomic/molecular massively parallel simulator. Sandia National Laboratories 18 (2007). http://lammps.sandia.gov.
[117]
Javier Prades, Carlos Reaño, and Federico Silla. 2016. CUDA acceleration for Xen virtual machines in infiniband clusters with rCUDA. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 35.
[118]
Zhengwei Qi, Jianguo Yao, Chao Zhang, Miao Yu, Zhizhou Yang, and Haibing Guan. 2014. VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming. ACM Trans. Arch. Code Optimiz. 11, 2 (2014), 17.
[119]
Adit Ranadive and Bhavesh Davda. 2012. Toward a paravirtual vRDMA device for VMware ESXi guests. VMware Techn. J. 2012 1, 2 (2012).
[120]
Vignesh T. Ravi, Michela Becchi, Gagan Agrawal, and Srimat Chakradhar. 2011. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In Proceedings of the 20th International Symposium on High Performance Distributed Computing. ACM, 217--228.
[121]
Carlos Reaño, Rafael Mayo, Enrique S. Quintana-Ortí, Federico Silla, José Duato, and Antonio J. Peña. 2013. Influence of InfiniBand FDR on the performance of remote GPU virtualization. In Proceedings of the 2013 IEEE International Conference on Cluster Computing (CLUSTER’13). IEEE, 1--8.
[122]
Carlos Reaño, A. J. Pea, Federico Silla, José Duato, Rafael Mayo, and Enrique S. Quintana-Ortí. 2012. Cu2rcu: Towards the complete rcuda remote gpu virtualization and sharing solution. In Proceedings of the 2012 19th International Conference on High Performance Computing (HiPC’12). IEEE, 1--10.
[123]
Carlos Reaño and Federico Silla. 2015. A performance comparison of CUDA remote GPU virtualization frameworks. In Proceedings of the 2015 IEEE International Conference on Cluster Computing. IEEE, 488--489.
[124]
Carlos Reaño, Federico Silla, Adrián Castelló, Antonio J . Peña, Rafael Mayo, Enrique S Quintana-Ortí, and José Duato. 2015a. Improving the user experience of the rCUDA remote GPU virtualization framework. Concurr. Comput.: Pract. Exper. 27, 14 (2015), 3746--3770.
[125]
Carlos Reaño, Federico Silla, Gilad Shainer, and Scot Schultz. 2015b. Local and remote GPUs perform similar with EDR 100G InfiniBand. In Proceedings of the Industrial Track of the 16th International Middleware Conference. ACM, 4.
[126]
Christopher J. Rossbach, Jon Currey, Mark Silberstein, Baishakhi Ray, and Emmett Witchel. 2011. PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles. ACM, 233--248.
[127]
Eric E. Schadt, Michael D. Linderman, Jon Sorenson, Lawrence Lee, and Garry P. Nolan. 2011. Cloud and heterogeneous computing solutions exist today for the emerging big data problems in biology. Nat. Rev. Genet. 12, 3 (2011), 224--224.
[128]
Dipanjan Sengupta, Raghavendra Belapure, and Karsten Schwan. 2013. Multi-tenancy on GPGPU-based servers. In Proceedings of the 7th International Workshop on Virtualization Technologies in Distributed Computing. ACM, 3--10.
[129]
Dipanjan Sengupta, Anshuman Goswami, Karsten Schwan, and Krishna Pallavi. 2014. Scheduling multi-tenant cloud workloads on accelerator-based systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 513--524.
[130]
Gilad Shainer, Ali Ayoub, Pak Lui, Tong Liu, Michael Kagan, Christian R. Trott, Greg Scantlen, and Paul S. Crozier. 2011. The development of Mellanox/NVIDIA GPUDirect over InfiniBanda new model for GPU to GPU communications. Comput. Sci. Res. Dev. 26, 3-4 (2011), 267--273.
[131]
Haitao Shan, Kevin Tian, Eddie Dong, and David Cowperthwaite. 2013. XenGT: A software based intel graphics virtualization solution. Proceedings of the Xen Project Developer Summit.
[132]
Ryan Shea and Jiangchuan Liu. 2013. On GPU pass-through performance for cloud gaming: Experiments and analysis. In Proceedings of the 2013 12th Annual Workshop on Network and Systems Support for Games (NetGames’13). IEEE, 1--6.
[133]
Lin Shi, Hao Chen, and Jianhua Sun. 2009. vCUDA: GPU accelerated high performance computing in virtual machines. In Proceedings of the IEEE International Symposium on Parallel 8 Distributed Processing, 2009 (IPDPS’09). IEEE, 1--11.
[134]
Lin Shi, Hao Chen, Jianhua Sun, and Kenli Li. 2012. vCUDA: GPU-accelerated high-performance computing in virtual machines. IEEE Trans. Comput. 61, 6 (2012), 804--816.
[135]
Weidong Shi, Yang Lu, Zhu Li, and Jonathan Engelsma. 2011. SHARC: A scalable 3D graphics virtual appliance delivery framework in cloud. J. Netw. Comput. Appl. 34, 4 (2011), 1078--1087.
[136]
Madhavapeddi Shreedhar and George Varghese. 1996. Efficient fair queuing using deficit round-robin. IEEE/ACM Trans. Netw. 4, 3 (1996), 375--385.
[137]
Abraham Silberschatz, Peter B. Galvin, Greg Gagne, and A. Silberschatz. 1998. Operating System Concepts. Vol. 4. Addison-Wesley, Reading, MA.
[138]
Jike Song, Zhiyuan Lv, and Kevin Tian. 2014. KVMGT: A full GPU virtualization solution. In KVM Forum 2014. http://www.linux-kvm.org/page/KVM_Forum_2014.
[139]
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W. Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127 (2012).
[140]
Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why not virtualizing GPUs at the hypervisor?. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC’14). 109--120.
[141]
Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2016. GPUvm: GPU virtualization at the hypervisor. IEEE Trans. Comput. 65, 9 (2016), 2752--2766.
[142]
Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, and Mateo Valero. 2014. Enabling preemptive multiprogramming on GPUs. In ACM SIGARCH Computer Architecture News, Vol. 42. IEEE Press, 193--204.
[143]
Kun Tian, Yaozu Dong, and David Cowperthwaite. 2014. A full GPU virtualization solution with mediated pass-through. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC’14).
[144]
Tsan-Rong Tien and Yi-Ping You. 2014. Enabling OpenCL support for GPGPU in Kernel-based Virtual Machine. Softw.: Pract. Exper. 44, 5 (2014), 483--510.
[145]
Top500. 2016. TOP500 Supercomputer Sites. Retrieved from https://www.top500.org/list/2016/06/.
[146]
Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L. Santoni, Fernando C. M. Martins, Andrew V. Anderson, Steven M. Bennett, Alain Kagi, Felix H. Leung, and Larry Smith. 2005. Intel virtualization technology. Computer 38, 5 (2005), 48--56.
[147]
Leendert Van Doorn. 2006. Hardware virtualization trends. In Proceedings of the 2nd International ACM/Usenix Conference on Virtual Execution Environments, Vol. 14. 45--45.
[148]
Stephen J. Vaughan-Nichols. 2006. New approach to virtualization is a lightweight. Computer 39, 11 (2006).
[149]
Anthony Velte and Toby Velte. 2009. Microsoft Virtualization with Hyper-V. McGraw-Hill, Inc.
[150]
M. S. Vinaya, Naga Vydyanathan, and Mrugesh Gajjar. 2012. An evaluation of CUDA-enabled virtualization solutions. In Proceedings of the 2012 2nd IEEE International Conference on Parallel Distributed and Grid Computing (PDGC’12). IEEE, 621--626.
[151]
Lan Vu, Hari Sivaraman, and Rishi Bidarkar. 2014. GPU virtualization for high performance general purpose computing on the ESX hypervisor. In Proceedings of the High Performance Computing Symposium. Society for Computer Simulation International, 2.
[152]
John Paul Walters, Andrew J. Younge, Dong In Kang, Ke Thia Yao, Mikyung Kang, Stephen P. Crago, and Geoffrey C. Fox. 2014. GPU passthrough performance: A comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL applications. In Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing (CLOUD’14). IEEE, 636--643.
[153]
Bin Wang, Ruhui Ma, Zhengwei Qi, Jianguo Yao, and Haibing Guan. 2016. A user mode CPU--GPU scheduling framework for hybrid workloads. Future Gener. Comput. Syst. 63 (2016), 25--36.
[154]
Jian Wang, Kwame-Lante Wright, and Kartik Gopalan. 2008. XenLoop: A transparent high performance inter-VM network loopback. In Proceedings of the 17th International Symposium on High Performance Distributed Computing. ACM, 109--118.
[155]
Johannes Winter. 2008. Trusted computing building blocks for embedded Linux-based ARM TrustZone platforms. In Proceedings of the 3rd ACM Workshop on Scalable Trusted Computing. ACM, 21--30.
[156]
Craig M. Wittenbrink, Emmett Kilgariff, and Arjun Prabhu. 2011. Fermi GF100 GPU architecture. IEEE Micro 2 (2011), 50--59.
[157]
Mason Woo, Jackie Neider, Tom Davis, and Dave Shreiner. 1999. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2. Addison-Wesley Longman Publishing Co., Inc.
[158]
Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya. 2011. SLA-based resource allocation for software as a service provider (SaaS) in cloud computing environments. In Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’11). IEEE, 195--204.
[159]
Xenproject. 2016. Xen Project Release Features. Retrieved from https://wiki.xenproject.org/wiki/Xen_Project_Release_Features.
[160]
Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, and Wu-chun Feng. 2012. VOCL: An optimized environment for transparent virtualization of graphics processing units. In Proceedings of the Innovative Parallel Computing (InPar’12). IEEE, 1--12.
[161]
X.Org Foundation. 2011. Nouveau: Accelerated Open Source driver for nVidia cards. Retrieved from https://nouveau.freedesktop.org/wiki/.
[162]
Mochi Xue, Kun Tian, Yaozu Dong, Jiajun Wang, Zhengwei Qi, Bingsheng He, and Haibing Guan. 2016. gScale: Scaling up GPU virtualization with dynamic sharing of graphics memory space. In Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC’16).
[163]
Chao-Tung Yang, Jung-Chun Liu, Hsien-Yi Wang, and Ching-Hsien Hsu. 2014. Implementation of GPU virtualization using PCI pass-through mechanism. J. Supercomput. 68, 1 (2014), 183--213.
[164]
Chao-Tung Yang, Hsien-Yi Wang, and Yu-Tso Liu. 2012a. Using PCI pass-through for GPU virtualization with CUDA. In Network and Parallel Computing. Springer, 445--452.
[165]
Chao-Tung Yang, Hsien-Yi Wang, Wei-Shen Ou, Yu-Tso Liu, and Ching-Hsien Hsu. 2012b. On implementation of GPU virtualization using PCI pass-through. In Proceedings of the 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom’12). IEEE, 711--716.
[166]
Chih-Yuan Yeh, Chung-Yao Kao, Wei-Shu Hung, Ching-Chi Lin, Pangfeng Liu, Jan-Jan Wu, and Kuang-Chih Liu. 2013. GPU virtualization support in cloud system. In International Conference on Grid and Pervasive Computing. Springer, 423--432.
[167]
Yi-Ping You, Hen-Jung Wu, Yeh-Ning Tsai, and Yen-Ting Chao. 2015. VirtCL: A framework for OpenCL device abstraction and management. In ACM SIGPLAN Notices, Vol. 50. ACM, 161--172.
[168]
Andrew J. Younge and Geoffrey C. Fox. 2014. Advanced virtualization techniques for high performance cloud cyberinfrastructure. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’14). IEEE, 583--586.
[169]
Andrew J. Younge, John Paul Walters, Stephen Crago, and Geoffrey C. Fox. 2014. Evaluating GPU passthrough in Xen for high performance cloud computing. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW’14). IEEE, 852--859.
[170]
Andrew J. Younge, John Paul Walters, Stephen P. Crago, and Geoffrey C. Fox. 2015. Supporting high performance molecular dynamics in virtualized clusters using IOMMU, SR-IOV, and GPUDirect. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 31--38.
[171]
Chao Zhang, Jianguo Yao, Zhengwei Qi, Miao Yu, and Haibing Guan. 2014. vGASA: Adaptive scheduling algorithm of virtualized GPU resource in cloud gaming. IEEE Trans. Parallel Distrib. Syst. 25, 11 (2014), 3036--3045.
[172]
Youhui Zhang, Peng Qu, Jiang Cihang, and Weimin Zheng. 2016. A cloud gaming system based on user-level virtualization and its resource scheduling. IEEE Trans. Parallel Distrib. Syst. 27, 5 (2016), 1239--1252.
[173]
Husheng Zhou, Guangmo Tong, and Cong Liu. 2015. GPES: A preemptive execution system for GPGPU computing. In Proceedings of the 21st IEEE Real-Time and Embedded Technology and Applications Symposium. IEEE, 87--97.

Published In

ACM Computing Surveys, Volume 50, Issue 3
May 2018
550 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3101309
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2017
Accepted: 01 March 2017
Revised: 01 February 2017
Received: 01 October 2016
Published in CSUR Volume 50, Issue 3

Author Tags

  1. CPU-GPU heterogeneous computing
  2. GPU scheduling methods
  3. GPU virtualization
  4. cloud computing

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • European Commission under the Horizon 2020 program RAPID
