Research Article
DOI: 10.1145/3357223.3362714

DCUDA: Dynamic GPU Scheduling with Live Migration Support

Published: 20 November 2019

Abstract

In clouds and data centers, GPU servers consisting of multiple GPUs are widely deployed. Current state-of-the-art GPU scheduling algorithms are "static" in assigning applications to different GPUs. These algorithms usually ignore the dynamics of GPU utilization and are often inaccurate in estimating resource demand before assigning/running applications, so there is a large opportunity to further balance load and improve GPU utilization. Based on CUDA (Compute Unified Device Architecture), we develop a runtime system called DCUDA which supports "dynamic" scheduling of running applications between multiple GPUs. In particular, DCUDA provides a real-time and lightweight method to accurately monitor the resource demand of applications and GPU utilization. Furthermore, it provides a universal migration facility to migrate "running applications" between GPUs with negligible overhead. More importantly, DCUDA transparently supports all CUDA applications without changing their source code. Experiments with our prototype system show that DCUDA reduces the overloaded time of GPUs by 78.3% on average. As a result, for the different workloads we studied, consisting of a wide range of applications, DCUDA reduces the average execution time of applications by up to 42.1%. Furthermore, DCUDA also reduces energy consumption by 13.3% in the light-load scenario.
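The paper itself specifies the scheduler's design; purely as an illustration of the "dynamic" policy the abstract describes (monitor per-GPU load, then migrate a running application off an overloaded GPU onto one with headroom), the following Python sketch shows one plausible decision step. The function name `pick_migration`, the data structures, and the 90%/60% thresholds are all illustrative assumptions, not DCUDA's actual implementation.

```python
# Hypothetical sketch of a dynamic load-balancing decision in the spirit of
# DCUDA: given per-GPU utilization and per-application demand estimates,
# choose an application to migrate from the most-loaded GPU to the
# least-loaded one. Thresholds and names are illustrative assumptions.

OVERLOAD_THRESHOLD = 90   # % utilization above which a GPU counts as overloaded
TARGET_THRESHOLD = 60     # % utilization below which a GPU can accept more work

def pick_migration(gpu_util, app_demand):
    """gpu_util: {gpu_id: utilization %}
    app_demand: {gpu_id: {app_id: estimated utilization share %}}
    Returns (app_id, src_gpu, dst_gpu), or None if no migration helps."""
    src = max(gpu_util, key=gpu_util.get)   # most-loaded GPU
    dst = min(gpu_util, key=gpu_util.get)   # least-loaded GPU
    if gpu_util[src] < OVERLOAD_THRESHOLD or gpu_util[dst] > TARGET_THRESHOLD:
        return None  # no overload, or no GPU with spare capacity
    # Move the smallest application whose demand fits the destination's
    # headroom, so migration relieves src without overloading dst.
    headroom = OVERLOAD_THRESHOLD - gpu_util[dst]
    candidates = [a for a, d in app_demand[src].items() if d <= headroom]
    if not candidates:
        return None
    app = min(candidates, key=lambda a: app_demand[src][a])
    return app, src, dst

if __name__ == "__main__":
    util = {0: 95, 1: 30}
    demand = {0: {"cnn": 40, "bfs": 20}, 1: {"sort": 30}}
    print(pick_migration(util, demand))  # picks the smaller app "bfs"
```

In the real system, such a decision would have to be fed by live utilization monitoring and followed by the paper's migration mechanism; this sketch only captures the policy shape, not the mechanism.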




    Published In

    SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
    November 2019
    503 pages
    ISBN:9781450369732
    DOI:10.1145/3357223
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. GPU scheduling
    2. live migration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Conference

    SoCC '19
    Sponsor:
    SoCC '19: ACM Symposium on Cloud Computing
    November 20 - 23, 2019
    Santa Cruz, CA, USA

    Acceptance Rates

    SoCC '19 Paper Acceptance Rate: 39 of 157 submissions, 25%
    Overall Acceptance Rate: 169 of 722 submissions, 23%


    Cited By

    • (2024) InferCool: Enhancing AI Inference Cooling through Transparent, Non-Intrusive Task Reassignment. Proceedings of the 2024 ACM Symposium on Cloud Computing, 487-504. DOI: 10.1145/3698038.3698556
    • (2024) CARE: Cloudified Android With Optimized Rendering Platform. IEEE Transactions on Multimedia 26, 958-971. DOI: 10.1109/TMM.2023.3274303
    • (2024) MediatorDNN: Contention Mitigation for Co-Located DNN Inference Jobs. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 502-512. DOI: 10.1109/CLOUD62652.2024.00063
    • (2024) Multi-GPU work sharing in a task-based dataflow programming model. Future Generation Computer Systems 156, 313-324. DOI: 10.1016/j.future.2024.03.017
    • (2023) Resource scheduling techniques in cloud from a view of coordination: a holistic survey. Frontiers of Information Technology & Electronic Engineering 24(1), 1-40. DOI: 10.1631/FITEE.2100298
    • (2023) Semi-Supervised Sentiment Classification and Emotion Distribution Learning Across Domains. ACM Transactions on Knowledge Discovery from Data 17(5), 1-30. DOI: 10.1145/3571736
    • (2023) Ada-MIP: Adaptive Self-supervised Graph Representation Learning via Mutual Information and Proximity Optimization. ACM Transactions on Knowledge Discovery from Data 17(5), 1-23. DOI: 10.1145/3568165
    • (2022) Data-Aware Compression for HPC using Machine Learning. ACM SIGOPS Operating Systems Review 56(1), 62-69. DOI: 10.1145/3544497.3544508
    • (2022) Analysis and Workload Characterization of the CERN EOS Storage System. ACM SIGOPS Operating Systems Review 56(1), 55-61. DOI: 10.1145/3544497.3544507
    • (2022) Enabling Practical Cloud Performance Debugging with Unsupervised Learning. ACM SIGOPS Operating Systems Review 56(1), 34-41. DOI: 10.1145/3544497.3544503
