Research Article
DOI: 10.1145/3357223.3362714

DCUDA: Dynamic GPU Scheduling with Live Migration Support

Published: 20 November 2019

Abstract

In clouds and data centers, GPU servers consisting of multiple GPUs are widely deployed. Current state-of-the-art GPU scheduling algorithms are "static" in assigning applications to different GPUs. These algorithms usually ignore the dynamics of GPU utilization and are often inaccurate in estimating resource demand before assigning/running applications, so there is a large opportunity to further balance load and improve GPU utilization. Based on CUDA (Compute Unified Device Architecture), we develop a runtime system called DCUDA which supports "dynamic" scheduling of running applications between multiple GPUs. In particular, DCUDA provides a real-time and lightweight method to accurately monitor the resource demand of applications and GPU utilization. Furthermore, it provides a universal migration facility to migrate "running applications" between GPUs with negligible overhead. More importantly, DCUDA transparently supports all CUDA applications without changing their source code. Experiments with our prototype system show that DCUDA reduces the overloaded time of GPUs by 78.3% on average. As a result, for the different workloads we studied, consisting of a wide range of applications, DCUDA reduces the average execution time of applications by up to 42.1%. Furthermore, DCUDA also reduces energy consumption by 13.3% in the light-load scenario.
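The paper itself specifies the scheduler's design; purely as an illustration of the "dynamic" policy the abstract describes (monitor per-GPU load, then migrate a running application off an overloaded GPU onto one with headroom), the following Python sketch shows one plausible decision step. The function name `pick_migration`, the data structures, and the 90%/60% thresholds are all illustrative assumptions, not DCUDA's actual implementation.

```python
# Hypothetical sketch of a dynamic load-balancing decision in the spirit of
# DCUDA: given per-GPU utilization and per-application demand estimates,
# choose an application to migrate from the most-loaded GPU to the
# least-loaded one. Thresholds and names are illustrative assumptions.

OVERLOAD_THRESHOLD = 90   # % utilization above which a GPU counts as overloaded
TARGET_THRESHOLD = 60     # % utilization below which a GPU can accept more work

def pick_migration(gpu_util, app_demand):
    """gpu_util: {gpu_id: utilization %}
    app_demand: {gpu_id: {app_id: estimated utilization share %}}
    Returns (app_id, src_gpu, dst_gpu), or None if no migration helps."""
    src = max(gpu_util, key=gpu_util.get)   # most-loaded GPU
    dst = min(gpu_util, key=gpu_util.get)   # least-loaded GPU
    if gpu_util[src] < OVERLOAD_THRESHOLD or gpu_util[dst] > TARGET_THRESHOLD:
        return None  # no overload, or no GPU with spare capacity
    # Move the smallest application whose demand fits the destination's
    # headroom, so migration relieves src without overloading dst.
    headroom = OVERLOAD_THRESHOLD - gpu_util[dst]
    candidates = [a for a, d in app_demand[src].items() if d <= headroom]
    if not candidates:
        return None
    app = min(candidates, key=lambda a: app_demand[src][a])
    return app, src, dst

if __name__ == "__main__":
    util = {0: 95, 1: 30}
    demand = {0: {"cnn": 40, "bfs": 20}, 1: {"sort": 30}}
    print(pick_migration(util, demand))  # picks the smaller app "bfs"
```

In the real system, such a decision would have to be fed by live utilization monitoring and followed by the paper's migration mechanism; this sketch only captures the policy shape, not the mechanism.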




    Published In

    SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
    November 2019
    503 pages
    ISBN:9781450369732
    DOI:10.1145/3357223
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. GPU scheduling
    2. live migration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Conference

    SoCC '19
    Sponsor:
    SoCC '19: ACM Symposium on Cloud Computing
    November 20 - 23, 2019
    Santa Cruz, CA, USA

    Acceptance Rates

    SoCC '19 Paper Acceptance Rate: 39 of 157 submissions, 25%
    Overall Acceptance Rate: 169 of 722 submissions, 23%


    Cited By

    • (2024) InferCool: Enhancing AI Inference Cooling through Transparent, Non-Intrusive Task Reassignment. Proceedings of the 2024 ACM Symposium on Cloud Computing, 487-504. DOI: 10.1145/3698038.3698556
    • (2024) CARE: Cloudified Android With Optimized Rendering Platform. IEEE Transactions on Multimedia 26, 958-971. DOI: 10.1109/TMM.2023.3274303
    • (2024) MediatorDNN: Contention Mitigation for Co-Located DNN Inference Jobs. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 502-512. DOI: 10.1109/CLOUD62652.2024.00063
    • (2024) Multi-GPU work sharing in a task-based dataflow programming model. Future Generation Computer Systems 156, 313-324. DOI: 10.1016/j.future.2024.03.017
    • (2023) Resource scheduling techniques in cloud from a view of coordination: a holistic survey. Frontiers of Information Technology & Electronic Engineering 24(1), 1-40. DOI: 10.1631/FITEE.2100298
    • (2023) Semi-Supervised Sentiment Classification and Emotion Distribution Learning Across Domains. ACM Transactions on Knowledge Discovery from Data 17(5), 1-30. DOI: 10.1145/3571736
    • (2023) Ada-MIP: Adaptive Self-supervised Graph Representation Learning via Mutual Information and Proximity Optimization. ACM Transactions on Knowledge Discovery from Data 17(5), 1-23. DOI: 10.1145/3568165
    • (2022) Data-Aware Compression for HPC using Machine Learning. ACM SIGOPS Operating Systems Review 56(1), 62-69. DOI: 10.1145/3544497.3544508
    • (2022) Analysis and Workload Characterization of the CERN EOS Storage System. ACM SIGOPS Operating Systems Review 56(1), 55-61. DOI: 10.1145/3544497.3544507
    • (2022) Enabling Practical Cloud Performance Debugging with Unsupervised Learning. ACM SIGOPS Operating Systems Review 56(1), 34-41. DOI: 10.1145/3544497.3544503
