
FLEP: Enabling Flexible and Efficient Preemption on GPUs

Published: 04 April 2017

Abstract

GPUs are widely adopted in HPC and cloud-computing platforms to accelerate general-purpose workloads. However, modern GPUs do not support flexible preemption, which leads to performance degradation and priority inversion in multi-tasking environments.
In this paper, we propose and develop FLEP, the first software system that enables flexible kernel preemption and kernel scheduling on commodity GPUs. The FLEP compilation engine transforms a GPU program into a preemptable form that can be interrupted during execution to yield all or part of the streaming multiprocessors (SMs) of the GPU. The FLEP runtime engine intercepts all kernel invocations and decides which kernels should be preempted and how they should be scheduled. Experimental results on two-kernel co-runs show up to a 24.2X speedup for high-priority kernels and up to a 27X improvement in normalized average turnaround time for kernels of the same priority. When the waiting kernels need only a few SMs, FLEP reduces preemption latency by up to 41% compared with yielding the whole GPU. FLEP delivers these benefits with only 2.5% runtime overhead, substantially lower than the kernel-slicing approach.
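The block-level yielding the abstract describes can be illustrated with a small sketch. This is a host-side analogue in plain Python threads, not real CUDA and not FLEP's actual API: a fixed pool of "persistent workers" stands in for the SMs, pulls thread-block indices from a shared counter, and polls a preempt flag at block boundaries, so the kernel can be interrupted without hardware preemption support. The names (`PreemptableKernel`, `work_fn`, `launch`) are illustrative assumptions.

```python
# Host-side sketch (plain Python threads, NOT real CUDA) of block-level
# voluntary preemption: persistent workers fetch block indices from a shared
# counter and check a preempt flag between blocks, so a "kernel" can yield
# at block granularity rather than running to completion.
import itertools
import threading


class PreemptableKernel:
    def __init__(self, num_blocks, work_fn):
        self.num_blocks = num_blocks
        self.work_fn = work_fn
        self._next_block = itertools.count()  # shared block-index counter
        self.preempt = threading.Event()      # set by the runtime to preempt
        self.done_blocks = []                 # blocks completed so far
        self._lock = threading.Lock()

    def _worker(self):
        # Each worker plays the role of one SM running persistent threads.
        while not self.preempt.is_set():      # voluntary check per block
            blk = next(self._next_block)
            if blk >= self.num_blocks:
                return                        # kernel ran to completion
            self.work_fn(blk)
            with self._lock:
                self.done_blocks.append(blk)

    def launch(self, num_workers=4):
        """Run until all blocks finish or the preempt flag is raised."""
        workers = [threading.Thread(target=self._worker)
                   for _ in range(num_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return len(self.done_blocks)
```

If the runtime raises `preempt` mid-run, each worker drains its current block and exits, so the preemption latency is bounded by one block's execution time instead of the whole kernel's; the runtime can later relaunch the kernel over the remaining blocks.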




Published In

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17), April 2017, 811 pages.
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/3093336.

Also in: ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, April 2017, 856 pages.
ISBN: 9781450344654; DOI: 10.1145/3037697.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 04 April 2017, in SIGPLAN Volume 52, Issue 4


Author Tags

  1. gpgpu
  2. kernel scheduling
  3. multi-tasking
  4. preemption

Qualifiers

  • Research-article


Article Metrics

  • Downloads (last 12 months): 344
  • Downloads (last 6 weeks): 92
Reflects downloads up to 26 Sep 2024

Cited By
  • (2023) Kernel-as-a-Service. Proceedings of the 24th International Middleware Conference, pp. 192-206. DOI: 10.1145/3590140.3629115. Online: 27-Nov-2023.
  • (2023) Secure and Timely GPU Execution in Cyber-physical Systems. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 2591-2605. DOI: 10.1145/3576915.3623197. Online: 15-Nov-2023.
  • (2022) RAISE: Efficient GPU Resource Management via Hybrid Scheduling. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 685-695. DOI: 10.1109/CCGrid54584.2022.00078. Online: May-2022.
  • (2022) QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU. Parallel Computing, 113:102958. DOI: 10.1016/j.parco.2022.102958. Online: Oct-2022.
  • (2022) gShare. Future Generation Computer Systems, 130(C):181-192. DOI: 10.1016/j.future.2021.12.016. Online: 1-May-2022.
  • (2022) FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs. The Journal of Supercomputing, 78(1):43-71. DOI: 10.1007/s11227-021-03819-z. Online: 1-Jan-2022.
  • (2022) Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures. International Journal of Parallel Programming, 51(1):43-60. DOI: 10.1007/s10766-022-00745-2. Online: 5-Dec-2022.
  • (2021) GPGPU Task Scheduling Technique for Reducing the Performance Deviation of Multiple GPGPU Tasks in RPC-Based GPU Virtualization Environments. Symmetry, 13(3):508. DOI: 10.3390/sym13030508. Online: 20-Mar-2021.
  • (2021) Gost: Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization. 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pp. 1-10. DOI: 10.1109/IWQOS52092.2021.9521266. Online: 25-Jun-2021.
  • (2021) CTXBack: Enabling Low Latency GPU Context Switching via Context Flashback. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 121-130. DOI: 10.1109/IPDPS49936.2021.00021. Online: May-2021.
