GDM: Device Memory Management for GPGPU Computing

Authors:

Xiaodong Zhang

SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems

Pages 533 - 545

https://doi.org/10.1145/2591971.2592002

Published: 16 June 2014

Abstract

GPGPUs are evolving from dedicated accelerators towards mainstream commodity computing resources. During the transition, the lack of system management of device memory space on GPGPUs has become a major hurdle. In existing GPGPU systems, device memory space is still managed explicitly by individual applications, which not only increases the burden on programmers but can also cause application crashes, hangs, or low performance.

In this paper, we present the design and implementation of GDM, a fully functional GPGPU device memory manager to address the above problems and unleash the computing power of GPGPUs in general-purpose environments. To effectively coordinate the device memory usage of different applications, GDM takes control over device memory allocations and data transfers to and from device memory, leveraging a buffer allocated in each application's virtual memory. GDM utilizes the unique features of GPGPU systems and relies on several effective optimization techniques to guarantee the efficient usage of device memory space and to achieve high performance.
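The abstract does not describe GDM's mechanisms in detail, so the following is only a toy Python model of the general idea: a manager that sits between applications and device memory, stages data in a host-side buffer, and attaches device space only when a kernel actually needs it. Everything here is an assumption made for illustration: the class and method names (`DeviceMemoryManager`, `alloc`, `write`, `read`, `launch`, `free`) are hypothetical, device memory is simulated with `bytearray` objects, and the lazy allocation and least-recently-used eviction policy are stand-ins rather than GDM's documented design.

```python
from collections import OrderedDict


class DeviceMemoryManager:
    """Toy model of a system-level device memory manager (hypothetical, not GDM's code)."""

    def __init__(self, capacity):
        self.capacity = capacity        # simulated device memory size in bytes
        self.used = 0                   # bytes currently resident on the device
        self.host = {}                  # handle -> staging buffer in host memory
        self.resident = OrderedDict()   # handle -> device copy, least recently used first
        self._next = 0

    def alloc(self, size):
        """Reserve only a host staging buffer; device space is attached lazily."""
        if size > self.capacity:
            raise MemoryError("object is larger than device memory")
        handle = self._next
        self._next += 1
        self.host[handle] = bytearray(size)
        return handle

    def _drop_device_copy(self, handle):
        """Write a resident object back to its staging buffer and release its device space."""
        device_copy = self.resident.pop(handle, None)
        if device_copy is not None:
            self.host[handle][:] = device_copy
            self.used -= len(device_copy)

    def write(self, handle, data):
        """Host-to-device transfer request: the data lands in the staging buffer."""
        if len(data) > len(self.host[handle]):
            raise ValueError("data does not fit in the allocation")
        self._drop_device_copy(handle)  # keep host and device copies consistent
        self.host[handle][:len(data)] = data

    def read(self, handle):
        """Device-to-host transfer request: sync from the device copy if one exists."""
        if handle in self.resident:
            self.host[handle][:] = self.resident[handle]
        return bytes(self.host[handle])

    def launch(self, handles, kernel):
        """Make every needed object resident, evicting others if necessary, then run."""
        pinned = set(handles)
        if sum(len(self.host[h]) for h in pinned) > self.capacity:
            raise MemoryError("kernel working set exceeds device memory")
        for h in handles:
            if h in self.resident:
                self.resident.move_to_end(h)   # mark as most recently used
                continue
            size = len(self.host[h])
            while self.capacity - self.used < size:
                # Evict the least recently used object this kernel does not need.
                victim = next(v for v in self.resident if v not in pinned)
                self._drop_device_copy(victim)
            self.resident[h] = bytearray(self.host[h])
            self.used += size
        kernel(*(self.resident[h] for h in handles))

    def free(self, handle):
        """Release both the device copy (if any) and the staging buffer."""
        device_copy = self.resident.pop(handle, None)
        if device_copy is not None:
            self.used -= len(device_copy)
        del self.host[handle]


def increment(buf):
    """Stand-in for a GPU kernel: add 1 to every byte in place."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


# Two 6-byte objects on an 8-byte "device": both allocations succeed because
# device space is only claimed at launch time.
mgr = DeviceMemoryManager(capacity=8)
a = mgr.alloc(6)
b = mgr.alloc(6)
mgr.write(a, b"\x01" * 6)
mgr.write(b, b"\x0a" * 6)
mgr.launch([a], increment)   # a becomes resident
mgr.launch([b], increment)   # a is written back and evicted to make room for b
result_a = mgr.read(a)
result_b = mgr.read(b)
```

In this sketch, the second allocation would fail outright if each application claimed device memory eagerly and independently. Routing allocations and transfers through one manager lets it defer and swap device space instead, which is the kind of coordination the abstract attributes to GDM.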

We have evaluated GDM and compared it against state-of-the-art GPGPU system software on a range of workloads. The results show that GDM can prevent application crashes, including those induced by device memory leaks, and improve system performance by up to 43%.

Published In

SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems

June 2014

614 pages

ISBN: 9781450327893

DOI: 10.1145/2591971

General Chairs: Sujay Sanghavi (The University of Texas at Austin), Sanjay Shakkottai (The University of Texas at Austin)

Program Chairs: Marc Lelarge (INRIA, France), Bianca Schroeder (University of Toronto)

ACM SIGMETRICS Performance Evaluation Review, Volume 42, Issue 1

June 2014

569 pages

ISSN: 0163-5999

DOI: 10.1145/2637364

Editors: Derek Eager (University of Saskatchewan), Carey Williamson (University of Calgary)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMETRICS '14: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems

Sponsor: SIGMETRICS

June 16 - 20, 2014

Austin, Texas, USA

Acceptance Rates

SIGMETRICS '14 Paper Acceptance Rate 40 of 237 submissions, 17%;

Overall Acceptance Rate 459 of 2,691 submissions, 17%
