Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host

Authors:

Andrew Lumsdaine

ACM SIGPLAN Notices, Volume 49, Issue 10

Pages 141 - 155

https://doi.org/10.1145/2714064.2660244

Published: 15 October 2014

Abstract

Graphics processing units (GPUs) can effectively accelerate many applications, but their applicability has been largely limited to problems whose solutions can be expressed neatly in terms of linear algebra. Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values. Many algorithms are more naturally expressed using higher-level language features, such as algebraic data types (ADTs) and first-class procedures, yet building these structures in a manner suitable for a GPU remains a challenge. We present a region-based memory management approach that enables rich data structures in Harlan, a language for data-parallel computing. Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory. We demonstrate Harlan's increased expressiveness on several example programs and show that Harlan performs well on more traditional data-parallel problems.
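To make the abstract's two claims about regions concrete, here is a minimal sketch in Python. It is not Harlan code and not the authors' implementation. It assumes one possible realization of a "uniform representation for pointers": every pointer is stored as a byte offset from the start of its region, so a single flat copy of the region's bytes (standing in for a CPU-to-GPU transfer) carries a whole linked structure with no pointer fix-ups. The names `Region`, `alloc_node`, `read_node`, `build_list`, and `to_python_list` are hypothetical and chosen only for illustration.

```python
import struct

class Region:
    """A growable byte buffer; 'pointers' are byte offsets from its start.

    Assumption for illustration: offsets are the uniform pointer format.
    """
    NULL = 0  # offset 0 is reserved so that 0 can mean "no node"

    def __init__(self, data=None):
        # Reserve the first 8 bytes so no real allocation lives at offset 0.
        self.buf = bytearray(data) if data is not None else bytearray(8)

    def alloc_node(self, value, next_ptr):
        """Append a (value, next) cell and return its offset in the region."""
        offset = len(self.buf)
        self.buf += struct.pack("<qq", value, next_ptr)
        return offset

    def read_node(self, ptr):
        """Decode the (value, next) cell stored at offset ptr."""
        return struct.unpack_from("<qq", self.buf, ptr)

def build_list(region, values):
    """Build a singly linked list inside the region; return the head offset."""
    head = Region.NULL
    for v in reversed(values):
        head = region.alloc_node(v, head)
    return head

def to_python_list(region, head):
    """Follow offset 'pointers' from head until NULL, collecting values."""
    out = []
    while head != Region.NULL:
        value, head = region.read_node(head)
        out.append(value)
    return out

host = Region()
head = build_list(host, [1, 2, 3])

# Simulate a host-to-device transfer: one flat copy of the region's bytes.
device = Region(bytes(host.buf))
```

Because the links are relative to the region rather than absolute addresses, the same `head` offset is valid in both `host` and `device`, so `to_python_list(device, head)` walks the copied list without any translation step. A real GPU runtime would replace the `bytes(...)` copy with a device memory transfer.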

Supplementary Material

SHA256 File (oopsla190.sha256), 0.06 KB

ZIP File (oopsla190.zip), 3194.65 MB

Cited By

Bychkov A, Nikolskiy V (2022). Rust Language for GPU Programming. Supercomputing, pp. 522-532. 10.1007/978-3-031-22941-1_38. Online publication date: 26-Sep-2022
https://dl.acm.org/doi/10.1007/978-3-031-22941-1_38
Henriksen T (2021). Bounds Checking on GPU. International Journal of Parallel Programming 49:6, pp. 761-775. 10.1007/s10766-021-00703-4. Online publication date: 1-Dec-2021
https://dl.acm.org/doi/10.1007/s10766-021-00703-4
Hovgaard A, Henriksen T, Elsman M (2019). High-Performance Defunctionalisation in Futhark. Zivilgesellschaft und Wohlfahrtsstaat im Wandel, pp. 136-156. 10.1007/978-3-030-18506-0_7. Online publication date: 24-Apr-2019
https://doi.org/10.1007/978-3-030-18506-0_7

Index Terms

Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Concurrent programming languages
        Distributed programming languages
        Functional languages
        Parallel programming languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 49, Issue 10

OOPSLA '14

October 2014

907 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2714064

Editor:
Andy Gill
University of Kansas, Lawrence, KS

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
October 2014
946 pages
ISBN:9781450325851
DOI:10.1145/2660193
General Chair: Andrew Black, Portland State University, USA
Program Chair: Todd Millstein, University of California, Los Angeles, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2014

Published in SIGPLAN Volume 49, Issue 10

Qualifiers

Research-article

Bibliometrics

Total Citations: 21
Total Downloads: 390
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Nov 2024
