Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host

Published: 15 October 2014

Abstract

Graphics processing units (GPUs) can effectively accelerate many applications, but their applicability has been largely limited to problems whose solutions can be expressed neatly in terms of linear algebra. Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values. Many algorithms are more naturally expressed using higher-level language features, such as algebraic data types (ADTs) and first-class procedures, yet building these structures in a manner suitable for a GPU remains a challenge. We present a region-based memory management approach that enables rich data structures in Harlan, a language for data-parallel computing. Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory. We demonstrate Harlan's increased expressiveness on several example programs and show that Harlan performs well on more traditional data-parallel problems.
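
The two properties attributed to regions above (a pointer representation that means the same thing in either memory, and whole-structure transfer) can be illustrated with a small sketch. This is not Harlan's implementation: the `Region` class, the `cons` and `to_list` helpers, and the 8-byte cell layout are hypothetical names and choices made for illustration only. The assumption is that a pointer is stored as a byte offset from the start of its region, so a linked structure survives a bulk copy of the region without any pointer rewriting. The "device" here is just a second host buffer standing in for GPU memory.

```python
import struct

NULL = 0  # offset 0 is reserved so it can serve as a null pointer

# Hypothetical cell layout: (value: int32, next: uint32 offset into the region)
CELL = struct.Struct("<iI")


class Region:
    """A contiguous buffer with bump allocation; pointers are byte offsets."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.next_free = 8  # skip a reserved header so no object lives at offset 0

    def alloc(self, nbytes):
        ptr = self.next_free
        if ptr + nbytes > len(self.buf):
            raise MemoryError("region full")
        self.next_free = ptr + nbytes
        return ptr

    def copy(self):
        """Model a CPU-to-GPU transfer as one bulk copy of the whole buffer."""
        other = Region(len(self.buf))
        other.buf[:] = self.buf
        other.next_free = self.next_free
        return other


def cons(region, value, next_ptr):
    """Allocate a list cell inside the region and return its offset."""
    ptr = region.alloc(CELL.size)
    CELL.pack_into(region.buf, ptr, value, next_ptr)
    return ptr


def to_list(region, ptr):
    """Follow offset pointers from ptr until NULL, collecting the values."""
    out = []
    while ptr != NULL:
        value, ptr = CELL.unpack_from(region.buf, ptr)
        out.append(value)
    return out


# Build the list 1 -> 2 -> 3 in a "host" region.
host = Region(256)
head = NULL
for v in (3, 2, 1):
    head = cons(host, v, head)

# "Transfer" the entire structure; the same head offset is valid in the copy.
device = host.copy()
```

Because every pointer is relative to the region base, `to_list(device, head)` walks the copied list with the same `head` value that was computed on the host side. With absolute addresses, each pointer would need to be translated after the copy.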

Published In

ACM SIGPLAN Notices, Volume 49, Issue 10
OOPSLA '14
October 2014
907 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2714064
  • Editor: Andy Gill

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
October 2014
946 pages
ISBN: 9781450325851
DOI: 10.1145/2660193

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algebraic data types
  2. compilers
  3. first class procedures
  4. gpu
  5. harlan
  6. implementation
  7. opencl
  8. optimization
  9. parallel programming
  10. performance
  11. recursion

Qualifiers

  • Research-article

Cited By

  • (2022) Rust Language for GPU Programming. Supercomputing, pp. 522-532. DOI: 10.1007/978-3-031-22941-1_38. Online publication date: 26-Sep-2022
  • (2021) Bounds Checking on GPU. International Journal of Parallel Programming 49:6, pp. 761-775. DOI: 10.1007/s10766-021-00703-4. Online publication date: 1-Dec-2021
  • (2019) High-Performance Defunctionalisation in Futhark. Zivilgesellschaft und Wohlfahrtsstaat im Wandel, pp. 136-156. DOI: 10.1007/978-3-030-18506-0_7. Online publication date: 24-Apr-2019
  • (2017) Modular array-based GPU computing in a dynamically-typed language. Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 48-55. DOI: 10.1145/3091966.3091974. Online publication date: 18-Jun-2017
  • (2016) ActivePointers. Proceedings of the 43rd International Symposium on Computer Architecture, pp. 596-608. DOI: 10.1109/ISCA.2016.58. Online publication date: 18-Jun-2016
  • (2019) A Comparison of Techniques for Sign Language Alphabet Recognition Using Armband Wearables. ACM Transactions on Interactive Intelligent Systems 9:2-3, pp. 1-26. DOI: 10.1145/3150974. Online publication date: 27-Mar-2019
  • (2019) Interactive Quality Analytics of User-generated Content. ACM Transactions on Interactive Intelligent Systems 9:2-3, pp. 1-42. DOI: 10.1145/3150973. Online publication date: 27-Mar-2019
  • (2019) High Performance Multilevel Graph Partitioning on GPU. 2019 International Conference on High Performance Computing & Simulation (HPCS), pp. 769-778. DOI: 10.1109/HPCS48598.2019.9188120. Online publication date: Jul-2019
  • (2018) ActivePointers. ACM SIGOPS Operating Systems Review 52:1, pp. 84-95. DOI: 10.1145/3273982.3273990. Online publication date: 28-Aug-2018
