tutorial

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Published: 08 April 2017

Abstract

Computer systems increasingly feature powerful parallel devices, driven by the advent of many-core CPUs and GPUs. This offers the opportunity to solve computationally intensive problems in a fraction of the time traditional CPUs need. However, exploiting heterogeneous hardware requires low-level programming approaches such as OpenCL, which is incredibly challenging, even for advanced programmers.
On the application side, interpreted dynamic languages are becoming increasingly popular in many domains due to their simplicity, expressiveness, and flexibility. However, this creates a wide gap between the high-level abstractions offered to programmers and the low-level hardware-specific interface. Currently, programmers must either rely on high-performance libraries or write parts of their application in a low-level language like OpenCL. Ideally, non-expert programmers should be able to exploit heterogeneous hardware directly from their interpreted dynamic languages.
In this paper, we present a technique to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices. Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information. We demonstrate our technique using R, a popular interpreted dynamic language predominantly used in big data analytics. Our experimental results show that execution on a GPU yields speedups of over 150x compared to the sequential FastR implementation, and that the obtained performance is competitive with manually written GPU code. We also show that, when start-up time is taken into account, large speedups are achievable even when the applications run for as little as a few seconds.
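To make the idea of type-specialized code generation concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy setting in which "profiling" means inspecting the element types of an input list, and code generation is plain string templating. The names `OPENCL_TYPES`, `observed_type`, and `generate_map_kernel` are hypothetical and introduced only for illustration. The sketch only produces OpenCL C source text; it does not compile or run it on a device.

```python
# Hypothetical mapping from profiled Python element types to OpenCL C scalar types.
OPENCL_TYPES = {int: "int", float: "double"}


def observed_type(values):
    """Return the single element type seen in `values`, or None if mixed or empty."""
    seen = {type(v) for v in values}
    return seen.pop() if len(seen) == 1 else None


def generate_map_kernel(name, expr, values):
    """Build OpenCL C source for an element-wise map, specialized to the observed type.

    `expr` is an OpenCL C expression over the variable `x`.
    Returns None when the profile is not monomorphic or the type is unsupported,
    signalling that the caller should stay in the interpreter instead.
    """
    ctype = OPENCL_TYPES.get(observed_type(values))
    if ctype is None:
        return None
    # Double precision is an optional extension in OpenCL 1.x, so request it explicitly.
    pragma = (
        "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n" if ctype == "double" else ""
    )
    return (
        f"{pragma}"
        f"__kernel void {name}(__global const {ctype}* in, __global {ctype}* out) {{\n"
        f"    size_t i = get_global_id(0);\n"
        f"    {ctype} x = in[i];\n"
        f"    out[i] = {expr};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # A vector observed to hold only doubles yields a double-typed kernel.
    print(generate_map_kernel("square", "x * x", [1.0, 2.0, 3.0]))
```

Returning `None` for a mixed-type input mirrors, in a very simplified way, the choice a speculative compiler faces when its type assumptions do not hold: fall back to the general, slower path rather than emit incorrect specialized code.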

Information

Published In

ACM SIGPLAN Notices  Volume 52, Issue 7
VEE '17
July 2017
256 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3140607
  • VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
    April 2017
    261 pages
    ISBN:9781450349482
    DOI:10.1145/3050748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Tutorial
  • Research
  • Refereed limited
