tutorial

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Authors:

Michel Steuwer,

Christophe Dubach

ACM SIGPLAN Notices, Volume 52, Issue 7

Pages 60 - 73

https://doi.org/10.1145/3140607.3050761

Published: 08 April 2017

Abstract

Computer systems increasingly feature powerful parallel devices with the advent of many-core CPUs and GPUs. This offers the opportunity to solve computationally intensive problems in a fraction of the time traditional CPUs need. However, exploiting heterogeneous hardware requires the use of low-level programming approaches such as OpenCL, which is incredibly challenging, even for advanced programmers.

On the application side, interpreted dynamic languages are becoming increasingly popular in many domains due to their simplicity, expressiveness and flexibility. However, this creates a wide gap between the high-level abstractions offered to programmers and the low-level hardware-specific interface. Currently, programmers must rely on high-performance libraries or are forced to write parts of their application in a low-level language like OpenCL. Ideally, non-expert programmers should be able to exploit heterogeneous hardware directly from their interpreted dynamic languages.

In this paper, we present a technique to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices. Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information. We demonstrate our technique using R, which is a popular interpreted dynamic language predominantly used in big data analytics. Our experimental results show that execution on a GPU yields speedups of over 150x compared to the sequential FastR implementation and that the obtained performance is competitive with manually written GPU code. We also show that when taking into account start-up time, large speedups are achievable, even when the applications run for as little as a few seconds.
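
The idea of specializing generated OpenCL code to observed data types can be illustrated with a small sketch. The Python code below is not the authors' implementation: the helper names (`profile_element_type`, `generate_map_kernel`, `specialize`), the type mapping, and the restriction to element-wise map operations are all assumptions made for illustration. It only builds OpenCL C source text and does not compile or launch anything.

```python
# Hypothetical sketch: every name here is illustrative, not taken from the paper.

# Assumed mapping from observed Python element types to OpenCL C scalar types.
OPENCL_TYPES = {int: "int", float: "double"}


def profile_element_type(values):
    """Return the single element type observed in `values`, or None if mixed/empty."""
    observed = {type(v) for v in values}
    return observed.pop() if len(observed) == 1 else None


def generate_map_kernel(name, expr, elem_type):
    """Build OpenCL C source for an element-wise map specialized to `elem_type`.

    `expr` is an OpenCL C expression over the variable `x`.
    """
    ctype = OPENCL_TYPES[elem_type]
    # Double precision is an optional extension on some OpenCL devices.
    prelude = (
        "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
        if ctype == "double"
        else ""
    )
    return prelude + (
        f"__kernel void {name}(__global const {ctype}* in,\n"
        f"                     __global {ctype}* out) {{\n"
        f"    size_t i = get_global_id(0);\n"
        f"    {ctype} x = in[i];\n"
        f"    out[i] = {expr};\n"
        f"}}\n"
    )


def specialize(name, expr, sample):
    """Profile `sample`, then emit a specialized kernel, or None to signal fallback."""
    elem_type = profile_element_type(sample)
    if elem_type not in OPENCL_TYPES:
        return None  # mixed or unsupported types: keep running in the interpreter
    return generate_map_kernel(name, expr, elem_type)
```

Calling `specialize("square", "x * x", [1.0, 2.5, 3.0])` yields a kernel typed over `double`, while a sample that mixes integers and floating-point values returns `None`. In a real system that case would presumably be handled by falling back to the interpreter.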

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 52, Issue 7

VEE '17

July 2017

256 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3140607

Editor:
Matthew Fluet

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
April 2017
261 pages
ISBN:9781450349482
DOI:10.1145/3050748

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2017

Published in SIGPLAN Volume 52, Issue 7

Qualifiers

Tutorial
Research
Refereed limited

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 22
Total Downloads: 636
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 7

Reflects downloads up to 18 Nov 2024

Citations

Cited By

Li G, Liu L, Feng X, Kandemir M, Jimborean A, Moseley T (2019). Accelerating GPU computing at runtime with binary optimization. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 276-277. Online publication date: 16-Feb-2019.
https://dl.acm.org/doi/10.5555/3314872.3314911
Jacob D, Trinder P, Singer J, Marr S, Fumero J (2019). Python programmers have GPUs too: automatic Python loop parallelization with staged dependence analysis. Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages, pp. 42-54. Online publication date: 20-Oct-2019.
https://dl.acm.org/doi/10.1145/3359619.3359743
Jacob D, Singer J, Gibbons J (2019). ALPyNA: acceleration of loops in Python for novel architectures. Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, pp. 25-34. Online publication date: 8-Jun-2019.
https://dl.acm.org/doi/10.1145/3315454.3329956
Springer M, Wauligmann P, Masuhara H, Elsman M, Grelck C, Kloeckner A, Padua D, Solomonik E (2017). Modular array-based GPU computing in a dynamically-typed language. Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 48-55. Online publication date: 18-Jun-2017.
https://dl.acm.org/doi/10.1145/3091966.3091974
Fumero J, Blanaru F, Stratikopoulos A, Dohrmann S, Viswanathan S, Kotselidis C, Bruno R, Moss E (2023). Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps. Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, pp. 143-157. Online publication date: 19-Oct-2023.
https://dl.acm.org/doi/10.1145/3617651.3622984
Stratikopoulos A, Blanaru F, Fumero J, Xekalaki M, Papadakis O, Kotselidis C (2023). Cross-Language Interoperability of Heterogeneous Code. Companion Proceedings of the 7th International Conference on the Art, Science, and Engineering of Programming, pp. 17-21. Online publication date: 13-Mar-2023.
https://dl.acm.org/doi/10.1145/3594671.3594675
Blanaru F, Stratikopoulos A, Fumero J, Kotselidis C, Criswell J, Williams D, Xia Y (2022). Enabling pipeline parallelism in heterogeneous managed runtime environments via batch processing. Proceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 58-71. Online publication date: 25-Feb-2022.
https://dl.acm.org/doi/10.1145/3516807.3516821
Papadimitriou M, Fumero J, Stratikopoulos A, Kotselidis C, Titzer B, Xu H, Zhang I (2021). Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 57-70. Online publication date: 7-Apr-2021.
https://dl.acm.org/doi/10.1145/3453933.3454014
Henriksen T (2021). Bounds Checking on GPU. International Journal of Parallel Programming. Online publication date: 25-Mar-2021.
https://doi.org/10.1007/s10766-021-00703-4
Morton J, Kaszyk K, Li L, Sun J, Dubach C, Steuwer M, Cole M, O'Boyle M (2020). DelayRepay: delayed execution for kernel fusion in Python. Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages, pp. 43-56. Online publication date: 17-Nov-2020.
https://dl.acm.org/doi/10.1145/3426422.3426980