Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs

Authors:

Laurie Hendren

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

Pages 317 - 330

https://doi.org/10.1145/2628071.2628097

Published: 24 August 2014

Abstract

Developing just-in-time (JIT) compilers that allow scientific programmers to efficiently target both CPUs and GPUs is of increasing interest. However, building such compilers requires considerable effort. We present a reusable and embeddable compiler toolkit called Velociraptor that can be used to easily build compilers for numerical programs targeting multicores and GPUs.

Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and GPU constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU code and also provides a smart runtime system to manage the GPU.
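To make the idea of a front end emitting a high-level, array-aware IR with parallel constructs more concrete, here is a minimal toy sketch in Python. It is not VRIR and does not use any Velociraptor API: every class and function name below (`ParallelFor`, `AssignIndex`, `saxpy_ir`, `run`, and so on) is hypothetical, and the "backend" is just a sequential reference interpreter standing in for real CPU or GPU code generation.

```python
# Toy illustration only. None of these names come from Velociraptor or VRIR;
# they are assumptions made up for this sketch.
import operator
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Const:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class Index:
    array: str          # name of the array being read
    index: "Expr"       # index expression


@dataclass
class BinOp:
    op: str             # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Index, BinOp]


@dataclass
class AssignIndex:
    array: str          # name of the array being written
    index: Expr
    value: Expr


@dataclass
class ParallelFor:
    var: str
    start: int
    stop: int
    body: List[AssignIndex] = field(default_factory=list)
    target: str = "cpu"  # hypothetical hint; "gpu" would pick another backend


_OPS = {"+": operator.add, "*": operator.mul}


def eval_expr(expr, env):
    """Evaluate an expression node against a dict of scalars and lists."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Index):
        return env[expr.array][eval_expr(expr.index, env)]
    if isinstance(expr, BinOp):
        return _OPS[expr.op](eval_expr(expr.left, env), eval_expr(expr.right, env))
    raise TypeError(f"unknown IR node: {expr!r}")


def run(loop, env):
    """Sequential reference execution of a ParallelFor node.

    Iterations are assumed independent, which is what would let a real
    backend distribute them across CPU threads or GPU work-items.
    """
    for i in range(loop.start, loop.stop):
        env[loop.var] = i
        for stmt in loop.body:
            env[stmt.array][eval_expr(stmt.index, env)] = eval_expr(stmt.value, env)


def saxpy_ir(n):
    """What a front end might emit for: y[i] = a * x[i] + y[i], i in [0, n)."""
    i = Var("i")
    rhs = BinOp("+", BinOp("*", Var("a"), Index("x", i)), Index("y", i))
    return ParallelFor("i", 0, n, [AssignIndex("y", i, rhs)])


if __name__ == "__main__":
    env = {"a": 2.0, "x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}
    run(saxpy_ir(3), env)
    print(env["y"])
```

The point of the sketch is the division of labour described in the abstract: the host compiler only builds the IR for the hot part of the program, while the toolkit side decides how to execute it.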

To demonstrate Velociraptor in action, we present two proof-of-concept case studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT compiler for Python targeting CPUs and GPUs.

Index Terms

Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Information & Contributors

Information

Published In

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

August 2014

514 pages

ISBN: 9781450328098

DOI: 10.1145/2628071

General Chair: J. Nelson Amaral, University of Alberta, Canada
Program Chair: Josep Torrellas, University of Illinois, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '14

PACT '14: International Conference on Parallel Architectures and Compilation

August 24 - 27, 2014

Edmonton, AB, Canada

Acceptance Rates

PACT '14 paper acceptance rate: 54 of 144 submissions (38%)

Overall acceptance rate: 121 of 471 submissions (26%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 7
Total Downloads: 280
Downloads (last 12 months): 11
Downloads (last 6 weeks): 2

Reflects downloads up to 29 Nov 2024

Citations

Cited By

Reis L, Bispo J, Cardoso J (2020). Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation. Concurrency and Computation: Practice and Experience, 32:22. Online publication date: Jun-2020.
https://doi.org/10.1002/cpe.5854

(2019). Performance evaluation of OpenMP's target construct on GPUs-exploring compiler optimisations. International Journal of High Performance Computing and Networking, 13:1 (54-69). Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.5555/3302714.3302718

Shirako J, Hayashi A, Sarkar V, Wu P, Hack S (2017). Optimized two-level parallelization for GPU accelerators using the polyhedral model. Proceedings of the 26th International Conference on Compiler Construction (22-33). Online publication date: 5-Feb-2017.
https://dl.acm.org/doi/10.1145/3033019.3033022

Clarkson J, Kotselidis C, Brown G, Luján M (2017). Boosting Java Performance Using GPGPUs. Architecture of Computing Systems - ARCS 2017 (59-70). Online publication date: 4-Mar-2017.
https://doi.org/10.1007/978-3-319-54999-6_5

Hayashi A, Shirako J, Tiotto E, Ho R, Sarkar V, Chandrasekaran S, Juckeland G (2016). Exploring compiler optimization opportunities for the OpenMP 4.x accelerator model on a POWER8+GPU platform. Proceedings of the Third International Workshop on Accelerator Programming Using Directives (68-78). Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3019120.3019127

Hayashi A, Shirako J, Tiotto E, Ho R, Sarkar V (2016). Exploring Compiler Optimization Opportunities for the OpenMP 4.× Accelerator Model on a POWER8+GPU Platform. 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD) (68-78). Online publication date: Nov-2016.
https://doi.org/10.1109/WACCPD.2016.011

Garg R, Jagdale S, Hendren L, Hendren L, Masuhara H, Sheeran M, Vitek J (2015). Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs. Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (19-24). Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2774959.2774967
