DOI:10.1145/2628071.2628097
research-article

Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs

Published: 24 August 2014

Abstract

Developing just-in-time (JIT) compilers that allow scientific programmers to efficiently target both CPUs and GPUs is of increasing interest. However, building such compilers requires considerable effort. We present a reusable and embeddable compiler toolkit called Velociraptor that can be used to easily build compilers for numerical programs targeting multicores and GPUs.
Velociraptor provides a new high-level IR called VRIR which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and GPU constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU code and also provides a smart runtime system to manage the GPU.
To demonstrate Velociraptor in action, we present two proof-of-concept case studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT compiler for Python targeting CPUs and GPUs.
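
The workflow described in the abstract (a front end emits a high-level array IR for key parts of a program, and a toolkit maps it to a CPU or GPU) can be pictured with a minimal sketch. Everything below is hypothetical: the node names `Var`, `BinOp`, and `ParallelMap`, and the `run` function, are invented for illustration and are not VRIR or any Velociraptor API. The "backend" here only interprets the IR in plain Python, where a real toolkit would generate native code.

```python
import operator
from dataclasses import dataclass

# Hypothetical toy IR for illustration only; these node names are NOT VRIR.

@dataclass(frozen=True)
class Var:
    name: str            # refers to a 1-D array in the environment

@dataclass(frozen=True)
class BinOp:
    op: str              # "+" or "*", applied elementwise
    lhs: object
    rhs: object

@dataclass(frozen=True)
class ParallelMap:
    body: object         # expression evaluated independently per element
    target: str          # "cpu" or "gpu", a hint chosen by the front end

_OPS = {"+": operator.add, "*": operator.mul}

def eval_at(node, env, i):
    """Evaluate an expression node at element index i."""
    if isinstance(node, Var):
        return env[node.name][i]
    if isinstance(node, BinOp):
        return _OPS[node.op](eval_at(node.lhs, env, i), eval_at(node.rhs, env, i))
    raise TypeError(f"unsupported node: {node!r}")

def run(pmap, env, n):
    """Stand-in for a backend: interprets the IR instead of generating code."""
    if pmap.target not in ("cpu", "gpu"):
        raise ValueError(f"unknown target: {pmap.target}")
    return pmap.target, [eval_at(pmap.body, env, i) for i in range(n)]

# A front end would build this from source such as `z = x * y + x`.
ir = ParallelMap(body=BinOp("+", BinOp("*", Var("x"), Var("y")), Var("x")), target="gpu")
backend, z = run(ir, {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}, 3)
```

The point of the sketch is the separation of concerns: the front end only constructs IR nodes and attaches a target hint, while code generation and device management stay behind a single entry point.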




    Published In

    PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
    August 2014
    514 pages
    ISBN:9781450328098
    DOI:10.1145/2628071
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compiler framework for array-based language
    2. gpu hybrid systems
    3. matlab
    4. python

    Qualifiers

    • Research-article

    Conference

    PACT '14
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    PACT '14 Paper Acceptance Rate 54 of 144 submissions, 38%;
    Overall Acceptance Rate 121 of 471 submissions, 26%

    Article Metrics

    • Downloads (last 12 months): 11
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 29 Nov 2024

    Cited By

    • (2020) Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation. Concurrency and Computation: Practice and Experience 32:22. DOI:10.1002/cpe.5854. Online publication date: Jun-2020
    • (2019) Performance evaluation of OpenMP's target construct on GPUs-exploring compiler optimisations. International Journal of High Performance Computing and Networking 13:1 (54-69). DOI:10.5555/3302714.3302718. Online publication date: 1-Jan-2019
    • (2017) Optimized two-level parallelization for GPU accelerators using the polyhedral model. Proceedings of the 26th International Conference on Compiler Construction (22-33). DOI:10.1145/3033019.3033022. Online publication date: 5-Feb-2017
    • (2017) Boosting Java Performance Using GPGPUs. Architecture of Computing Systems - ARCS 2017 (59-70). DOI:10.1007/978-3-319-54999-6_5. Online publication date: 4-Mar-2017
    • (2016) Exploring compiler optimization opportunities for the OpenMP 4.x accelerator model on a POWER8+GPU platform. Proceedings of the Third International Workshop on Accelerator Programming Using Directives (68-78). DOI:10.5555/3019120.3019127. Online publication date: 13-Nov-2016
    • (2016) Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator Model on a POWER8+GPU Platform. 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD) (68-78). DOI:10.1109/WACCPD.2016.011. Online publication date: Nov-2016
    • (2015) Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs. Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (19-24). DOI:10.1145/2774959.2774967. Online publication date: 13-Jun-2015
