HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators

Authors:

Vivek Sarkar

PPPJ '15: Proceedings of the Principles and Practices of Programming on The Java Platform

Pages 2 - 15

https://doi.org/10.1145/2807426.2807427

Published: 08 September 2015

Abstract

Recently, there has been increasing interest in supporting execution of Java Virtual Machine (JVM) applications on accelerator architectures, such as GPUs. Unfortunately, there is a large gap between the features of the JVM and those commonly supported by accelerators. Examples of important JVM features include exceptions, dynamic memory allocation, use of arbitrary composite objects, file I/O, and more. Recent work from our research group tackled the first feature in that list, JVM exception semantics [14]. This paper continues along that path by enabling the acceleration of JVM parallel regions that include object references and dynamic memory allocation.

The contributions of this work include 1) serialization and deserialization of JVM objects using a format that is compatible with OpenCL accelerators, 2) advanced code generation techniques for converting JVM bytecode to OpenCL kernels when object references and dynamic memory allocation are used, 3) runtime techniques for supporting dynamic memory allocation on OpenCL accelerators, and 4) a novel redundant data movement elimination technique based on inter-parallel-region dataflow analysis using runtime bytecode inspection.
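As a rough illustration of the first contribution, the sketch below packs an array of small JVM objects into one contiguous buffer with a fixed, struct-like layout, which is the general shape of data an OpenCL device can consume. It is a minimal sketch under stated assumptions, not the format used in the paper: the `Point` class, the 12-byte layout, and the method names are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class FlatSerializationSketch {
    // Hypothetical composite object; not taken from the paper.
    static final class Point {
        final float x;
        final float y;
        final int id;

        Point(float x, float y, int id) {
            this.x = x;
            this.y = y;
            this.id = id;
        }
    }

    // Assumed device-side layout: struct { float x; float y; int id; } = 12 bytes.
    static final int POINT_BYTES = 12;

    // Pack the objects field by field into one contiguous, native-order buffer.
    static ByteBuffer serialize(Point[] points) {
        ByteBuffer buf = ByteBuffer.allocateDirect(points.length * POINT_BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (Point p : points) {
            buf.putFloat(p.x).putFloat(p.y).putInt(p.id);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Rebuild JVM objects from the flat buffer, e.g. after a kernel has run.
    static Point[] deserialize(ByteBuffer buf) {
        // duplicate() resets the byte order, so it is restored explicitly.
        ByteBuffer view = buf.duplicate().order(buf.order());
        Point[] out = new Point[view.remaining() / POINT_BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = new Point(view.getFloat(), view.getFloat(), view.getInt());
        }
        return out;
    }

    public static void main(String[] args) {
        Point[] in = { new Point(1.5f, 2.5f, 7), new Point(-3.0f, 0.25f, 8) };
        Point[] out = deserialize(serialize(in));
        System.out.println(out.length + " points round-tripped, first id = " + out[0].id);
    }
}
```

A fixed layout like this only covers flat objects with primitive fields. Objects that hold references to other objects would need an additional encoding (for example, offsets into the same buffer), which this sketch does not attempt.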

Experimental results presented in this paper show performance improvements of up to 18.33× relative to parallel Java Streams for GPU-accelerated parallel regions, even when those regions include object references and dynamic memory allocation. In our evaluation, we fully characterize where accelerators or the JVM see performance wins and point out opportunities for future work.
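For context, a parallel region of the kind described above, written against the parallel Java Streams baseline, might look like the following. This is only an illustrative sketch: the `Particle` class and the `step` method are hypothetical and are not one of the paper's benchmarks. The point is that each iteration dereferences an object and allocates a new one, the two features this work targets.

```java
import java.util.stream.IntStream;

class ParallelStreamBaselineSketch {
    // Hypothetical immutable object used inside the parallel region.
    static final class Particle {
        final double position;
        final double velocity;

        Particle(double position, double velocity) {
            this.position = position;
            this.velocity = velocity;
        }
    }

    // One time step: every index reads an object reference and allocates a result.
    static Particle[] step(Particle[] in, double dt) {
        Particle[] out = new Particle[in.length];
        IntStream.range(0, in.length).parallel().forEach(i -> {
            Particle p = in[i]; // object reference
            // dynamic memory allocation inside the parallel region
            out[i] = new Particle(p.position + p.velocity * dt, p.velocity);
        });
        return out;
    }

    public static void main(String[] args) {
        Particle[] in = { new Particle(0.0, 2.0), new Particle(1.0, -1.0) };
        Particle[] out = step(in, 0.5);
        System.out.println(out[0].position + " " + out[1].position);
    }
}
```

Each iteration writes to a distinct array slot, so the loop body has no data races and is a natural candidate for offloading.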

References

[1] Apache Tomcat. http://tomcat.apache.org/. Accessed: 2015-06-05.

[2] Everything I Ever Learned About JVM Performance Tuning. http://bit.ly/QOYhg6. Accessed: 2015-06-04.

[3] RetroLambda. https://github.com/orfjackal/retrolambda. Accessed: 2015-06-05.

[4] A. Hayashi, K. Ishizaki, G. Koblents, and V. Sarkar. Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection. In 12th International Conference on the Principles and Practice of Programming on the Java Platform, PPPJ, 2015.

[5] V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-Java: the new adventures of old X10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, pages 51--61. ACM, 2011.

[6] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. Von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. ACM SIGPLAN Notices, 40(10):519--538, 2005.

[7] S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: optimizing memory access patterns for heterogeneous systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, page 13. ACM, 2011.

[8] R. Coleman, U. Ghattamaneni, M. Logan, and A. Labouseur. Computational Finance with Map-Reduce in Scala. In Conference on Parallel and Distributed Processing (PDPTA'12), CSREA, 2012.

[9] Eric Caspole. AMD's Prototype HSAIL-enabled JDK8 for the OpenJDK Sumatra Project. http://www.oracle.com/technetwork/java/jvmls2013caspole-2013527.pdf, 2013.

[10] G. Frost. APARAPI in AMD Developer Website.

[11] J. J. Fumero, M. Steuwer, and C. Dubach. A composable array function interface for heterogeneous computing in Java. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, page 44. ACM, 2014.

[12] M. Grossman, M. Breternitz, and V. Sarkar. HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications. In IEEE Transactions on Parallel and Distributed Systems.

[13] A. Hayashi, M. Grossman, J. Zhao, J. Shirako, and V. Sarkar. Accelerating Habanero-Java programs with OpenCL generation. In Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, pages 124--134. ACM, 2013.

[14] A. Hayashi, M. Grossman, J. Zhao, J. Shirako, and V. Sarkar. Speculative execution of parallel programs with precise exception semantics on GPUs. In Languages and Compilers for Parallel Computing, pages 342--356. Springer, 2014.

[15] S. Imam and V. Sarkar. Cooperative Scheduling of Parallel Tasks with General Synchronization Patterns. In European Conference on Object-Oriented Programming (ECOOP), pages 618--643. Springer, 2014.

[16] S. Imam and V. Sarkar. Habanero-Java Library: A Java 8 Framework for Multicore Programming. In 11th International Conference on the Principles and Practice of Programming on the Java Platform, PPPJ, volume 14, 2014.

[17] S. Imam and V. Sarkar. The Eureka Programming Model for Speculative Task Parallelism. In European Conference on Object-Oriented Programming (ECOOP), 2015.

[18] D. Majeti, R. Barik, J. Zhao, M. Grossman, and V. Sarkar. Compiler-driven data layout transformation for heterogeneous platforms. In Euro-Par 2013: Parallel Processing Workshops, pages 188--197. Springer, 2014.

[19] D. Nikolic and F. Spoto. Definite expression aliasing analysis for Java bytecode. In Theoretical Aspects of Computing--ICTAC 2012, pages 74--89. Springer, 2012.

[20] OpenJDK. Project Sumatra. http://openjdk.java.net/projects/sumatra/.

[21] P. C. Pratt-Szeliga, J. W. Fawcett, and R. D. Welch. Rootbeer: Seamlessly using GPUs from Java. In 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), pages 375--380. IEEE, 2012.

[22] G. L. Taboada, S. Ramos, R. R. Expósito, J. Touriño, and R. Doallo. Java in the High Performance Computing arena: Research, practice and experience. Science of Computer Programming, 78(5):425--444, 2013.

[23] W. VanderHeyden, E. D. Dendy, and N. Padial-Collins. CartaBlanca---a pure-Java, component-based systems simulation tool for coupled nonlinear physics on unstructured grids---an update. Concurrency and Computation: Practice and Experience, 15(3-5):431--458, 2003.

[24] Vivek Sarkar. COMP 322: Introduction to Parallel Programming. https://wiki.rice.edu/confluence/display/PARPROG/COMP322.

[25] W. Zaremba, Y. Lin, and V. Grover. Jabee: framework for object-oriented Java bytecode compilation and execution on graphics processor units. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, pages 74--83. ACM, 2012.

Cited By

Xekalaki M, Fumero J, Stratikopoulos A, Doka K, Katsakioris C, Bitsakos C, Koziris N, Kotselidis C (2022). Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware. Proceedings of the VLDB Endowment 15:13 (3869-3882). Online publication date: 1-Sep-2022. https://dl.acm.org/doi/10.14778/3565838.3565842

Kumar V, Dolby J, Blackburn S, Zheng Y, Binder W, Tůma P (2016). Integrating Asynchronous Task Parallelism and Data-centric Atomicity. Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (1-10). Online publication date: 29-Aug-2016. https://dl.acm.org/doi/10.1145/2972206.2972214

Information & Contributors

Information

Published In

PPPJ '15: Proceedings of the Principles and Practices of Programming on The Java Platform

September 2015

190 pages

ISBN: 9781450337120

DOI: 10.1145/2807426

General Chair: Ryan Stansifer (Florida Institute of Technology, USA)

Program Chair: Andreas Krall (Vienna University of Technology, Austria)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

PPPJ '15

PPPJ '15: Principles and Practices of Programming on the Java Platform

September 8 - 11, 2015

Melbourne, FL, USA

Acceptance Rates

PPPJ '15 Paper Acceptance Rate: 15 of 27 submissions, 56%

Overall Acceptance Rate 29 of 58 submissions, 50%
