DOI: 10.1145/2807426.2807427

HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators

Published: 08 September 2015

Abstract

Recently there has been increasing interest in supporting execution of Java Virtual Machine (JVM) applications on accelerator architectures, such as GPUs. Unfortunately, there is a large gap between the features of the JVM and those commonly supported by accelerators. Examples of important JVM features include exceptions, dynamic memory allocation, use of arbitrary composite objects, and file I/O. Recent work from our research group tackled the first feature in that list, JVM exception semantics [14]. This paper continues along that path by enabling the acceleration of JVM parallel regions that include object references and dynamic memory allocation.
The contributions of this work include 1) serialization and deserialization of JVM objects using a format that is compatible with OpenCL accelerators, 2) advanced code generation techniques for converting JVM bytecode to OpenCL kernels when object references and dynamic memory allocation are used, 3) runtime techniques for supporting dynamic memory allocation on OpenCL accelerators, and 4) a novel redundant data movement elimination technique based on inter-parallel-region dataflow analysis using runtime bytecode inspection.
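To make the first contribution more concrete, the following is a minimal sketch of what flattening JVM objects into an accelerator-friendly layout can look like. It is not the format used by HJ-OpenCL, which the abstract does not specify. The class `FlatPointCodec`, the `Point` type, and the 12-byte record layout are all hypothetical, chosen so that each record would line up with an OpenCL struct of two `float` fields followed by one `int`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FlatPointCodec {
    // Hypothetical composite type standing in for an arbitrary JVM object.
    public static final class Point {
        public final float x;
        public final float y;
        public final int id;

        public Point(float x, float y, int id) {
            this.x = x;
            this.y = y;
            this.id = id;
        }
    }

    // Assumed record layout: float x, float y, int id (4 bytes each, no padding),
    // intended to mirror: typedef struct { float x; float y; int id; } Point;
    public static final int RECORD_BYTES = 12;

    // Copy the fields of every object into one contiguous direct buffer
    // in the platform's native byte order.
    public static ByteBuffer serialize(Point[] points) {
        ByteBuffer buf = ByteBuffer.allocateDirect(points.length * RECORD_BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (Point p : points) {
            buf.putFloat(p.x).putFloat(p.y).putInt(p.id);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Rebuild JVM objects from the flat buffer without disturbing its position.
    public static Point[] deserialize(ByteBuffer buf) {
        // duplicate() resets the byte order to big-endian, so restore it explicitly.
        ByteBuffer view = buf.duplicate().order(buf.order());
        int count = view.remaining() / RECORD_BYTES;
        Point[] out = new Point[count];
        for (int i = 0; i < count; i++) {
            float x = view.getFloat();
            float y = view.getFloat();
            int id = view.getInt();
            out[i] = new Point(x, y, id);
        }
        return out;
    }

    public static void main(String[] args) {
        Point[] in = { new Point(1.5f, -2.0f, 7), new Point(0.0f, 3.25f, 8) };
        Point[] back = deserialize(serialize(in));
        System.out.println(back.length + " points round-tripped; first id = " + back[0].id);
    }
}
```

A direct buffer in native byte order is used here because that is the form in which Java OpenCL bindings commonly accept host data. Real object graphs with references between objects would need more than this fixed-size record scheme.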
Experimental results presented in this paper show performance improvements of up to 18.33× relative to parallel Java Streams for GPU-accelerated parallel regions, even when those regions include object references and dynamic memory allocation. In our evaluation, we fully characterize where accelerators or the JVM see performance wins and point out opportunities for future work.

References

[1] Apache Tomcat. http://tomcat.apache.org/. Accessed: 2015-06-05.
[2] Everything I Ever Learned About JVM Performance Tuning. http://bit.ly/QOYhg6. Accessed: 2015-06-04.
[3] RetroLambda. https://github.com/orfjackal/retrolambda. Accessed: 2015-06-05.
[4] G. K. V. S. Akihiro Hayashi, Kazuaki Ishizaki. Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection. In 12th International Conference on the Principles and Practice of Programming on the Java Platform, PPPJ, 2015.
[5] V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-java: the new adventures of old x10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, pages 51--61. ACM, 2011.
[6] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. Von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. Acm Sigplan Notices, 40(10):519--538, 2005.
[7] S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: optimizing memory access patterns for heterogeneous systems. In Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, page 13. ACM, 2011.
[8] R. Coleman, U. Ghattamaneni, M. Logan, and A. Labouseur. Computational Finance with Map-Reduce in Scala. In Conference on Parallel and Distributed Processing (PDPTA'12), CSREA, 2012.
[9] Eric Caspole. AMD's Prototype HSAIL-enabled JDK8 for the OpenJDK Sumatra Project. http://www.oracle.com/technetwork/java/jvmls2013caspole-2013527.pdf, 2013.
[10] G. Frost. APARAPI in AMD Developer Website.
[11] J. J. Fumero, M. Steuwer, and C. Dubach. A composable array function interface for heterogeneous computing in java. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, page 44. ACM, 2014.
[12] M. Grossman, M. Breternitz, and V. Sarkar. Hadoopcl2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications. In IEEE Transactions on Parallel and Distributed Systems.
[13] A. Hayashi, M. Grossman, J. Zhao, J. Shirako, and V. Sarkar. Accelerating Habanero-Java programs with OpenCL generation. In Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, pages 124--134. ACM, 2013.
[14] A. Hayashi, M. Grossman, J. Zhao, J. Shirako, and V. Sarkar. Speculative execution of parallel programs with precise exception semantics on gpus. In Languages and Compilers for Parallel Computing, pages 342--356. Springer, 2014.
[15] S. Imam and V. Sarkar. Cooperative Scheduling of Parallel Tasks with General Synchronization Patterns. In European Conference on Object-Oriented Programming (ECOOP), pages 618--643. Springer, 2014.
[16] S. Imam and V. Sarkar. Habanero-Java Library: A Java 8 Framework for Multicore Programming. In 11th International Conference on the Principles and Practice of Programming on the Java Platform, PPPJ, volume 14, 2014.
[17] S. Imam and V. Sarkar. The Eureka Programming Model for Speculative Task Parallelism. In European Conference on Object-Oriented Programming (ECOOP), 2015.
[18] D. Majeti, R. Barik, J. Zhao, M. Grossman, and V. Sarkar. Compiler-driven data layout transformation for heterogeneous platforms. In Euro-Par 2013: Parallel Processing Workshops, pages 188--197. Springer, 2014.
[19] D. Nikolic and F. Spoto. Definite expression aliasing analysis for java bytecode. In Theoretical Aspects of Computing--ICTAC 2012, pages 74--89. Springer, 2012.
[20] OpenJDK. Project Sumatra. http://openjdk.java.net/projects/sumatra/.
[21] P. C. Pratt-Szeliga, J. W. Fawcett, and R. D. Welch. Rootbeer: Seamlessly using GPUs from java. In High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on, pages 375--380. IEEE, 2012.
[22] G. L. Taboada, S. Ramos, R. R. Expósito, J. Touriño, and R. Doallo. Java in the High Performance Computing arena: Research, practice and experience. Science of Computer Programming, 78(5):425--444, 2013.
[23] W. VanderHeyden, E. D. Dendy, and N. Padial-Collins. CartaBlanca---a pure-Java, component-based systems simulation tool for coupled nonlinear physics on unstructured grids---an update. Concurrency and Computation: Practice and Experience, 15(3-5):431--458, 2003.
[24] Vivek Sarkar. COMP 322: Introduction to Parallel Programming. https://wiki.rice.edu/confluence/display/PARPROG/COMP322.
[25] W. Zaremba, Y. Lin, and V. Grover. Jabee: framework for object-oriented java bytecode compilation and execution on graphics processor units. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, pages 74--83. ACM, 2012.

Cited By

  • (2022) Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware. Proceedings of the VLDB Endowment, 15:13 (3869-3882). DOI: 10.14778/3565838.3565842. Online publication date: 1-Sep-2022
  • (2016) Integrating Asynchronous Task Parallelism and Data-centric Atomicity. Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (1-10). DOI: 10.1145/2972206.2972214. Online publication date: 29-Aug-2016

Published In

PPPJ '15: Proceedings of the Principles and Practices of Programming on The Java Platform
September 2015
190 pages
ISBN:9781450337120
DOI:10.1145/2807426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. JVM
  3. OpenCL
  4. offload
  5. serialization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PPPJ '15

Acceptance Rates

PPPJ '15 Paper Acceptance Rate 15 of 27 submissions, 56%;
Overall Acceptance Rate 29 of 58 submissions, 50%
