DOI: 10.1145/3098572.3098579

Code Generation in Serializers and Comparators of Apache Flink

Published: 19 June 2017

Abstract

The Big Data world is shifting. Applications used to be I/O bound, but InfiniBand and SSDs have reduced the I/O overhead, and more sophisticated algorithms have been developed. As a result, the CPU has become a bottleneck for some applications. Even when an application is I/O bound, reducing CPU usage on state-of-the-art CPUs can lower electricity costs.
Apache Flink is an open source framework for processing data streams and batch jobs. It uses serialization for a wide variety of purposes: not only for sending data over the network, saving it to disk, and fault tolerance, but also because some operators can work on the serialized representation of the data instead of on Java objects. This approach can improve performance significantly. Flink has a custom serialization method that enables operators to work on the serialized format.
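To illustrate what working on the serialized representation can mean, here is a minimal sketch in plain Java. It is not Flink's actual API: the class `SerializedKeyDemo` and the helpers `encodeKey` and `compareSerialized` are hypothetical names chosen for this example. The sketch encodes an `int` key so that an unsigned byte-by-byte comparison gives the same order as comparing the integers, which lets a sort run on byte arrays without rebuilding objects.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SerializedKeyDemo {
    // Hypothetical helper: encode an int as 4 big-endian bytes with the sign
    // bit flipped, so unsigned byte order matches signed integer order.
    static byte[] encodeKey(int key) {
        return ByteBuffer.allocate(4).putInt(key ^ 0x80000000).array();
    }

    // Compare two serialized keys byte by byte, treating bytes as unsigned.
    // No Integer objects are created during the comparison.
    static int compareSerialized(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[][] keys = { encodeKey(42), encodeKey(-7), encodeKey(0) };
        // Sort the serialized form directly.
        Arrays.sort(keys, SerializedKeyDemo::compareSerialized);
        for (byte[] k : keys) {
            // Decode only for printing: prints -7, 0, 42 on separate lines.
            System.out.println(ByteBuffer.wrap(k).getInt() ^ 0x80000000);
        }
    }
}
```

The sign-bit flip is the design choice that matters here: without it, negative keys would sort after positive ones under an unsigned byte comparison.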
Currently, Apache Flink uses reflection to serialize Plain Old Java Objects (POJOs). Reflection in Java is notoriously slow, and reflective code is harder for the JIT compiler to optimize. As a Google Summer of Code project in 2016, we implemented code generation for POJO serializers and comparators to improve the performance of Apache Flink. Flink has a sophisticated type system that provides a lot of information about the types that need to be serialized. Using this information, it is possible to generate specialized code with high performance.
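The contrast between reflective and specialized serialization can be sketched as follows. This is an illustration under stated assumptions, not the code from the project: the `Point` POJO, the class `PojoSerializationSketch`, and both methods are hypothetical, and the "specialized" method is written by hand to show the kind of code a generator might emit for one known type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Arrays;

class PojoSerializationSketch {
    // Hypothetical POJO used only for illustration.
    public static class Point {
        public int x;
        public long y;
        public Point(int x, long y) { this.x = x; this.y = y; }
    }

    // Reflection-based: field types are inspected and values are boxed at run time.
    static byte[] serializeReflectively(Object pojo) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Field[] fields = pojo.getClass().getFields();
        // Sort by name so the field order is deterministic.
        Arrays.sort(fields, (a, b) -> a.getName().compareTo(b.getName()));
        for (Field f : fields) {
            Object value = f.get(pojo);
            if (f.getType() == int.class) {
                out.writeInt((Integer) value);
            } else if (f.getType() == long.class) {
                out.writeLong((Long) value);
            } else {
                throw new IOException("unsupported field type: " + f.getType());
            }
        }
        return bytes.toByteArray();
    }

    // Specialized: the field list and types are fixed in the code itself,
    // so there is no type dispatch, boxing, or reflective access.
    static byte[] serializeSpecialized(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(p.x);
        out.writeLong(p.y);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4L);
        // Both paths produce the same bytes; prints "true".
        System.out.println(Arrays.equals(serializeReflectively(p), serializeSpecialized(p)));
    }
}
```

Both methods write the same 12 bytes for a `Point`. The specialized version is straight-line code over known field types, which is the property that makes it easier for a JIT compiler to optimize.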
We achieved a more than 6x performance improvement in serialization, which translated into a 20% overall improvement.

Cited By

  • (2022) Instantiation of Java Generics. Acta Cybernetica 25:4 (897-908). DOI: 10.14232/actacyb.284073. Online publication date: 21-Jan-2022
  • (2019) Effective type parametrization in Java. CENTRAL EUROPEAN SYMPOSIUM ON THERMOPHYSICS 2019 (CEST) (350007). DOI: 10.1063/1.5114360. Online publication date: 2019

Published In

ICOOOLPS'17: Proceedings of the 12th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems
June 2017
39 pages
ISBN:9781450350884
DOI:10.1145/3098572

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Flink
  2. Janino
  3. Java
  4. big data
  5. code generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ECOOP '17

Acceptance Rates

ICOOOLPS'17 Paper Acceptance Rate 6 of 8 submissions, 75%;
Overall Acceptance Rate 11 of 14 submissions, 79%
