Abstract
Recent developments in Big Data increasingly focus on supporting computation in high-velocity data environments, including the processing of continuous data streams to discover valuable insights in real time. In this work we investigate the performance of streaming engines. Specifically, we address the problem of identifying optimal settings for the parameters that may affect throughput (messages processed per second) and latency (time to process a message). These parameters are also a function of the parallelism property, i.e. the number of additional parallel tasks (threads) available to support parallel computation. In an experimental evaluation we identify optimal cluster performance by balancing the degree of parallelism against the number of nodes, which yields maximum throughput with minimum latency.
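The two metrics named in the abstract can be made concrete with a minimal sketch. It is not the authors' measurement code: the `Message` record, its timestamp fields, and the selection rule in `best_setting` (highest throughput, ties broken by lowest mean latency) are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one processed message (times in seconds)."""
    emitted_at: float    # when the message entered the engine
    completed_at: float  # when processing of the message finished


def throughput(messages: Sequence[Message]) -> float:
    """Messages processed per second over the observed time window."""
    if not messages:
        return 0.0
    start = min(m.emitted_at for m in messages)
    end = max(m.completed_at for m in messages)
    window = end - start
    return len(messages) / window if window > 0 else float("inf")


def mean_latency(messages: Sequence[Message]) -> float:
    """Average time to process a message, in seconds."""
    if not messages:
        return 0.0
    total = sum(m.completed_at - m.emitted_at for m in messages)
    return total / len(messages)


def best_setting(
    results: Dict[Tuple[int, int], Sequence[Message]]
) -> Tuple[int, int]:
    """Pick the (nodes, parallelism) pair with the highest throughput.

    Assumed rule: ties on throughput are broken by lower mean latency.
    """
    return max(
        results,
        key=lambda k: (throughput(results[k]), -mean_latency(results[k])),
    )
```

Given per-configuration message logs keyed by `(nodes, parallelism)`, `best_setting` returns the configuration that this assumed rule prefers. A real evaluation would likely also consider latency percentiles and warm-up effects, which this sketch ignores.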
Notes
- 1. Apache Hadoop, http://hadoop.apache.org/.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Franciscus, N., Milosevic, Z., Stantic, B. (2016). Influence of Parallelism Property of Streaming Engines on Their Performance. In: Ivanović, M., et al. New Trends in Databases and Information Systems. ADBIS 2016. Communications in Computer and Information Science, vol 637. Springer, Cham. https://doi.org/10.1007/978-3-319-44066-8_12
DOI: https://doi.org/10.1007/978-3-319-44066-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44065-1
Online ISBN: 978-3-319-44066-8
eBook Packages: Computer Science, Computer Science (R0)