StreamMine3G: Elastic and Fault Tolerant Large Scale Stream Processing

André Martin, Andrey Brito & Christof Fetzer

Introduction

During the past decade, we have witnessed a massive growth of data. In particular, the advent of new mobile devices such as smartphones and tablets, and of online services like Facebook and Twitter, has created a completely new era for data processing. Although well-established approaches such as MapReduce (Dean and Ghemawat 2008) and its open-source implementation Hadoop (2015) already exist to cope with these large amounts of data, there is a recent trend away from batch processing toward low-latency online processing using event stream processing (ESP) systems. Inspired by the simplicity of the MapReduce programming paradigm, a number of open-source as well as commercial ESP systems have evolved over the past years, such as Apache S4 (Neumeyer et al. 2010; Apache 2015) (originally developed at Yahoo!), Apache Storm (2015) (Twitter), and Apache Samza (2015) (LinkedIn), all addressing the strong need for data processing in near real time.
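The difference between batch and online processing can be illustrated with word counting, a common introductory example for both MapReduce and ESP systems. The following Python sketch is purely illustrative and is not taken from StreamMine3G or any of the cited systems; the function names `map_phase`, `batch_word_count`, and `streaming_word_count` are hypothetical. The batch version produces its result only after consuming the entire input, whereas the streaming version emits an updated count for every incoming event.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Hypothetical map step: emit a (word, 1) pair for every word in a record.
    return [(word, 1) for word in line.lower().split()]


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Batch (MapReduce-style) sketch: the whole input is consumed first,
    # pairs are grouped by key (the "shuffle"), and each group is reduced once.
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            groups[word].append(one)
    return {word: sum(ones) for word, ones in groups.items()}


def streaming_word_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Streaming (ESP-style) sketch: a stateful operator handles each event as
    # it arrives and immediately emits the updated count, without waiting for
    # the end of the input.
    state: Counter = Counter()
    for event in events:
        for word, one in map_phase(event):
            state[word] += one
            yield word, state[word]


if __name__ == "__main__":
    posts = ["stream processing", "batch processing", "stream all the things"]
    print(batch_word_count(posts))
    for update in streaming_word_count(posts):
        print(update)
```

Both variants reuse the same map step, mirroring how ESP systems borrowed the MapReduce programming model; what changes is when results become available. Once all events have been consumed, the last value emitted for each word by the streaming variant equals the batch result.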

Since the amount of data...

References

Apache S4 – distributed stream computing platform (2015). https://incubator.apache.org/s4/
Apache Samza – a distributed stream processing framework (2015). http://samza.incubator.apache.org/
Apache Storm – distributed and fault-tolerant realtime computation (2015). https://storm.incubator.apache.org/
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Gu Y, Zhang Z, Ye F, Yang H, Kim M, Lei H, Liu Z (2009) An empirical study of high availability in stream processing systems. In: Proceedings of the 10th ACM/IFIP/USENIX international conference on middleware, Middleware ’09, New York. Springer, pp 23:1–23:9
Hadoop MapReduce open source implementation (2015). http://hadoop.apache.org/
Hunt P, Konar M, Junqueira FP, Reed B (2010) ZooKeeper: wait-free coordination for Internet-scale systems. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIXATC’10, Berkeley. USENIX Association, pp 1–14
Hwang J-H, Balazinska M, Rasin A, Cetintemel U, Stonebraker M, Zdonik S (2005) High-availability algorithms for distributed stream processing. In: Proceedings of the 21st international conference on data engineering, ICDE ’05, Washington, DC. IEEE Computer Society, pp 779–790
Martin A, Fetzer C, Brito A (2011) Active replication at (almost) no cost. In: Proceedings of the 2011 IEEE 30th international symposium on reliable distributed systems, SRDS ’11, Washington, DC. IEEE Computer Society, pp 21–30
Martin A, Knauth T, Creutz S, Becker D, Weigert S, Fetzer C, Brito A (2011) Low-overhead fault tolerance for high-throughput data processing systems. In: Proceedings of the 2011 31st international conference on distributed computing systems, ICDCS ’11, Washington, DC. IEEE Computer Society, pp 689–699
Martin A, Smaneoto T, Dietze T, Brito A, Fetzer C (2015) User-constraint and self-adaptive fault tolerance for event stream processing systems. In: Proceedings of the 45th annual IEEE/IFIP international conference on dependable systems and networks, DSN 2015, Los Alamitos. IEEE Computer Society
Neumeyer L, Robbins B, Nair A, Kesari A (2010) S4: distributed stream computing platform. In: Proceedings of the 2010 IEEE international conference on data mining workshops, ICDMW ’10, Washington, DC. IEEE Computer Society, pp 170–177
Schneider FB (1990) Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput Surv 22(4):299–319

Author information

Authors and Affiliations

TU Dresden, Dresden, Germany
André Martin & Christof Fetzer
UFCG, Campina Grande, Brazil
Andrey Brito

Corresponding author

Correspondence to André Martin.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
School of Information Technologies, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

Delft University of Technology, Delft, Netherlands
Asterios Katsifodimos Ph.D
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
Pramod Bhatotia

About this entry

Cite this entry

Martin, A., Brito, A., Fetzer, C. (2018). StreamMine3G: Elastic and Fault Tolerant Large Scale Stream Processing. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_145-1

DOI: https://doi.org/10.1007/978-3-319-63962-8_145-1
Published: 14 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering