DOI: 10.1145/3093742.3095103
short-paper

StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge

Published: 08 June 2017

Abstract

Today, massive amounts of streaming data from smart devices must be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities, such as online training and inference, to detect fuzzy patterns (e.g., outliers) and to improve pattern recognition accuracy at runtime through incremental model training. In this paper, we propose StreamLearner, a distributed CEP system for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow system resources to be scaled up and out dynamically for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify the scalability and high event throughput of StreamLearner.
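To make the idea of combining online inference with incremental model training on an event stream concrete, here is a minimal sketch. It is not the StreamLearner API, which the abstract does not spell out. The names `RunningStats` and `detect_outliers`, the z-score rule, and the `threshold` and `warmup` parameters are all assumptions chosen for illustration. Each event is first scored against the current model (inference), and the model is then updated with that event (training).

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical incremental model: running mean and variance (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        """Incremental training step: fold one event into the model."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def std(self) -> float:
        """Sample standard deviation of the events seen so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def detect_outliers(events, threshold=3.0, warmup=5):
    """Yield (index, value) for events far from the running mean.

    Inference uses the model as it was before the current event;
    the model is then trained on that event.
    """
    stats = RunningStats()
    for index, value in enumerate(events):
        std = stats.std()
        # Only score once the model has seen `warmup` events (assumed parameter).
        if stats.count >= warmup and std > 0 and abs(value - stats.mean) > threshold * std:
            yield index, value
        stats.update(value)


# Toy stream of sensor readings with one obvious spike.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 25.0, 10.1]
anomalies = list(detect_outliers(stream))
```

In this sketch a flagged event is still folded into the model, which inflates the variance afterwards. Whether anomalous events should be excluded from training is a design choice the abstract does not address.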

Cited By

  • (2023) EPAComp: An Architectural Model for EPA Composition. Proceedings of the XIX Brazilian Symposium on Information Systems, 61-69. DOI: 10.1145/3592813.3592889. Online publication date: 29-May-2023
  • (2021) Autonomously Improving Systems in Industry: A Systematic Literature Review. Software Business, 30-45. DOI: 10.1007/978-3-030-67292-8_3. Online publication date: 22-Jan-2021
  • (2018) Uncertainty-Based Deep Learning Networks for Limited Data Wetland User Models. 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), 19-26. DOI: 10.1109/AIVR.2018.00011. Online publication date: Dec-2018

Published In

DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
June 2017
393 pages
ISBN:9781450350655
DOI:10.1145/3093742
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Complex Event Processing
  2. Machine Learning
  3. Stream Processing

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

DEBS '17

Acceptance Rates

DEBS '17 Paper Acceptance Rate: 22 of 60 submissions, 37%
Overall Acceptance Rate: 145 of 583 submissions, 25%
