DOI: 10.1145/3274808.3274826
research-article

GeneaLog: Fine-Grained Data Streaming Provenance at the Edge

Published: 26 November 2018

Abstract

Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, something beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized.
While high-end servers may be able to afford the memory and processing costs of fine-grained data provenance, these costs can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications, since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming that it efficiently captures fine-grained provenance data with minimal overhead.
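To make the idea of constant-size per-tuple provenance concrete, the following is a minimal sketch and not GeneaLog's actual implementation. It assumes each tuple carries a fixed set of references to the tuples it was derived from, so that a result can be traced back to its sources by following those references. The class `StreamTuple`, the fields `kind`, `u1`, `u2`, and `nxt`, the operator helpers, and `sources` are all hypothetical names introduced for illustration; none of them comes from the abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(eq=False)
class StreamTuple:
    """A stream tuple with a fixed number of provenance fields (assumed layout)."""
    value: Any
    kind: str = "SOURCE"                  # SOURCE, MAP, JOIN, or AGGREGATE
    u1: Optional["StreamTuple"] = None    # first contributor (newest one for AGGREGATE)
    u2: Optional["StreamTuple"] = None    # second contributor (oldest one for AGGREGATE)
    nxt: Optional["StreamTuple"] = None   # next tuple in an aggregate window chain


def map_op(t: StreamTuple, fn: Callable[[Any], Any]) -> StreamTuple:
    # A map output depends on exactly one input tuple.
    return StreamTuple(fn(t.value), "MAP", u1=t)


def join_op(left: StreamTuple, right: StreamTuple,
            fn: Callable[[Any, Any], Any]) -> StreamTuple:
    # A join output depends on exactly two input tuples.
    return StreamTuple(fn(left.value, right.value), "JOIN", u1=left, u2=right)


def aggregate_op(window: List[StreamTuple],
                 fn: Callable[[List[Any]], Any]) -> StreamTuple:
    # Chain the window's tuples so the output needs only two references,
    # regardless of how many tuples the window holds.
    if not window:
        raise ValueError("window must contain at least one tuple")
    for prev, cur in zip(window, window[1:]):
        prev.nxt = cur
    return StreamTuple(fn([t.value for t in window]), "AGGREGATE",
                       u1=window[-1], u2=window[0])


def sources(t: StreamTuple) -> List[StreamTuple]:
    """Collect the source tuples that contributed to t by following references."""
    found, seen, stack = [], set(), [t]
    while stack:
        cur = stack.pop()
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        if cur.kind == "SOURCE":
            found.append(cur)
        elif cur.kind == "MAP":
            stack.append(cur.u1)
        elif cur.kind == "JOIN":
            stack.extend([cur.u1, cur.u2])
        elif cur.kind == "AGGREGATE":
            node = cur.u2                  # walk the chain from oldest to newest
            while node is not None:
                stack.append(node)
                if node is cur.u1:
                    break
                node = node.nxt
    return found


# Usage: three readings are scaled, two are summed, and the sum is joined
# with the third; the final tuple is then traced back to its sources.
readings = [StreamTuple(v) for v in (10, 20, 30)]
scaled = [map_op(t, lambda v: v * 2) for t in readings]
total = aggregate_op(scaled[:2], sum)
alert = join_op(total, scaled[2], lambda a, b: a + b)
contributing = sorted(s.value for s in sources(alert))
```

In this sketch every tuple stores the same four fields no matter how many inputs contributed to it, which is one way a per-tuple overhead can stay constant; the cost of reconstructing provenance is paid only when `sources` is called.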

Supplementary Material

MP4 File (p227-palyvos-giannas.mp4)



Published In

Middleware '18: Proceedings of the 19th International Middleware Conference
November 2018
299 pages
ISBN: 9781450357029
DOI: 10.1145/3274808

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data streaming
  2. Edge architectures
  3. Fine-grained data provenance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '18
Sponsor:
  • ACM
  • USENIX Assoc
  • IFIP

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Cited By

  • (2024) Research Summary: Enhancing Localization, Selection, and Processing of Data in Vehicular Cyber-Physical Systems. Proceedings of the 2024 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, 1-5. DOI: 10.1145/3663338.3663680. Online publication date: 17-Jun-2024
  • (2024) Nona: A Framework for Elastic Stream Provenance. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 703-714. DOI: 10.1109/ICDCS60910.2024.00071. Online publication date: 23-Jul-2024
  • (2022) Research Summary: Deterministic, Explainable and Efficient Stream Processing. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, 65-69. DOI: 10.1145/3524053.3542750. Online publication date: 25-Jul-2022
  • (2021) s2p: Provenance Research for Stream Processing System. Applied Sciences 11:12, 5523. DOI: 10.3390/app11125523. Online publication date: 15-Jun-2021
  • (2021) Poster: Twins, a Middleware for Adaptive Streaming Provenance at the Edge. Proceedings of the 22nd International Conference on Distributed Computing and Networking, 235-236. DOI: 10.1145/3427796.3433931. Online publication date: 5-Jan-2021
  • (2021) Time- and Computation-Efficient Data Localization at Vehicular Networks’ Edge. IEEE Access 9, 137714-137732. DOI: 10.1109/ACCESS.2021.3118596. Online publication date: 2021
  • (2021) ARTINALI++: Multi-dimensional specification mining for cyber-physical system security. Journal of Systems and Software, 111016. DOI: 10.1016/j.jss.2021.111016. Online publication date: Jun-2021
  • (2021) Amnis: Optimized Stream Processing for Edge Computing. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2021.10.001. Online publication date: Nov-2021
  • (2019) Haren. Proceedings of the 20th International Middleware Conference Demos and Posters, 19-20. DOI: 10.1145/3366627.3368108. Online publication date: 9-Dec-2019
  • (2019) Automatic Translation of Spatio-Temporal Logics to Streaming-Based Monitoring Applications for IoT-Equipped Autonomous Agents. Proceedings of the 6th International Workshop on Middleware and Applications for the Internet of Things, 7-12. DOI: 10.1145/3366610.3368097. Online publication date: 9-Dec-2019
