Ananke: a streaming framework for live forward provenance

Published: 01 November 2020

Abstract

Data streaming enables online monitoring of large, continuous event streams in Cyber-Physical Systems (CPSs). In such scenarios, fine-grained backward provenance tools can connect streaming query results to the source data that produced them, allowing analysts to study the dependency and causality of CPS events. However, CPS monitoring commonly produces many events, and backward provenance does not help prioritize their inspection, since it does not specify whether an event's provenance could still contribute to future results.
To cover this gap, we introduce Ananke, a framework that extends any fine-grained backward provenance tool to deliver a live bipartite graph of fine-grained forward provenance. With Ananke, analysts can prioritize the analysis of provenance data based on whether such data is still potentially being processed by the monitoring queries. We prove that our solution is correct, discuss multiple implementations, including one that leverages streaming APIs for parallel analysis, and show that Ananke incurs small overheads, close to those of existing tools for fine-grained backward provenance.
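
The idea of a live bipartite forward-provenance graph can be illustrated with a minimal Python sketch. It is not Ananke's implementation: the class `ForwardProvenanceGraph`, its methods, and the fixed `horizon` used to decide when a source tuple can no longer contribute are all hypothetical simplifications. The sketch inverts backward-provenance records (result to sources) into forward edges (source to results) and uses a watermark to tell which source tuples may still contribute to future results.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ForwardProvenanceGraph:
    # Hypothetical bound: a source tuple with timestamp ts is assumed unable to
    # contribute to any result once the watermark exceeds ts + horizon.
    horizon: int
    # Bipartite edges: source tuple id -> ids of the results it contributed to.
    edges: dict = field(default_factory=lambda: defaultdict(set))
    # Timestamp of every source tuple seen so far.
    source_ts: dict = field(default_factory=dict)
    watermark: int = 0

    def add_result(self, sink_id, backward_provenance):
        """Invert one backward-provenance record: result -> [(source_id, ts), ...]."""
        for source_id, ts in backward_provenance:
            self.edges[source_id].add(sink_id)
            self.source_ts[source_id] = ts

    def advance(self, watermark):
        """Move the watermark forward; it never moves backward."""
        self.watermark = max(self.watermark, watermark)

    def is_live(self, source_id):
        """True while the source tuple could still contribute to future results."""
        return self.watermark <= self.source_ts[source_id] + self.horizon

    def live_sources(self):
        return sorted(s for s in self.source_ts if self.is_live(s))


graph = ForwardProvenanceGraph(horizon=10)
graph.add_result("alert-1", [("s1", 0), ("s2", 5)])
graph.add_result("alert-2", [("s2", 5), ("s3", 8)])
graph.advance(12)
print(sorted(graph.edges["s2"]))  # results that source tuple s2 contributed to
print(graph.live_sources())       # source tuples that may still produce results
```

Under these assumptions, an analyst could inspect first the source tuples that are no longer live, since their set of dependent results is final, and defer the ones that may still gain edges.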

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 3
November 2020
217 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Cited By

  • (2024) Research Summary: Enhancing Localization, Selection, and Processing of Data in Vehicular Cyber-Physical Systems. Proceedings of the 2024 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, pp. 1-5. DOI: 10.1145/3663338.3663680. Online publication date: 17-Jun-2024
  • (2024) Aggregates are all you need (to bridge stream processing and Complex Event Recognition). Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, pp. 66-77. DOI: 10.1145/3629104.3666032. Online publication date: 24-Jun-2024
  • (2024) A survey on the evolution of stream processing systems. The VLDB Journal - The International Journal on Very Large Data Bases 33:2, pp. 507-541. DOI: 10.1007/s00778-023-00819-8. Online publication date: 1-Mar-2024
  • (2024) Evolutionary Computation Meets Stream Processing. Applications of Evolutionary Computation, pp. 377-393. DOI: 10.1007/978-3-031-56852-7_24. Online publication date: 3-Mar-2024
  • (2023) Augmented lineage: traceability of data analysis including complex UDF processing. The VLDB Journal - The International Journal on Very Large Data Bases 32:5, pp. 963-983. DOI: 10.1007/s00778-022-00769-7. Online publication date: 1-Sep-2023
  • (2022) Erebus. Proceedings of the VLDB Endowment 16:2, pp. 230-242. DOI: 10.14778/3565816.3565825. Online publication date: 1-Oct-2022
  • (2022) Towards data-driven additive manufacturing processes. Proceedings of the 23rd International Middleware Conference Industrial Track, pp. 43-49. DOI: 10.1145/3564695.3564778. Online publication date: 7-Nov-2022
  • (2022) Proposing a framework for evaluating learning strategies in vehicular CPSs. Proceedings of the 23rd International Middleware Conference Industrial Track, pp. 22-28. DOI: 10.1145/3564695.3564775. Online publication date: 7-Nov-2022
  • (2022) Research Summary: Deterministic, Explainable and Efficient Stream Processing. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, pp. 65-69. DOI: 10.1145/3524053.3542750. Online publication date: 25-Jul-2022
  • (2021) Lachesis. Proceedings of the 22nd International Middleware Conference, pp. 365-378. DOI: 10.1145/3464298.3493407. Online publication date: 6-Dec-2021
