
DOI: 10.1007/978-3-030-80960-7_9
Article

Incremental Inference of Provenance Types

Published: 22 June 2020

Abstract

Long-running applications are increasingly instrumented to log provenance continuously. In that context, we observe an emerging need to process the fragments of provenance that such applications produce as they run. Thus, there is a growing requirement to process provenance incrementally, while the application is still running, rather than by batch processing a complete provenance dataset that is available only after the application has completed. One type of processing of particular interest is the summarisation of provenance graphs, which has been proposed as an effective way of extracting the key features of provenance and storing them efficiently. To that end, summarisation makes use of provenance types, which, in loose terms, are an encoding of the neighbourhood of nodes.
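The abstract describes provenance types only loosely, as an encoding of the neighbourhood of nodes. As a rough illustration, the Python sketch below assumes a simplified, Weisfeiler-Lehman-inspired definition that may well differ from the paper's exact one: the depth-0 type of a node is its PROV class, and its depth-k type combines that class with the set of (edge label, depth-(k-1) type of target) pairs over its outgoing edges. The function name `provenance_types` and the bracketed string encoding are hypothetical.

```python
from collections import defaultdict


def provenance_types(node_class, edges, k):
    """Return {node: type string} at depth k under a simplified,
    hypothetical definition of provenance types.

    node_class: dict mapping node id -> PROV class (e.g. "Entity").
    edges: iterable of (source, label, target) triples; like PROV
           relations, they point from the later node to the earlier one.
    """
    out = defaultdict(list)
    for src, label, dst in edges:
        out[src].append((label, dst))

    # Depth 0: a node's type is just its class.
    types = dict(node_class)
    for _ in range(k):
        next_types = {}
        for node, cls in node_class.items():
            # Set of (edge label, previous-depth type of target), sorted so
            # the encoding does not depend on the order of the edges.
            parts = sorted({f"{label}:{types[dst]}" for label, dst in out[node]})
            next_types[node] = f"{cls}[{','.join(parts)}]"
        types = next_types
    return types


graph = {"e0": "Entity", "a1": "Activity", "e1": "Entity"}
rels = [("e1", "wasGeneratedBy", "a1"), ("a1", "used", "e0")]
print(provenance_types(graph, rels, 2)["e1"])
# Entity[wasGeneratedBy:Activity[used:Entity]]
```

Using a set means this sketch records which kinds of neighbours a node has, not how many; a multiset-based encoding would be an equally plausible assumption.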
This paper shows that creating provenance summaries from continuously provided data can benefit from incremental processing of provenance types. We also introduce the concept of a library of types, which avoids storing multiple copies of the same string representation of a type. Further, we show that the computational complexity of inferring types is, in the most common cases, the best possible: only new nodes have to be processed. We also identify and analyse the exception scenarios. Finally, although our library of types can, in theory, be exponentially large, we present empirical results showing that it is quite compact in practice.
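To illustrate why, in the common case, only new nodes need to be processed, the sketch below assumes a simplified type definition computed over outgoing edges only (not necessarily the paper's). Because PROV relations point from a later node to earlier ones (for example, an entity `wasGeneratedBy` an activity), a newly arriving node usually has outgoing edges only to nodes that are already typed, so its types can be computed without revisiting the rest of the graph. Each type string is interned in a library and referenced by integer id, so a repeated type is stored once. The classes `TypeLibrary` and `IncrementalTyper` are hypothetical, and the sketch does not handle an edge that arrives later from an existing node, which in this scheme would change the types of that node and of nodes pointing to it.

```python
class TypeLibrary:
    """Hypothetical library of types: each distinct type string is stored
    once and referred to elsewhere by an integer id."""

    def __init__(self):
        self._ids = {}      # type string -> id
        self._strings = []  # id -> type string

    def intern(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def string(self, type_id):
        return self._strings[type_id]

    def __len__(self):
        return len(self._strings)


class IncrementalTyper:
    """Types nodes one at a time, assuming each new node's outgoing edges
    point only to nodes that have already been typed."""

    def __init__(self, k):
        self.k = k
        self.library = TypeLibrary()
        self.types = {}  # node -> [type id at depth 0, 1, ..., k]

    def add_node(self, node, cls, out_edges):
        """out_edges: list of (label, target). Work is proportional to
        k * len(out_edges) (ignoring sorting and string lengths) and does
        not depend on the size of the graph typed so far."""
        ids = [self.library.intern(cls)]
        for depth in range(1, self.k + 1):
            # Refer to each target's previous-depth type by its library id,
            # so the string does not embed nested copies of other types.
            parts = sorted({f"{label}:{self.types[dst][depth - 1]}"
                            for label, dst in out_edges})
            ids.append(self.library.intern(f"{cls}[{','.join(parts)}]"))
        self.types[node] = ids
        return ids


typer = IncrementalTyper(k=2)
typer.add_node("e0", "Entity", [])
typer.add_node("a1", "Activity", [("used", "e0")])
typer.add_node("e1", "Entity", [("wasGeneratedBy", "a1")])
typer.add_node("e2", "Entity", [("wasGeneratedBy", "a1")])  # reuses e1's types
print(len(typer.library))  # 7
```

Adding `e2` leaves the library unchanged because its neighbourhood matches that of `e1`; this kind of reuse is the intuition behind a library that stays compact in practice, although this sketch makes no claim about the growth figures reported in the paper.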



Information

Published In

Guide Proceedings
Provenance and Annotation of Data and Processes: 8th and 9th International Provenance and Annotation Workshop, IPAW 2020 + IPAW 2021, Virtual Event, July 19–22, 2021, Proceedings
Jun 2020
273 pages
ISBN:978-3-030-80959-1
DOI:10.1007/978-3-030-80960-7

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Provenance summaries
  2. Provenance types
  3. Incremental processing of provenance


