
10.5555/3026947.3026958
Article

The data, they are a-changin'

Published: 08 June 2016

Abstract

The cost of deriving actionable knowledge from large datasets has been decreasing thanks to a convergence of positive factors: low-cost data generation, inexpensively scalable storage and processing infrastructure (cloud), software frameworks and tools for massively distributed data processing, and parallelisable data analytics algorithms. One often-overlooked observation, however, is that none of these elements is immutable; they all evolve over time. As the datasets change, the value of the knowledge derived from them may decay unless it is preserved by reacting to those changes. Our broad research goal is to develop models, methods, and tools for selectively reacting to changes by balancing costs and benefits, i.e., through complete or partial re-computation of some of the underlying processes. In this paper we present an initial model for reasoning about change and re-computation, and show how analysis of the detailed provenance of derived knowledge informs re-computation decisions. We illustrate the main ideas through a real-world case study in genomics, namely the interpretation of human variants in support of genetic diagnosis.
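The abstract does not spell out the model itself, so the following is only an illustrative sketch of the general idea of provenance-driven selective re-computation, not the authors' method. It assumes provenance has been reduced to a simple "derived from" graph (loosely in the spirit of W3C PROV's wasDerivedFrom relation) and that a benefit and a cost estimate for refreshing each derived item are simply given as numbers. All names (`ProvenanceGraph`, `impacted_by`, `upstream_within`, `plan_recomputation`, and the item labels such as `variant_db` and `diagnosis`) are hypothetical, and the greedy per-item decision rule is an assumption chosen for brevity.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical provenance store: derived item -> items it was directly derived from."""
    derived_from: dict[str, set[str]] = field(default_factory=dict)

    def record(self, output: str, inputs: set[str]) -> None:
        self.derived_from.setdefault(output, set()).update(inputs)

    def impacted_by(self, changed: set[str]) -> set[str]:
        """Every item transitively derived from any changed item (breadth-first)."""
        # Invert the edges: input -> outputs directly derived from it.
        dependents: dict[str, set[str]] = {}
        for out, ins in self.derived_from.items():
            for i in ins:
                dependents.setdefault(i, set()).add(out)
        seen: set[str] = set()
        queue = deque(changed)
        while queue:
            for out in dependents.get(queue.popleft(), ()):
                if out not in seen:
                    seen.add(out)
                    queue.append(out)
        return seen

    def upstream_within(self, item: str, scope: set[str]) -> set[str]:
        """Ancestors of `item` that lie inside `scope` (e.g. the stale, impacted items)."""
        seen: set[str] = set()
        stack = [item]
        while stack:
            for parent in self.derived_from.get(stack.pop(), ()):
                if parent in scope and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def plan_recomputation(graph: ProvenanceGraph, changed: set[str],
                       benefit: dict[str, float], cost: dict[str, float]) -> set[str]:
    """Greedy rule (an assumption): refresh an impacted item only if its benefit
    exceeds the cost of recomputing it together with its stale upstream items."""
    impacted = graph.impacted_by(changed)
    plan: set[str] = set()
    for item in sorted(impacted):
        needed = graph.upstream_within(item, impacted) | {item}
        if benefit.get(item, 0.0) > sum(cost.get(x, 0.0) for x in needed):
            plan |= needed
    return plan


# Hypothetical example: a reference database used in variant annotation changes.
graph = ProvenanceGraph()
graph.record("annotated_variants", {"patient_vcf", "variant_db"})
graph.record("diagnosis", {"annotated_variants", "gene_panel"})
graph.record("cohort_stats", {"patient_vcf"})

impacted = graph.impacted_by({"variant_db"})
benefit = {"diagnosis": 10.0, "annotated_variants": 1.0}
cost = {"annotated_variants": 4.0, "diagnosis": 2.0}
plan = plan_recomputation(graph, {"variant_db"}, benefit, cost)
print(sorted(plan))  # ['annotated_variants', 'diagnosis']
```

Here provenance narrows the scope (`cohort_stats` is never considered, because it does not depend on `variant_db`), and the cost/benefit test decides whether the impacted items are worth refreshing at all: lowering the benefit of `diagnosis` below the combined cost of 6.0 yields an empty plan, i.e. the stale result is tolerated.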



Information

Published In

Guide Proceedings
TaPP'16: Proceedings of the 8th USENIX Conference on Theory and Practice of Provenance
June 2016
58 pages

Publisher

USENIX Association

United States


Author Tags

  1. big data analytics
  2. data change
  3. data refresh
  4. provenance

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 10 of 17 submissions, 59%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 0
  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 30 Nov 2024
