Abstract
The provenance, or lineage, of a workflow data product can be reconstructed by keeping a complete trace of workflow execution. This lineage information, however, is likely to be both imprecise, because of the black-box nature of the services that compose the workflow, and noisy, because of the many trivial data transformations that obscure the intended purpose of the workflow. In this paper, we argue that these shortcomings can be alleviated by introducing, in a principled way, a small set of optional lightweight annotations to the workflow. We begin by presenting a baseline, annotation-free lineage model for the Taverna workflow system, and then show how the proposed annotations improve the results of fundamental lineage queries.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Missier, P., Belhajjame, K., Zhao, J., Roos, M., Goble, C. (2008). Data Lineage Model for Taverna Workflows with Lightweight Annotation Requirements. In: Freire, J., Koop, D., Moreau, L. (eds) Provenance and Annotation of Data and Processes. IPAW 2008. Lecture Notes in Computer Science, vol 5272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89965-5_4
DOI: https://doi.org/10.1007/978-3-540-89965-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89964-8
Online ISBN: 978-3-540-89965-5
eBook Packages: Computer Science, Computer Science (R0)