Abstract
Many curated databases are constructed by scientists integrating various existing data sources “by hand”, that is, by manually entering or copying data from other sources. Capturing provenance in such an environment is a challenging problem, requiring a good model of the process of curation. Existing models of provenance focus on queries/views in databases or computations on the Grid, not updates of databases or Web sites. In this paper we motivate and present a simple model of provenance for manually curated databases and discuss ongoing and future work.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buneman, P., Chapman, A., Cheney, J., Vansummeren, S. (2006). A Provenance Model for Manually Curated Data. In: Moreau, L., Foster, I. (eds) Provenance and Annotation of Data. IPAW 2006. Lecture Notes in Computer Science, vol 4145. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890850_17
DOI: https://doi.org/10.1007/11890850_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46302-3
Online ISBN: 978-3-540-46303-0
eBook Packages: Computer Science (R0)