DOI: 10.5555/1316689.1316767

An annotation management system for relational databases

Published: 31 August 2004 Publication History

Abstract

We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data.
We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
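
The difference between the default and default-all schemes can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the paper's storage scheme or translation algorithm: the relations R and S, the annotation names a1 to a3, and the function join_project are all hypothetical. It evaluates the join R.A = S.A and projects column A. The two queries SELECT r.A and SELECT s.A are equivalent in the values they return, but under the default scheme each copies annotations only from the column it names. Default-all, as described above, propagates according to all equivalent formulations, which for this example amounts to taking annotations from both columns.

```python
from collections import defaultdict

# Hypothetical toy data: each cell is a (value, annotations) pair,
# and each tuple is a dict from column name to cell.
R = [{"A": (1, {"a1"}), "B": (10, {"a2"})}]
S = [{"A": (1, {"a3"}), "C": (20, set())}]


def join_project(R, S, copy_from):
    """Evaluate SELECT DISTINCT A FROM R r, S s WHERE r.A = s.A.

    copy_from names the tuple variables ("r", "s") whose A cell the output
    value is treated as copied from. Annotations of duplicate output values
    are merged by set union.
    """
    out = defaultdict(set)
    for r in R:
        for s in S:
            if r["A"][0] == s["A"][0]:
                sources = {"r": r, "s": s}
                anns = out[r["A"][0]]  # creates the output value on first use
                for name in copy_from:
                    anns |= sources[name]["A"][1]
    return dict(out)


# Default scheme: annotations follow the column the query text copies from.
default_q1 = join_project(R, S, ("r",))       # query written as SELECT r.A
default_q2 = join_project(R, S, ("s",))       # equivalent query, SELECT s.A
# Default-all for this example: both equivalent formulations contribute.
default_all = join_project(R, S, ("r", "s"))

print(default_q1, default_q2, default_all)
```

In this sketch a custom scheme would correspond to the caller choosing copy_from explicitly, for example carrying the annotations of s.A even though the query selects r.A.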


Published In

VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
August 2004
1380 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment


