DOI: 10.1145/564691.564693
Article

Archiving scientific data

Published: 03 June 2002

Abstract

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead compared with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
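The timestamping idea in the abstract can be illustrated with a small sketch. This is not the authors' tool and does not use XML: it assumes each version is a nested Python dict whose dict keys play the role of element keys and whose leaves are hashable scalars. The names `Node`, `merge`, `retrieve`, `history`, and `intervals`, as well as the sample records, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One keyed element of the archive (hypothetical structure)."""
    versions: set = field(default_factory=set)    # versions in which the element exists
    values: dict = field(default_factory=dict)    # leaf value -> versions holding that value
    children: dict = field(default_factory=dict)  # key -> child Node


def merge(node, data, version):
    """Merge one version (nested dicts with scalar leaves) into the archive."""
    node.versions.add(version)
    if isinstance(data, dict):
        for key, sub in data.items():
            # An element with the same key path is stored once and re-stamped.
            merge(node.children.setdefault(key, Node()), sub, version)
    else:
        node.values.setdefault(data, set()).add(version)


def retrieve(node, version):
    """Rebuild a single version from the archive without applying any deltas."""
    for value, stamps in node.values.items():
        if version in stamps:
            return value
    return {key: retrieve(child, version)
            for key, child in node.children.items()
            if version in child.versions}


def history(node, path):
    """Return the sorted versions in which the element at `path` exists."""
    for key in path:
        node = node.children.get(key)
        if node is None:
            return []
    return sorted(node.versions)


def intervals(stamps):
    """Compact a set of integer versions into inclusive (start, end) runs."""
    runs = []
    for v in sorted(stamps):
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)
        else:
            runs.append((v, v))
    return runs


# Made-up sample data: two versions of a tiny keyed database.
v1 = {"rec1": {"name": "alpha", "seq": "ACGT"}}
v2 = {"rec1": {"name": "alpha", "seq": "ACGA"}, "rec2": {"name": "beta"}}

archive = Node()
merge(archive, v1, 1)
merge(archive, v2, 2)
```

In this sketch, `retrieve(archive, 1)` rebuilds the first version by filtering on timestamps, and `history(archive, ["rec2"])` reports when an element existed, which mirrors the two operations the abstract contrasts with diff-based approaches. Unchanged elements such as `rec1/name` are stored once with the stamp set `{1, 2}`, which `intervals` would compact to a single run.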

References

[1] A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucleic Acids Research, 28:45-48, 2000.
[2] P. Buneman, S. Khanna, K. Tajima, and W. Tan. Archiving Scientific Data. Technical report, University of Pennsylvania, 2002.
[3] The WWW Virtual Library of Cell Biology. http://vlib.org/Science/Cell_Biology/databases.shtml.
[4] Concurrent Versions System. Unix man pages - cvs.
[5] E. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.
[6] G. Cobena, S. Abiteboul, and A. Marian. Detecting Changes in XML Documents. In Int'l Conf. on Data Engineering, 2001.
[7] XML TreeDiff. http://www.alphaworks.ibm.com/formula/xmltreediff.
[8] J. Clark and S. DeRose. XML Path Language (XPath). W3C Working Draft, November 1999. http://www.w3.org/TR/xpath.
[9] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making Data Structures Persistent. Journal of Computer and System Sciences, 38(1):86-124, 1989.
[10] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18(6):1245-1262, 1989.
[11] K. Zhang and D. Shasha. Fast algorithms for unit cost editing distance between trees. Journal of Algorithms, 11(6):581-621, 1990.
[12] H. Liefke and D. Suciu. XMill: an Efficient Compressor for XML Data. In Proc. of ACM SIGMOD Int'l Conf. on Mgmt. of Data, 2000.
[13] A. Marian, S. Abiteboul, G. Cobena, and L. Mignet. Change-Centric Management of Versions in an XML Warehouse. In Int'l Conf. of Very Large Data Bases, 2001.
[14] Online Mendelian Inheritance in Man, OMIM (TM), 2000. http://www.ncbi.nlm.nih.gov/omim/.
[15] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. In Proc. of Int'l World Wide Web Conf., 2001.
[16] The NIST Reference on Constants, Units, and Uncertainty. http://physics.nist.gov/cuu/Constants/links.html.
[17] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill Higher Education, 2000.
[18] S. Chien, V. J. Tsotras, and C. Zaniolo. Efficient Management of Multiversion Documents by Object Referencing. In Int'l Conf. of Very Large Data Bases, 2001.
[19] S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change Detection in Hierarchically Structured Information. In Proc. of ACM SIGMOD Int'l Conf. on Mgmt. of Data, 1996.
[20] S. S. Chawathe and H. Garcia-Molina. Meaningful Change Detection in Structured Data. In Proc. of ACM SIGMOD Int'l Conf. on Mgmt. of Data, 1997.
[21] Source Code Control System. Unix man pages - sccs.
[22] A. R. Schmidt, F. Waas, M. L. Kersten, D. Florescu, I. Manolescu, M. J. Carey, and R. Busse. The XML Benchmark Project. Technical report, INS-R0103, CWI, 2001. http://monetdb.cwi.nl/xml/index.html.
[23] K. Tufte and D. Maier. Aggregation and Accumulation of XML Data. IEEE Data Engineering Bulletin, 24(2):34-39, 2001.
[24] W. Miller and E. Myers. A file comparison program. Software-Practice and Experience, 15(11):1025-1040, 1985.
[25] W3C. Extensible Markup Language (XML) 1.0, Feb 1998. http://www.w3.org/TR/REC-xml.
[26] W3C. Namespaces in XML, January 1999. http://www.w3.org/TR/REC-xml-names.
[27] W3C. XML Schema Part 0: Primer, May 2000. http://www.w3.org/TR/xmlschema-0/.
[28] W3C. XQuery 1.0: An XML Query Language, June 2001. http://www.w3.org/TR/xquery/.

Published In

SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
June 2002
654 pages
ISBN:1581134975
DOI:10.1145/564691
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS02

Acceptance Rates

SIGMOD '02 Paper Acceptance Rate 42 of 240 submissions, 18%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2018) Data Provenance. Encyclopedia of Database Systems, pp. 812-812. DOI: 10.1007/978-1-4614-8265-9_1305. Online publication date: 7-Dec-2018
  • (2016) Data Provenance. Encyclopedia of Database Systems, pp. 1-2. DOI: 10.1007/978-1-4899-7993-3_1305-2. Online publication date: 13-Dec-2016
  • (2012) On Modeling and Querying Concept Evolution. Journal on Data Semantics, 1:1, pp. 31-55. DOI: 10.1007/s13740-012-0001-1. Online publication date: 21-Mar-2012
  • (2012) Modeling temporal dimensions of semistructured data. Journal of Intelligent Information Systems, 38:3, pp. 601-644. DOI: 10.1007/s10844-011-0170-7. Online publication date: 1-Jun-2012
  • (2011) Supporting queries spanning across phases of evolving artifacts using Steiner forests. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1649-1658. DOI: 10.1145/2063576.2063815. Online publication date: 24-Oct-2011
  • (2011) PRESIDIO. ACM Transactions on Storage, 7:2, pp. 1-60. DOI: 10.1145/1970348.1970351. Online publication date: 1-Jul-2011
  • (2010) A pattern-based temporal XML query language. Proceedings of the 11th international conference on Web information systems engineering, pp. 428-441. DOI: 10.5555/1991336.1991385. Online publication date: 12-Dec-2010
  • (2010) Design, implementation and use of a simulation data archive for coastal science. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 651-657. DOI: 10.1145/1851476.1851572. Online publication date: 21-Jun-2010
  • (2010) Managing scientific data. Communications of the ACM, 53:6, pp. 68-78. DOI: 10.1145/1743546.1743568. Online publication date: 1-Jun-2010
  • (2010) A Pattern-Based Temporal XML Query Language. 11th International Conference on Web Information Systems Engineering, WISE 2010, Volume 6488, pp. 428-441. DOI: 10.1007/978-3-642-17616-6_39. Online publication date: 12-Dec-2010