Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Incremental validation of XML documents

Published: 12 December 2004 Publication History

Abstract

We investigate the incremental validation of XML documents with respect to DTDs, specialized DTDs, and XML Schemas, under updates consisting of element tag renamings, insertions, and deletions. DTDs are modeled as extended context-free grammars. "Specialized DTDs" allow the decoupling of element types from element tags. XML Schemas are abstracted as specialized DTDs with limitations on the type assignment. For DTDs and XML Schemas, we exhibit an O(m log n) incremental validation algorithm using an auxiliary structure of size O(n), where n is the size of the document and m the number of updates. The algorithm does not handle the incremental validation of XML Schema wrt renaming of internal nodes, which is handled by the specialized DTDs incremental validation algorithm. For specialized DTDs, we provide an O(m log2 n) incremental algorithm, again using an auxiliary structure of size O(n). This is a significant improvement over brute-force re-validation from scratch.We exhibit a restricted class of DTDs called local that arise commonly in practice and for which incremental validation can be done in practically constant time by maintaining only a list of counters. We present implementations of both general incremental validation and local validation on an XML database built on top of a relational database.Our experimentation includes a study of the applicability of local validation in practice, results on the calibration of parameters of the auxiliary data structure, and results on the performance comparison between the general incremental validation technique, the local validation technique, and brute-force validation from scratch.

References

[1]
Balmin, A. and Papakonstantinou, Y. 2004. Storing and querying XML data using denormalized relational databases. VLDB J. (Online First section). Available online at http://springerlink.metapress.com/openurl.asp?genre=article&id= Also available online at http://www.db.ucsd.edu/people/andrey.
[2]
Barbosa, D., Mendelzon, A., Libkin, L., Mignet, L., and Arenas, M. 2004. Efficient incremental validation of xml documents. In Proceedings of the 20th International Conference on Data Engineering (ICDE'04). IEEE Computer Society Press, Los Alamitos, CA.
[3]
Beeri, C. and Milo, T. 1999. Schemas for integration and translation of structured and semi-structured data. In Proceedings of the 7th International Conference on Database Theory. Lecture Notes in Computer Science, vol. 1540. Springer, New York, NY.
[4]
Bruggemann-Klein, A., Murata, M., and Wood, D. 2001. Regular tree and regular hedge languages over non-ranked alphabets. Tech. rep. HKUST-TCSC-2001-05. Hong Kong University of Science and Technology Computer Science Center, Hong Kong. Available online at http://www.cs.ust.hk/tcsc/RR/2001-05.ps.gz.
[5]
Bruggemann-Klein, A. and Wood, D. 1998. One-unambiguous regular languages. Inform. Computat. 142, 2, 182--206.
[6]
Choi, B. 2002. What are real DTDs like? In Proceedings of the Fifth International Workshop on the Web and Databases, WebDB. Informal proceedings, 43--48.
[7]
Cluet, S., Delobel, C., Simeon, J., and Smaga, K. 1998. Your mediators need data conversion! In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. ACM Press New York, NY, 177--188.
[8]
Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
[9]
Dong, G. and Su, J. 1995. Space-bounded foies. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (May 22--25, San Jose, CA). ACM Press, New York, NY, 139--150.
[10]
Garcia-Molina, H., Ullman, J., and Widom, J. 2001. Database Systems: The Complete Book. Prentice Hall, Upper Saddle River, NJ.
[11]
Ghezzi, C. and Mandrioli, D. 1980. Augmenting parsers to support incrementality. J. Assoc. Comput. Mach. 27, 3, 564--579.
[12]
Hesse, B. and Immerman, N. 2002. Complete problems for dynamic complexity classes. In Proceedings of the 17th IEEE Symposium on Logic in Computer Science (LICS 2002). IEEE Computer Society Press, Los Alamitos, CA.
[13]
Ipedo. Ipedo XML Database: Technical overview. Available online at http://www.ipedo.com/downloads/products_ixd_technical_overview.pdf.
[14]
Jalili, F. and Gallier, J. 1982. Building friendly parsers. In Proceedings of the Ninth Annual ACM Symposium on Principles of Programming Languages. ACM Press, New York, NY.
[15]
Larcheveque, J. 1995. Optimal incremental parsing. ACM Trans. Programm. Lang. Syst. 17, 1, 1--15.
[16]
Levine, C. 1997. Standard benchmarks for database systems. Presented at Sigmod 97. Available online at http://www.tpc.org/information/sessions/sigmod/indexc.htm.
[17]
Li, W. 1995. A simple and efficient incremental LL(1) parsing. In Proceedings of the 22nd Seminar on Current Trends in Theory and Practice of Informatics(SOFSEM '95). Lecture Notes in Computer Science, vol. 1012. Springer, New York, NY.
[18]
Linden, G. 1993. Incremental updates in structured documents. Licentiate thesis, Report C-1993-19. Department of Computer Science, University of Helsinki, Helsinki, Finland.
[19]
Lohrey, M. 2001. On the parallel complexity of tree automata. In Proceedings of the 12th International Conference on Rewriting Techniques and Applications (RTA 2001). Lecture Notes in Computer Science, vol. 2051. Springer, New York, NY.
[20]
Miltersen, P., Subramanian, S., Vitter, J., and Tamassia, R. 1994. Complexity models for incremental computation. Theor. Comput. Sci. 130, 1, 203--236.
[21]
Murching, A., Prasant, Y., and Srikant, Y. 1990. Incremental recursive descent parsing. Comput. Lang. 15, 4, 193--204.
[22]
Neven, F. 2002. Automata, logic and XML. In Computer Science Logic, 16th International Workshop (CSL 2002). Lecture Notes in Computer Science, vol. 2471. Springer, New York, NY. Available online at http://alpha.luc.ac.be/lucg5503/publs.html.
[23]
Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of XML data. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2000, Dallas, TX). ACM Press, New York, NY, 35--46.
[24]
Patnaik, S. and Immerman, N. 1997. Dyn-FO: A parallel, dynamic complexity class. J. Comput. Syst. Sci. 55, 2, 199--209.
[25]
Petrone, L. 1995. Reusing batch parsers as incremental parsers. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, vol. 1026. Springer, New York, NY.
[26]
RELAX NG. Available online at http://www.relaxng.org.
[27]
Segoufin, L. 2002. Personal communication.
[28]
Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D. J., and Naughton, J. F. 1999. Relational databases for querying XML documents: Limitations and opportunities. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB'99), Edinburgh, Scotland, UK. Morgan Kaufmann, San Francisco, CA, 302--314.
[29]
Sur, G., Hammer, J., and Simeon, J. 2004. UpdateX---an XQuery-based language for processing updates in XML. In International Workshop on Programming Language Technologies for XML (PLAN-X 2004). Informal proceedings.
[30]
Tatarinov, I., Ives, Z., Halevy, A., and Weld, D. 2001. Updating XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, NY.
[31]
TPC-C. Benchmark. Available online at http://www.tpc.org/tpcc/.
[32]
Vollmer, H. 1999. Introduction to Circuit Complexity. Springer Verlag, New York, NY.
[33]
W3C. 1998. The extensible markup language (XML). W3C Recomendation. Available online at http://www.w3c.org/XML.
[34]
W3C. 2001. XML schema definition. W3C Recomendation. Available online at http://www. w3c.org/XML/Schema.
[35]
Wagner, T. and Graham, S. 1998. Efficient and flexible incremental parsing. ACM Trans. Programm. Lang. Syst. 20, 2, 980--1013.
[36]
Well. WellLogML DTD. Available online at http://www.posc.org/ebiz/WellLogML/.
[37]
XML Edt. XML editor products. Available online at http://www.perfectxml.com/soft.asp?cat=6.
[38]
XMLmind. XMLmind XML Editor. Available online at http://www.xmlmind.com/xmleditor/.
[39]
XMLspy document editor. Available online at http://www.xmlspy.com/products_doc.html.

Cited By

View all
  • (2022)A proposal for future data organization in enterprise systems—an analysis of established database approachesInformation Systems and e-Business Management10.1007/s10257-022-00555-620:3(441-494)Online publication date: 1-Sep-2022
  • (2021)Towards Elastic Incrementalization for DatalogProceedings of the 23rd International Symposium on Principles and Practice of Declarative Programming10.1145/3479394.3479415(1-16)Online publication date: 6-Sep-2021
  • (2021)Learning Finite Automata with ShuffleAdvances in Knowledge Discovery and Data Mining10.1007/978-3-030-75765-6_25(308-320)Online publication date: 11-May-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 29, Issue 4
December 2004
250 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1042046
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2004
Published in TODS Volume 29, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Update
  2. XML
  3. validation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)2
Reflects downloads up to 23 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2022)A proposal for future data organization in enterprise systems—an analysis of established database approachesInformation Systems and e-Business Management10.1007/s10257-022-00555-620:3(441-494)Online publication date: 1-Sep-2022
  • (2021)Towards Elastic Incrementalization for DatalogProceedings of the 23rd International Symposium on Principles and Practice of Declarative Programming10.1145/3479394.3479415(1-16)Online publication date: 6-Sep-2021
  • (2021)Learning Finite Automata with ShuffleAdvances in Knowledge Discovery and Data Mining10.1007/978-3-030-75765-6_25(308-320)Online publication date: 11-May-2021
  • (2020)XML data manipulation in conventional and temporal XML databases: A surveyComputer Science Review10.1016/j.cosrev.2020.10023136(100231)Online publication date: May-2020
  • (2019)Enumeration on Trees with Tractable Combined Complexity and Efficient UpdatesProceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3294052.3319702(89-103)Online publication date: 25-Jun-2019
  • (2018)MSO Queries on TreesProceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science10.1145/3209108.3209144(769-778)Online publication date: 9-Jul-2018
  • (2018)Enumeration for FO Queries over Nowhere Dense GraphsProceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3196959.3196971(151-163)Online publication date: 27-May-2018
  • (2018)Enumeration of MSO Queries on Strings with Constant Delay and Logarithmic UpdatesProceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3196959.3196961(179-191)Online publication date: 27-May-2018
  • (2018)Toward Rapid Integration in High Assurance Mission SystemsNAECON 2018 - IEEE National Aerospace and Electronics Conference10.1109/NAECON.2018.8556811(109-116)Online publication date: Jul-2018
  • (2017)Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered ConcatenationACM Transactions on Database Systems10.1145/313270142:4(1-44)Online publication date: 13-Nov-2017
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media