Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

XML data exchange: Consistency and query answering

Published: 15 May 2008 Publication History

Abstract

Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data.
In this article, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity.
We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on classes of regular expressions used in DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.

References

[1]
Abiteboul, S., and Duschka, O. M. 1998. Complexity of answering queries using materialized views. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 254--263.
[2]
Abiteboul, S., Kanellakis, P. C., and Grahne, G. 1991. On the representation and querying of sets of possible worlds. Theoret. Comput. Sci. 78, 1, 158--187.
[3]
Abiteboul, S., Segoufin, L., and Vianu, V. 2001. Representing and querying XML with incomplete information. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 150--161.
[4]
Amer-Yahia, S., Cho, S., Lakshmanan, L. V. S., and Srivastava, D. 2002. Tree pattern query minimization. VLDB J. 11, 4, 315--331.
[5]
Amer-Yahia, S., and Kotidis, Y. 2004. Web-services architecture for efficient XML data exchange. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society Press, Los Alamitos, CA, 523--534.
[6]
Arenas, M., Barceló, P., Fagin, R., and Libkin, L. 2004. Locally consistent transformations and query answering in data exchange. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 229--240.
[7]
Arenas, M., and Libkin, L. 2005. XML data exchange: consistency and query answering. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 13--24.
[8]
Benedikt, M., Fan, W., and Kuper, G. M. 2003. Structural properties of XPath fragments. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 79--95.
[9]
Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., and Tommasi, M. 2007. Tree automata techniques and applications. (Available on: http://www.grappa.univ-lille3.fr/tata. release October, 12th 2007).
[10]
Deutsch, A., and Tannen, V. 2001. Containment and integrity constraints for XPath. In Proceedings of the International Workshop on Knowledge Representation meets Databases (KRDB). CEUR-WS.org.
[11]
Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005a. Data exchange: Semantics and query answering. Theoret. Comput. Sci. 336, 1, 89--124.
[12]
Fagin, R., Kolaitis, P. G., and Popa, L. 2005b. Data exchange: getting to the core. ACM Trans. Datab. Syst. 30, 1, 174--210.
[13]
Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W. C. 2005c. Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Datab. Syst. 30, 4, 994--1055.
[14]
Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.
[15]
Gottlob, G., Koch, C., and Schulz, K. U. 2006. Conjunctive queries over trees. J. ACM 53, 2, 238--272.
[16]
Imielinski, T., and Lipski, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791.
[17]
Kozen, D. 2002. On two letters versus three. In Proceedings of the Fixed Points in Computer Science (FICS). University of Aarhus, 44--50.
[18]
Krishnamurthy, R., Kaushik, R., and Naughton, J. F. 2003. XML-SQL query translation literature: The state of the art and open problems. In Proceedings of the International XML Database Symposium (XSym). Springer-Verlag, Heidelberg, Germany, 1--18.
[19]
Lakshmanan, L. V. S., Ramesh, G., Wang, H., and Zhao, Z. 2004. On testing satisfiability of tree pattern queries. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan Kaufmann, San Mateo, CA, 120--131.
[20]
Lenstra, H. W. 1983. Integer programming in a fixed number of variables. Math. Oper. Res. 8, 4, 538--548.
[21]
Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 233--246.
[22]
Miller, R. J., Hernández, M. A., Haas, L. M., Yan, L.-L., Ho, C. T. H., Fagin, R., and Popa, L. 2001. The Clio project: Managing heterogeneity. SIGMOD Record 30, 1, 78--83.
[23]
Neven, F. 2002. Automata, logic, and XML. In Proceedings of the Conference for Computer Science Logic (CSL). Springer-Verlag, Heidelberg, Germany, 2--26.
[24]
Neven, F., and Schwentick, T. 2003. XPath containment in the presence of disjunction, DTDs, and variables. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 312--326.
[25]
Popa, L., Velegrakis, Y., Miller, R. J., Hernández, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the International Conference Very Large Data Bases (VLDB). Morgan-Kaufmann, San Mateo, CA, 598--609.
[26]
Seidl, H. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3, 424--437.
[27]
Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. 1977. Express: A data extraction, processing, and restructuring system. ACM Trans. Database Syst. 2, 2, 134--174.
[28]
Vianu, V. 2001. A web odyssey: From Codd to XML. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). ACM, New York, 1--15.
[29]
Wood, P. T. 2003. Containment for XPath fragments under DTD constraints. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Heidelberg, Germany, 297--311.
[30]
Yu, C., and Popa, L. 2004. Constraint-based xml query rewriting for data integration. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, 371--382.

Cited By

View all
  • (2021)Universal solutions for temporal data exchangeInformation and Computation10.1016/j.ic.2021.104793281:COnline publication date: 1-Dec-2021
  • (2020)The Absolute Consistency Problem for Relational Schema Mappings with Functional DependenciesIEICE Transactions on Information and Systems10.1587/transinf.2020EDP7102E103.D:11(2278-2288)Online publication date: 1-Nov-2020
  • (2020)Consistency and Certain Answers in Relational to RDF Data Exchange with Shape ConstraintsNew Trends in Databases and Information Systems10.1007/978-3-030-54623-6_9(97-107)Online publication date: 17-Aug-2020
  • Show More Cited By

Recommendations

Reviews

Anthony Joseph Duben

Extensible Markup Language (XML) data exchange is the process by which XML documents structured in a source document type definition (DTD) are queried under a different target DTD, given a set of transformation rules between the source DTD and target DTD. This task is to be accomplished in a manner that remains consistent with the structure of the source DTD. The process is unrelated to the use of the same DTD to structure documents passed among different applications that parse them using the same DTD rules. XML data exchange is a new area of research. Most of the important papers on this topic were published within the past eight years. This long mini-monograph was written to initiate and stimulate discussion of the basic theoretical issues underlying XML data exchange. Theoretical work so far has been concentrated on the relational model. The authors begin with these formulations and burrow deeper. Because of the length of this paper, it is better considered to be something of a short book. The paper is divided into seven sections and an appendix. The first section is an introduction, which poses the problem by employing an example of finding a book using two different DTDs. Section 2 states basic definitions, notational conventions, and shorthand representations in describing the contents of a DTD. These are important, since the source DTD is rewritten on the fly when a query is made in the target DTD. The third section is on XML data exchange settings. The authors depart from the relational source-to-target dependencies in the literature to use a formalism based on a tree pattern like the XML languages themselves. The fourth section continues the discussion of data exchange settings to study consistency, namely, whether a solution in the transformation between DTDs exists. Section 5 is on query answering. Certain answers, defined as the intersection of query results over all solutions, are sought. The sixth section occupies nearly half of the paper. It is on computing certain answers, and classifying them according to complexity. The seventh section briefly summarizes the accomplishments of the paper, and poses additional areas for further research. The appendix goes into the details of some of the proofs in the previous sections. This paper is a major contribution in a new research area. However, it is very terse (if that could be an appropriate description of a 72-page paper) because there is so much here. There is enough covered here to fill a book an order of magnitude larger in size. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 55, Issue 2
May 2008
282 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/1346330
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 May 2008
Accepted: 01 January 2008
Revised: 01 January 2008
Received: 01 December 2005
Published in JACM Volume 55, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data exchange
  2. XML
  3. computing certain answers
  4. consistency

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)3
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Universal solutions for temporal data exchangeInformation and Computation10.1016/j.ic.2021.104793281:COnline publication date: 1-Dec-2021
  • (2020)The Absolute Consistency Problem for Relational Schema Mappings with Functional DependenciesIEICE Transactions on Information and Systems10.1587/transinf.2020EDP7102E103.D:11(2278-2288)Online publication date: 1-Nov-2020
  • (2020)Consistency and Certain Answers in Relational to RDF Data Exchange with Shape ConstraintsNew Trends in Databases and Information Systems10.1007/978-3-030-54623-6_9(97-107)Online publication date: 17-Aug-2020
  • (2019)Answering Queries Using Views, Second EditionSynthesis Lectures on Data Management10.2200/S00884ED2V01Y201811DTM05414:3(1-275)Online publication date: 15-Apr-2019
  • (2018)Reflections on Schema Mappings, Data Exchange, and Metadata ManagementProceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3196959.3196991(107-109)Online publication date: 27-May-2018
  • (2018)Minimization of Tree PatternsJournal of the ACM10.1145/318028165:4(1-46)Online publication date: 25-Jul-2018
  • (2018)Reasoning about integrity constraints for tree-structured dataTheory of Computing Systems10.1007/s00224-017-9771-z62:4(941-976)Online publication date: 1-May-2018
  • (2017)Answering Queries Using ViewsSynthesis Lectures on Data Management10.2200/S00805ED1V01Y201709DTM0469:2(1-235)Online publication date: Dec-2017
  • (2017)A Data Exchange Tool Based on Ontology for Emergency Response SystemsMetadata and Semantic Research10.1007/978-3-319-70863-8_7(74-79)Online publication date: 14-Nov-2017
  • (2017)Schema Mappings: From Data Translation to Data CleaningA Comprehensive Guide Through the Italian Database Research Over the Last 25 Years10.1007/978-3-319-61893-7_12(203-217)Online publication date: 31-May-2017
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media