Abstract
We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential, when conjunctive queries are replaced by tree patterns, and the number of data comparisons is bounded.
Similar content being viewed by others
Notes
Provided by Michał Pilipczuk, during the Warsaw Automata Group’s research camp Autob ó z 2015.
References
Arenas, M., Barceló, P., Libkin, L., Murlak, F.: Foundations of data exchange. Cambridge University Press (2014)
Arenas, M., Fan, W., Libkin, L.: On the complexity of verifying consistency of XML specifications. SIAM J. Comput. 38(3), 841–880 (2008)
Arenas, M., Libkin, L.: A normal form for XML documents. ACM Trans. Database Syst. 29, 195–232 (2004)
Arenas, M., Libkin, L.: XML data exchange: Consistency and query answering. J. ACM, 55(2) (2008)
Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM, 55(2) (2008)
Björklund, H., Martens, W., Schwentick, T.: Conjunctive query containment over trees using schema information. Acta Informatica, 1–40 (2016)
Bojańczyk, M., Murlak, F., Witkowski, A.: Containment of monadic datalog programs via bounded clique-width. In: Proceedings of the ICALP 2015, pp. 427–439 (2015)
Bojańczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and XML reasoning. J. ACM 56(3), 13:1–13:48 (2009)
Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: Proceedings of the STOC 1977, pp. 77–90 (1977)
Courcelle, B., Makowsky, J.A., Rotics, U.: Linear time solvable optimization problems on graphs of bounded clique-width. Theory Comput. Syst. 33(2), 125–150 (2000)
Courcelle, B., Olariu, S.: Upper bounds to the clique width of graphs. Discret. Appl. Math. 101(1-3), 77–114 (2000)
David, C., Gheerbrant, A., Libkin, L., Martens, W.: Containment of pattern-based queries over data trees. In: Proceedings of the ICDT 2013, pp. 201–212 (2013)
David, C., Hofman, P., Murlak, F., Pilipczuk, M.: Synthesizing transformations from XML schema mappings. In: Proceedings of the ICDT 2014, pp. 61–71 (2014)
Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.C.: Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)
Fagin, R., Vardi, M.Y.: The theory of data dependencies - a survey. In: Mathematics of information processing, volume 34 of proceedings of symposia in applied mathematics, pp. 19–71. American Mathematical Society, Providence, Rhode Island (1986)
Figueira, D.: Alternating register automata on finite words and trees. Logical Methods Comput. Sci. 8(1), 2012
Gao, S., Sperberg-McQueen, C.M., Thompson, H.S., Mendelsohn, N., Beech, D., Maloney, M.: W3C XML Schema Definition Language (XSD) 1.1, Part 1: Structures. Technical report, World Wide Web Consortium, April (2009)
Gogacz, T., Marcinkowski, J.: All-instances termination of chase is undecidable. In: Proceedings of the ICALP 2014, pp. 293–304 (2014)
Gogacz, T., Marcinkowski, J.: Red spider meets a rainworm: query finite determinacy is undecidable. In: Proceedings of the PODS 2016, pp. 121–134 (2016)
Hartmann, S., Link, S.: More functional dependencies for XML. In: Proceedings of the ADBIS 2003, pp. 355–369 (2003)
Hartmann, S., Link, S., Trinh, T.: Solving the implication problem for XML, functional dependencies with properties. In: Proceedings of the WoLLIC 2010, pp. 161–175 (2010)
Jurdzinski, M., Lazic, R.: Alternating automata on data trees and XPath satisfiability. ACM Trans. Comput. Logic 12(3), 19:1–19:21 (2011)
Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings of the PODS 2002, pp. 233–246 (2002)
Miklau, G., Suciu, D: Containment and equivalence for a fragment of XPath. J. ACM, 51(1), 2–45 (2004)
Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Log. Meth. Comput. Sci., 2(3) (2006)
Niewerth, M., Schwentick, T.: Reasoning XML constraints based on XML-to-relational mappings. In: Proceedings of the ICDT 2014, pp. 72–83 (2014)
Vardi, M.Y.: Fundamentals of dependency theory. In: Borger, E. (ed.) Trends in theoretical computer science, pp. 171–224. Computer Science Press (1987)
Acknowledgements
We thank the anonymous referees of ICDT 2016 and TOCS for their insightful questions.
Author information
Authors and Affiliations
Corresponding author
Additional information
This article is part of the Topical Collection on Special Issue on Database Theory
The First, third, and fourth author of this paper were supported by Poland’s National Science Centre grant 2013/11/D/ST6/03075.
Rights and permissions
About this article
Cite this article
Czerwiński, W., David, C., Murlak, F. et al. Reasoning about integrity constraints for tree-structured data. Theory Comput Syst 62, 941–976 (2018). https://doi.org/10.1007/s00224-017-9771-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-017-9771-z