Abstract
Data in many industrial application systems are often neither completely structured nor completely unstructured. Consequently, semi-structured data models such as XML have become popular as a lowest common denominator for managing such data. The problem is that although XML is adequate for representing the flexible portion of the data, it fails to exploit the highly structured portion. XML normalization theory could be used to factor out the structured portion of the data at the schema level; however, queries written against the original schema no longer run on the normalized XML data. In this paper, we propose a new approach called eXtricate that stores XML documents in a space-efficient, decomposed way while supporting efficient processing of the original queries. Our method exploits the fact that a considerable amount of information is shared among similar XML documents. By regarding each document as consisting of a shared framework and a small diff script, we can leverage the strengths of both the relational and XML data models at the same time to handle such data effectively. We prototyped our approach on top of DB2 9 pureXML, a commercial hybrid relational-XML DBMS. Our experiments validate the amount of redundancy in real e-catalog data and show the effectiveness of our method.
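The idea of viewing each document as a shared framework plus a small diff can be illustrated with a toy sketch. This is not the eXtricate algorithm, whose details are not given in the abstract; it is a simplified stand-in that assumes each document can be flattened into a set of (path, value) pairs for its leaf elements, ignoring attributes, element order, and repeated elements. The function names `leaf_paths`, `decompose`, and `reconstruct`, and the sample catalog entries, are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Flatten a document into a set of (path, text) pairs for its leaf elements.

    Simplifying assumption: attributes, sibling order, and duplicates are ignored.
    """
    pairs = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children:
            pairs.add((path, (node.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return pairs


def decompose(docs):
    """Split a non-empty list of documents into a shared part and per-document diffs."""
    flattened = [leaf_paths(d) for d in docs]
    shared = set.intersection(*flattened)      # pairs present in every document
    diffs = [f - shared for f in flattened]    # what is specific to each document
    return shared, diffs


def reconstruct(shared, diff):
    """Rebuild one document's flattened form from the shared part and its diff."""
    return shared | diff


# Hypothetical e-catalog entries that share most of their content.
doc_a = ("<product><category>laptop</category><warranty>1 year</warranty>"
         "<cpu>2.0 GHz</cpu></product>")
doc_b = ("<product><category>laptop</category><warranty>1 year</warranty>"
         "<cpu>2.4 GHz</cpu><bluetooth>yes</bluetooth></product>")

shared, diffs = decompose([doc_a, doc_b])
```

In this sketch the shared pairs are stored once, and each document keeps only its own small diff, which is where the space saving would come from when many documents are similar. A real system would also need to preserve document order and answer the original queries over the decomposed form, which this toy example does not attempt.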
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, L., Wang, H., Wang, M. (2008). Modeling and Querying E-Commerce Data in Hybrid Relational-XML DBMSs. In: Li, Q., Spaccapietra, S., Yu, E., Olivé, A. (eds) Conceptual Modeling - ER 2008. ER 2008. Lecture Notes in Computer Science, vol 5231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87877-3_22
DOI: https://doi.org/10.1007/978-3-540-87877-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87876-6
Online ISBN: 978-3-540-87877-3
eBook Packages: Computer Science, Computer Science (R0)