Abstract
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach that is fixed schema eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree approach.
We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each method over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
Similar content being viewed by others
References
SQLGenerator. http://ir.iit.edu/projects/SQLGenerator.html. Retrieved on August 2007
DBLP. http://www.informatik.uni-trier.de/~ley/db/. Retrieved on August 2007
Extensible markup language (XML). http://www.w3.org/XML/. Retrieved on August 2007
Open source native XML database. http://exist.sourceforge.net/. Retrieved on August 2007
XBench—a family of benchmarks for XML DBMSs. http://db.uwaterloo.ca/~ddbms/projects/xbench/index.html. Retrieved on August 2007
INitiative for the Evaluation of XML retrieval (INEX) (2007) http://inex.is.informatik.uni-duisburg.de/2007/. Retrieved on August 2007
Afanasiev L, Marx M (2006) An analysis of the current XQuery benchmarks. In: International Workshop on Performance and Evaluation of Data Management Systems (EXPDB)
Amagasa T, Yoshikawa M, Uemura S (2003) QRS: a robust numbering scheme for XML documents. In: Proceedings of the 19th international conference on data engineering (ICDE’03)
Amer-Yahia S, Lalmas M (2006) XML search: languages, INEX and scoring. SIGMOD Rec 35(4):16–23
Beyer KS, Cochrane R, Josifovski V, Kleewein J, Lapis G, Lohman GM, Lyle B, Ozcan F, Pirahesh H, Seemann N, Truong TC, der Linden BV, Vickery B, Zhang C (2005) System RX: one part relational, one part XML. In: SIGMOD conference, pp 347–358
Boncz P, Grust T, van Keulen M, Manegold S, Rittinger J, Teubner J (2006) MonetDB/XQuery: a fast XQuery processor powered by a relational engine. In: SIGMOD international conference on management of data, pp 479–490
Bosak J (2007) Shakespeare XML collection. http://www.ibiblio.org/bosak/xml/eg/. Retrieved on August 2007
Bremer J, Gertz M (2006) Integrating document and data retrieval based on XML. VLDB J 15(1):53–83
Bhme T, Rahm E (2004) Supporting efficient streaming and insertion of XML data in RDBMS. In: 3rd international workshop data integration over the web (DIWeb)
Cathey R, Beitzel S, Jensen E, Grossman D, Frieder O (2007) Relationally mapping XML queries for scalable XML search. In: Proceedings of IEEE conference on the intelligence and security informatics (ISI’07), May 2007
Chien S, Tsotras V, Zaniolo C, Zhang D (2002) Efficient complex query support for multiversion XML documents. In: Proceedings of the EDBT conference
Denoyer L, Gallinari P (2006) The Wikipedia XML corpus. SIGIR Forum 40(1):64–69
Fan W, Yu JX, Lu H, Lu J, Rastogi R (2005) Query translation from XPath to SQL in the presence of recursive DTDs. In: VLDB, pp 337–348
Florescu D, Kossman D (1999) A performance evaluation of alternative mapping schemes for storing XML data in a relational database. Technical report, INRIA, France
Florescu D, Kossman D (1999) Storing and querying XML data using an RDBMS. IEEE Data Eng Bull 22(3):27–34
Gottlob G, Koch C, Pichler R, Segoufin L (2005) The complexity of XPath query evaluation and XML typing. J ACM 52:284–335
Grohe M (2002) Parameterized complexity for the database theorist. ACM SIGMOD Rec 31:86–96
Grust T (2002) Accelerating XPath location steps. In: ACM SIGMOD international conference on management of data, pp 109–120
Grust T, Sakr S, Teubner J (2004) XQuery on SQL hosts. In: VLDB, pp 252–263
Jiang H, Lu H, Wang W, Yu JX (2002) Path materialization revisited: an efficient storage model for XML data. In: Proceedings of the thirteenth Australian conference on database technologies, vol 5, pp 85–94
Khan L, Rao Y (2001) A performance evaluation of storing XML data in relational database management systems. In: Proceedings of the third international workshop on web information and data management, pp 31–38
Koch C (2006) On the complexity of nonrecursive XQuery and functional query languages on complex values. ACM Trans Database Syst 31:1215–1256
Koch C (2006) Processing queries on tree-structured data efficiently. In: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 213–224
Krishnamurthy R, Chakaravarthy V, Kaushik R, Naughton J (2004) Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation. In: IEEE international conference on data engineering (ICDE)
Krishnamurthy R, Kaushik R, Naughton J (2003) XML-to-SQL query translation literature: the state of the art and open problems, September 2003
Krishnaprasad M, Liu ZH, Manikutty A, Warner JW, Arora V, Kotsovolos S (2004) Query rewrite for XML in Oracle XML DB. In: VLDB, pp 1122–1133
Lee Y, Yoo S, Yoon K, Berra P (1996) Index structures for structured documents. In: Proceedings of the 1st ACM international conference on digital libraries, March 1996
Li Q, Moon B (2001) Indexing and querying XML data for regular path expressions. In: Proceedings of the 27th international conference on very large databases, September 2001
Liu ZH, Krishnaprasad M, Arora V (2005) Native XQuery processing in Oracle XMLDB. In: SIGMOD conference, pp 828–833
Manegold S (2006) An empirical evaluation of XQuery processors. In: International workshop on performance and evaluation of data management systems (EXPDB)
Meier W (2002) eXist: an open source native XML database. In: Web, web-services, and database systems. NODe 2002 web- and database-related workshops, October 2002. Springer LNCS Series, 2593
Meier W (2006) Index-driven XQuery processing in the eXist XML database. In: IXML Prague
Pal S, Cseri I, Schaller G, Seeliger O, Giakoumakis L, Zolotov VV (2004) Indexing XML data stored in a relational database. In: VLDB, pp 1134–1145
Papadimitriou CH, Yannakakis M (1997) On the complexity of database queries. In: Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, pp 12–19
Rys M (2004) XQuery in relational database systems. In: XML 2004
Rys M (2005) XML and relational database management systems: inside microsoft SQL server 2005. In: SIGMOD international conference on management of data, pp 958–962
Al-Khalifa SS, Jagadish H, Koudas N, Patel J, Srivastava D, Wu Y (2002) Structural joins: a primitive for efficient XML query pattern matching. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 141–152
Schmidt A, Kersten M, Windhouwer M, Waas F (2001) Efficient relational storage and retrieval of XML documents. Lect Notes Comput Sci 1997:137
Shanmugasundaram J, Kiernan J, Shekita EJ, Fan C, Funderburk JE (2001) Querying XML views of relational data. In: VLDB, pp 261–270
Shanmugasundaram J, Tufte K, He G, Zhang C, DeWitt D, Naughton J (1999) Relational database for querying XML documents: limitations and opportunities. In: Proc of VLDB
Snelson J (2005) All XML databses are equal. XTech 2005: XML, the web and beyond
Suciu D (2001) On database theory and XML. ACM SIGMOD Rec Arch 30(3):39–45
Suizo N (2006) XML propels security intelligence. Network world, August 2006
Tatarinov I, Viglas S, Beyer K, Shanmugasundaram J, Shekita E, Zhang C (2002) Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 204–215
Tian F, DeWitt D, Chen J, Zhang C (2002) The design and performance evaluation of alternative XML storage strategies. ACM SIGMOD Rec 31(1):5–10
Vakali A, Catania B, Maddalena A (2005) XML data stores: emerging practices. IEEE Internet Comput 9(2):62–69
Wang H, Meng X (2005) On the sequencing of tree structures for XML indexing. In: ICDE, pp 372–383
Weigel F, Schulz KU, Meuss H (2005) Exploiting native XML indexing techniques for XML retrieval in relational database systems. In: Workshop on web information and data management (WIDM’05), pp 23–30
Weigel F, Schulz KU, Meuss H (2005) The BIRD numbering scheme for XML and tree databases—deciding and reconstructing tree relations using efficient arithmetic operations. In: XSym, pp 49–67
Xing G, Tseng B (2004) Extendible range-based numbering scheme for XML document. In: Proceedings of the international conference on information technology: coding and computing (ITCC’04)
Yoshikawa M, Amagasa T (2001) XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans Internet Technol (TOIT) 1(1):110–141
Zhang C, Naughton J, DeWitt D, Luo Q, Lohman G (2001) On supporting containment queries in relational database management systems. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 425–438
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cathey, R.J., Beitzel, S.M., Jensen, E.C. et al. Using a relational database for scalable XML search. J Supercomput 44, 146–178 (2008). https://doi.org/10.1007/s11227-007-0153-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-007-0153-1