DOI: 10.1145/1096601.1096641

Prefiltering techniques for efficient XML document processing

Published: 02 November 2005

Abstract

Document Object Model (DOM) and Simple API for XML (SAX) are the two major programming models for XML document processing. Each, however, has its own efficiency limitation. DOM assumes an in-core representation of XML documents, which can be problematic for large documents. SAX must scan the document linearly to locate the fragments of interest. Previously, we used tree-to-table mapping and indexing techniques to help answer structural queries over large XML documents or large collections of them. In this paper, we generalize those techniques into a prefiltering framework in which repeated access to large XML documents can be carried out efficiently within the existing DOM and SAX models. The prefiltering framework essentially uses a tiny search engine to locate useful fragments in the target XML documents by approximately executing the user's queries. Those fragments are gathered into a candidate-set XML document, which is returned to the user's DOM- or SAX-based applications for further processing. The result is a practical and efficient model of XML processing, especially when the XML documents are large and infrequently updated but frequently queried.
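
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `build_tag_index` and `prefilter`, the in-memory tag-name index, the `<candidates>` wrapper element, and the sample document are all assumptions made for illustration, standing in for the tree-to-table mapping and search engine the abstract mentions. Phase one answers the query approximately (which top-level fragments mention all the requested tags?); phase two runs the exact query on the much smaller candidate-set document with an ordinary tree API.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_index(xml_text):
    """One streaming pass over the document (done once, reused per query).

    Returns (index, fragments): index maps a tag name to the ids of the
    top-level fragments (children of the root) that contain it, and
    fragments holds each fragment serialized as XML. Storing copies is a
    simplification; a real index would more likely store file offsets.
    """
    index = defaultdict(set)
    fragments = []
    depth = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a child of the root has just been closed
            frag_id = len(fragments)
            fragments.append(ET.tostring(elem, encoding="unicode"))
            for node in elem.iter():
                index[node.tag].add(frag_id)
            elem.clear()  # release the subtree to keep memory bounded
    return index, fragments


def prefilter(index, fragments, tags):
    """Approximate query: keep fragments that mention every requested tag.

    The result may contain false positives; the caller refines it.
    """
    ids = None
    for tag in tags:
        hits = index.get(tag, set())
        ids = hits if ids is None else ids & hits
    body = "".join(fragments[i] for i in sorted(ids or ()))
    return "<candidates>" + body + "</candidates>"


# Hypothetical sample data.
DOC = (
    "<library>"
    "<book><title>XML</title><year>2005</year></book>"
    "<book><title>SGML</title><year>1999</year></book>"
    "<book><title>SQL</title></book>"
    "<journal><year>2005</year></journal>"
    "</library>"
)

index, fragments = build_tag_index(DOC)

# Phase 1: approximate execution selects a superset of the answer.
candidate_doc = prefilter(index, fragments, ["title", "year"])

# Phase 2: exact query on the small candidate-set document.
root = ET.fromstring(candidate_doc)
titles = [b.findtext("title") for b in root.findall("book[year='2005']")]
```

In this sketch the prefilter keeps two of the four fragments (both books that have a title and a year), and the exact query then narrows them to the single book published in 2005. The index is built once and can serve many queries, which matches the abstract's setting of documents that are large, rarely updated, and frequently queried.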

Published In

DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering
November 2005
252 pages
ISBN: 1595932402
DOI: 10.1145/1096601
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DOM
  2. SAX
  3. prefiltering
  4. structural query
  5. two-phased XML processing model

Qualifiers

  • Article

Conference

DocEng05: ACM Symposium on Document Engineering
November 2 - 4, 2005
Bristol, United Kingdom

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 19 Sep 2024

Cited By

  • (2012) Efficient string-based XML stream prefiltering. Proceedings of the Twenty-Third Australasian Database Conference - Volume 124, pp. 145-152. DOI: 10.5555/2483739.2483757. Online publication date: 31-Jan-2012
  • (2011) A Time/Space Efficient XML Filtering System for Mobile Environment. Proceedings of the 2011 IEEE 12th International Conference on Mobile Data Management - Volume 01, pp. 184-193. DOI: 10.1109/MDM.2011.78. Online publication date: 6-Jun-2011
  • (2009) Building GML-native web-based geographic information systems. Computers & Geosciences 35:9, pp. 1802-1816. DOI: 10.1016/j.cageo.2008.11.009. Online publication date: 1-Sep-2009
  • (2009) Document engineering approaches toward scalable and structured multimedia, web and printable documents. Multimedia Tools and Applications 43:3, pp. 195-202. DOI: 10.1007/s11042-009-0288-6. Online publication date: 1-Jul-2009
  • (2007) A document object modeling method to retrieve data from a very large XML document. Proceedings of the 2007 ACM symposium on Document engineering, pp. 59-68. DOI: 10.1145/1284420.1284439. Online publication date: 28-Aug-2007
  • (2007) Querying and browsing XML and relational data sources. Proceedings of the 2007 ACM symposium on Applied computing, pp. 489-493. DOI: 10.1145/1244002.1244116. Online publication date: 11-Mar-2007
  • (2006) XML Evolution. Proceedings of the 32nd international conference on Very large data bases, pp. 1215-1218. DOI: 10.5555/1182635.1164247. Online publication date: 1-Sep-2006
  • (2006) Efficient GML-native processors for web-based GIS. Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems, pp. 91-98. DOI: 10.1145/1183471.1183488. Online publication date: 10-Nov-2006
