Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/1182635.1164150acmconferencesArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
Article

An incrementally maintainable index for approximate lookups in hierarchical data

Published: 01 September 2006 Publication History

Abstract

Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the underlying data changes. Since the performance of a full index reconstruction is prohibitive, the index must be updated incrementally.We propose a persistent and incrementally maintainable index for approximate lookups in hierarchical data. The index is based on small tree patterns, called pq-grams. It supports efficient updates in response to structure and value changes in hierarchical data and is based on the log of tree edit operations. We prove the correctness of the incremental maintenance for sequences of edit operations. Our algorithms identify a small set of pq-grams that must be updated to maintain the index. The experimental results with synthetic and real data confirm the scalability of our approach.

References

[1]
{1} S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. of ICDE, pages 141-152. IEEE Computer Society, 2002.
[2]
{2} N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proc. of VLDB, pages 301-312. Morgan Kaufmann Publishers Inc., 2005.
[3]
{3} N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Proc. of SIGMOD, pages 310-321. ACM Press, 2002.
[4]
{4} G. Cobéna, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In Proc. of ICDE, pages 41-52. IEEE Computer Society, 2002.
[5]
{5} B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341-350. Morgan Kaufmann Publishers Inc., 2001.
[6]
{6} M. Garofalakis and A. Kumar. XML stream processing using tree-edit distance embeddings. ACM Trans. on Database Systems, 30(1):279-332, 2005.
[7]
{7} S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, and T. Yu. Approximate XML joins. In Proc. of SIGMOD, pages 287-298. ACM Press, 2002.
[8]
{8} S. Guha, N. Koudas, D. Srivastava, and T. Yu. Index-based approximate XML joins. In Proc. of ICDE, pages 708-710. IEEE Computer Society, 2003.
[9]
{9} H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-tree: Indexing XML data for efficient structural joins. In Proc. of ICDE, pages 253-263. IEEE Computer Society, 2003.
[10]
{10} R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987.
[11]
{11} R. Kaushik, P. Bohannon, J. F. Naughton, and P. Shenoy. Updates for structure indexes. In Proc. of VLDB, pages 239-250. Morgan Kaufmann Publishers Inc., 2002.
[12]
{12} K.-H. Lee, Y.-C. Choy, and S.-B. Cho. An efficient algorithm to compute differences between structured documents. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(8):965-979, 2004.
[13]
{13} Q. Li and B. Moon. Indexing and querying XML data for regular path expressions. In Proc. of VLDB, pages 361-370. Morgan Kaufmann Publishers Inc., 2001.
[14]
{14} N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In Proc. of SIGMOD, pages 263-274. ACM Press, 2004.
[15]
{15} C. Qun, A. Lim, and K. W. Ong. D(k)-index: An adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134-144. ACM Press, 2003.
[16]
{16} R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In ICDE, pages 360-371. IEEE Computer Society, 2005.
[17]
{17} E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191-211, 1992.
[18]
{18} M. Weis and F. Naumann. DogmatiX tracks down duplicates in XML. In Proc. of SIGMOD, pages 431-442. ACM Press, 2005.
[19]
{19} R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In Proc. of SIGMOD, pages 754-765. ACM Press, 2005.
[20]
{20} K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245-1262, 1989.

Cited By

View all
  • (2017)Frag-shells cube based on hierarchical dimension encoding treeProceedings of the 11th International Conference on Ubiquitous Information Management and Communication10.1145/3022227.3022229(1-9)Online publication date: 5-Jan-2017
  • (2014)On-the-fly token similarity joins in relational databasesProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2610530(1495-1506)Online publication date: 18-Jun-2014
  • (2011)RTEDProceedings of the VLDB Endowment10.14778/2095686.20956925:4(334-345)Online publication date: 1-Dec-2011
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft: Microsoft
  • Korea Info Sci Society: Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • ORACLE: ORACLE
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment

Publication History

Published: 01 September 2006

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2017)Frag-shells cube based on hierarchical dimension encoding treeProceedings of the 11th International Conference on Ubiquitous Information Management and Communication10.1145/3022227.3022229(1-9)Online publication date: 5-Jan-2017
  • (2014)On-the-fly token similarity joins in relational databasesProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2610530(1495-1506)Online publication date: 18-Jun-2014
  • (2011)RTEDProceedings of the VLDB Endowment10.14778/2095686.20956925:4(334-345)Online publication date: 1-Dec-2011
  • (2010)pq-hashProceedings of the 2010 international conference on Web-age information management10.5555/1927585.1927600(125-134)Online publication date: 15-Jul-2010
  • (2009)Comparing starsProceedings of the VLDB Endowment10.14778/1687627.16876312:1(25-36)Online publication date: 1-Aug-2009

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media