Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/1316689.1316712dlproceedingsArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
Article

Bloom histogram: path selectivity estimation for XML data with updates

Published: 31 August 2004 Publication History

Abstract

Cost-based XML query optimization calls for accurate estimation of the selectivity of path expressions. Some other interactive and internet applications can also benefit from such estimations. While there are a number of estimation techniques proposed in the literature, almost none of them has any guarantee on the estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., with few updates. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, bloom histogram, to approximate XML path frequency distribution within a small space budget and to estimate the path selectivity accurately with the bloom histogram. We obtain the upper bound of its estimation error and discuss the trade-offs between the accuracy and the space limit. To support updates of bloom histograms efficiently when underlying XML data change, a dynamic summary layer is used to keep exact or more detailed XML path information. We demonstrate through our extensive experiments that the new solution can achieve significantly higher accuracy with an even smaller space than the previous methods in both static and dynamic environments.

References

[1]
{1} A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the selectivity of XML path expressions for Internet scale applications. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), pages 591-600, 2001.
[2]
{2} B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970.
[3]
{3} A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey. In Proceeding of Allerton Conference, 2002.
[4]
{4} N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A multidimensional workload-aware histogram. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001).
[5]
{5} S. Chaudhuri, R. Motwani, and V. R. Narasayya. Random sampling for histogram construction: How much is enough? In Proceedings ACM SIGMOD International Conference on Management of Data (SIGMOD 1998), pages 436-447.
[6]
{6} Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), pages 595-604, 2001.
[7]
{7} S. Cohen and Y. Matias. Spectral bloom filters. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pages 241-252.
[8]
{8} G. Cormode and S. Muthukrishnan. Improved data stream summaries: The count-min sketch and its applications (extended abstract). In Latin American Theoretical Informatics 2004 (LATIN 2004), 2004.
[9]
{9} J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Siméon. Statix: making XML count. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), pages 181-191, 2002.
[10]
{10} P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. In Proceedings of 23rd International Conference on Very Large Data Bases (VLDB 1997).
[11]
{11} A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings on 34th Annual ACM Symposium on Theory of Computing (STOC 2002).
[12]
{12} Y. E. Ioannidis. The history of histograms (abridged). In Proceedings of 29th International Conference on Very Large Data Bases (VLDB 2003), pages 19-30.
[13]
{13} H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In Proceedings of 24rd International Conference on Very Large Data Bases (VLDB 1998), pages 275-286.
[14]
{14} L. Lim, M. Wang, S. Padmanabhan, J. S. Vitter, and R. Parr. XPathLearner: An on-line self-tuning Markov histogram for XML path selectivity estimation. In Proceedings of 28th International Conference on Very Large Data Bases (VLDB 2002), pages 442-453, 2002.
[15]
{15} L. Lim, M. Wang, and J. S. Vitter. SASH: A self-adaptive histogram set for dynamically changing workloads. In Proceedings of 29th International Conference on Very Large Data Bases, September 9-12, 2003 (VLDB 2003).
[16]
{16} L. F. Mackert and G. M. Lohman. R* optimizer validation and performance evaluation for local queries. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data (SIGMOD 1986), pages 84-95, 1986.
[17]
{17} N. Polyzotis and M. N. Garofalakis. Statistical synopses for graph-structured XML databases. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), pages 358-369, 2002.
[18]
{18} N. Polyzotis and M. N. Garofalakis. Structure and value synopses for XML data graphs. In Proceedings of 28th International Conference on Very Large Data Bases (VLDB 2002), pages 466-477, 2002.
[19]
{19} L. Qiao, D. Agrawal, and A. E. Abbadi. RHist: adaptive summarization over continuous data streams. In Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management (CIKM 2002).
[20]
{20} N. Thaper, S. Guha, P. Indyk, and N. Koudas. Dynamic multidimensional histograms. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002).

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
August 2004
1380 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment

Publication History

Published: 31 August 2004

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)2
Reflects downloads up to 20 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation GraphsACM SIGMOD Record10.1145/3604437.360445852:1(94-102)Online publication date: 8-Jun-2023
  • (2016)Are Few Bins EnoughProceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/2902251.2902274(455-463)Online publication date: 15-Jun-2016
  • (2016)XHQEInformation Systems Frontiers10.1007/s10796-015-9561-618:6(1233-1249)Online publication date: 1-Dec-2016
  • (2014)Versatile and scalable parallel histogram constructionProceedings of the 23rd international conference on Parallel architectures and compilation10.1145/2628071.2628108(127-138)Online publication date: 24-Aug-2014
  • (2012)Towards benefit-based RDF source selection for SPARQL queriesProceedings of the 4th International Workshop on Semantic Web Information Management10.1145/2237867.2237869(1-8)Online publication date: 20-May-2012
  • (2011)Estimating selectivity for joined RDF triple patternsProceedings of the 20th ACM international conference on Information and knowledge management10.1145/2063576.2063784(1435-1444)Online publication date: 24-Oct-2011
  • (2010)Selectivity-based XML query processing in structured peer-to-peer networksProceedings of the Fourteenth International Database Engineering & Applications Symposium10.1145/1866480.1866513(236-244)Online publication date: 16-Aug-2010
  • (2010)LCA-based selection for XML document collectionsProceedings of the 19th international conference on World wide web10.1145/1772690.1772743(511-520)Online publication date: 26-Apr-2010
  • (2010)Statistics-based parallelization of XPath queries in shared memory systemsProceedings of the 13th International Conference on Extending Database Technology10.1145/1739041.1739063(159-170)Online publication date: 22-Mar-2010
  • (2009)Synopsis based load shedding in XML streamsProceedings of the 2009 EDBT/ICDT Workshops10.1145/1698790.1698806(93-98)Online publication date: 22-Mar-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media