Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Statistics collection in oracle spatial and graph: fast histogram construction for complex geometry objects

Published: 01 August 2013 Publication History

Abstract

Oracle Spatial and Graph is a geographic information system (GIS) which provides users the ability to store spatial data alongside conventional data in Oracle. As a result of the coexistence of spatial and other data, we observe a trend towards users performing increasingly complex queries which involve spatial as well as non-spatial predicates. Accurate selectivity values, especially for queries with multiple predicates requiring joins among numerous tables, are essential for the database optimizer to determine a good execution plan. For queries involving spatial predicates, this requires that reasonably accurate statistics collection has been performed on the spatial data. For extensible data cartridges such as Oracle Spatial and Graph, the optimizer expects to receive accurate predicate selectivity and cost values from functions implemented within the data cartridge. Although statistics collection for spatial data has been researched in academia for a few years; to the best of our knowledge, this is the first work to present spatial statistics collection implementation details for a commercial GIS database. In this paper, we describe our experiences with implementation of statistics collection methods for complex geometry objects within Oracle Spatial and Graph. Firstly, we exemplify issues with previous partitioning-based algorithms in presence of complex geometry objects and suggest enhancements which resolve the issues. Secondly, we propose a main memory implementation which not only speeds up the disk-based partitioning algorithms but also utilizes existing R-tree indexes to provide surprisingly accurate selectivity estimates. Last but not the least, we provide extensive experimental results and an example study which displays the efficacy of our approach on Oracle query performance.

References

[1]
IBM Informix Spatial DataBlade. http://www-01. ibm.com/software/data/informix/blades/spatial/.
[2]
Microsoft SQL Server 2012. http://msdn.microsoft. com/en-us/library/bb418440(v=sql.10).aspx.
[3]
Oracle Exadata Database Machine. http://www.oracle.com/us/products/database/exadata/overview/index.html.
[4]
Oracle Spatial and Graph. http://www.oracle.com/us/products/database/options/spatial/overview/index.html.
[5]
PostGIS. http://postgis.refractions.net.
[6]
A. Aboulnaga, P. Haas, M. Kandil, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman. Automated Statistics Collection in DB2 UDB. In VLDB, pages 1158-1169, 2004.
[7]
A. Aboulnaga and J. Naughton. Accurate Estimation of the Cost of Spatial Selections. In IEEE ICDE, pages 123-134, 2000.
[8]
S. Acharya, V. Poosala, and S. Ramaswamy. Selectivity Estimation in Spatial Databases. In ACM SIGMOD, pages 13-24, 1999.
[9]
P. Aoki. How to Avoid Building DataBlades (r) That Know the Value of Everything and the Cost of Nothing. In IEEE SSDBM, pages 122-133, 1999.
[10]
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In ACM SIGMOD, pages 322-331, 1990.
[11]
T. Brinkhoff, H. Horn, H. Kriegel, and R. Schneider. A Storage and Access Architecture for Efficient Query Processing in Spatial Database Systems. In SSD, pages 357-376, 1993.
[12]
N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A Multidimensional Workload-aware Histogram. In ACM SIGMOD, pages 211-222, 2001.
[13]
S. Chakkappen, T. Cruanes, B. Dageville, L. Jiang, U. Shaft, H. Su, and M. Zait. Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g. In ACM SIGMOD, pages 1053-1064, 2008.
[14]
T. Eavis and A. Lopez. Rk-hist: An R-tree based Histogram for Multi-dimensional Selectivity Estimation. In ACM CIKM, pages 475-484, 2007.
[15]
P. Gibbons. Distinct Sampling for Highly-accurate Answers to Distinct Values Queries and Event Reports. In VLDB, pages 541-550, 2001.
[16]
D. Gunopulos, G. Kollios, V. Tsotras, and C. Domeniconi. Approximating Multi-dimensional Aggregate Range Queries over Real Attributes. In ACM SIGMOD, pages 463-474, 2000.
[17]
D. Gustafson and W. Kessel. Fuzzy clustering with a fuzzy covariance matrix. In IEEE Conference on Decision and Control, pages 761-766, 1978.
[18]
A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. In ACM SIGMOD, pages 47-57, 1984.
[19]
P. Haas, J. Naughton, S. Seshadri, and L. Stokes. Sampling-based Estimation of the Number of Distinct Values of an Attribute. In VLDB, pages 311-322, 1995.
[20]
P. Haas and A. Swami. Sampling-based Selectivity Estimation for Joins using Augmented Frequent Value Statistics. In IEEE ICDE, pages 522-531, 1995.
[21]
Y. Ioannidis. The History of Histograms (abridged). In VLDB, pages 19-30, 2003.
[22]
H. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, pages 275-286, 1998.
[23]
I. Kamel and C. Faloutsos. On packing R-trees. In ACM CIKM, pages 490-499, 1993.
[24]
K. Kanth, S. Ravada, J. Sharma, and J. Banerjee. Indexing Medium-dimensionality Data in Oracle. In ACM SIGMOD, pages 521-522, 1999.
[25]
R. Kothuri, S. Ravada, and D. Abugov. Quadtree and R-tree Indexes in Oracle Spatial: A Comparison using GIS Data. In ACM SIGMOD, pages 546-557, 2002.
[26]
R. Kothuri, S. Ravada, and N. An. Incorporating Updates in Domain Indexes: Experiences with Oracle Spatial R-trees. In IEEE ICDE, pages 745-753, 2004.
[27]
R. Lipton, J. Naughton, and D. Schneider. Practical Selectivity Estimation through Adaptive Sampling. In ACM SIGMOD, pages 40-46, 1990.
[28]
M. Muralikrishna and D. DeWitt. Equi-depth Multidimensional Histograms. In ACM SIGMOD, pages 28-36, 1988.
[29]
S. Muthukrishnan, V. Poosala, and T. Suel. On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity and Applications. In ICDT, pages 236-256, 1999.
[30]
V. Poosala, P. Haas, Y. Ioannidis, and E. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. In ACM SIGMOD, pages 294-305, 1996.
[31]
Y. Roh, J. Kim, Y. Chung, J. Son, and M. Kim. Hierarchically Organized Skew-tolerant Histograms for Geographic Data Objects. In ACM SIGMOD, pages 627-638, 2010.
[32]
T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A Dynamic Index for Multi-dimensional Objects. In VLDB, pages 40-46, 1987.
[33]
U. Srivastava, P. Haas, V. Markl, M. Kutsch, and T. Tran. ISOMER: Consistent Histogram Construction using Query Feedback. In IEEE ICDE, pages 39-50, 2006.
[34]
X. Xie and G. Beni. A Validity Measure for Fuzzy Clustering. IEEE Transations on Pattern Analysis and Machine Intelligence, 13(8):841-847, 1991.

Index Terms

  1. Statistics collection in oracle spatial and graph: fast histogram construction for complex geometry objects
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 6, Issue 11
      August 2013
      237 pages

      Publisher

      VLDB Endowment

      Publication History

      Published: 01 August 2013
      Published in PVLDB Volume 6, Issue 11

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 92
        Total Downloads
      • Downloads (Last 12 months)2
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 20 Nov 2024

      Other Metrics

      Citations

      View Options

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media