Abstract
Tree structure metric-space indexing methods recursively partition data according to their distances to a set of selected reference points (also called pivots). There are two basic forms of data partitioning: ball partition and General Hyper-plane (GH) partition. Most existing work only shows their superiority experimentally, and little theoretical proof is found. We propose an approach to unify existing data partitioning methods and analyze their performance theoretically. First, in theory, we unify the two basic forms of partitioning by proving that there are rotations of each other. Second, we show several theoretical or experimental results, which are able to indicate that ball partition outperforms GH partition. Our work takes a step forward in the theoretical study of metric-space indexing and is able to give a guideline of future index design.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821–837 (1964)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. ACM Commun. 18(9), 509–517 (1975)
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: ICML, pp. 97–104 (2006)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Haussler, D. (ed.) 5th Annual ACM Workshop on COLT, pp. 144–152. ACM Press, Pittsburgh (1992)
Bozkaya, T., Ozsoyoglu, M.: Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 24(3), 361–404 (1999)
Brin, S.: Near Neighbor Search in Large Metric Spaces. In: The 21th International Conference on Very Large Data Bases (VLDB 1995). Morgan Kaufmann Publishers Inc. (1995)
Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniquesfor proximity searching in metric spaces. Pattern Recogn. Lett. 24(14), 2357–2366 (2003)
Casella, G., Berger, R.L.: Statistical Inference. Duxbury Press (2001)
Chavez, E., Navarro, G., Baeza-Yates, R., Marroqu, J.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)
Ciaccia, P., Patella, M.: Bulk loading the M-tree. In: 9th Australasian Database Conference, ADO 1998 (1998)
Ciaccia, P., Patella, M.: Proceedings of the Third International Conference on Similarity Search and Applications, Istanbul, Turkey, September 18-19. ACM, New York (2010)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. Presented at the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece (1997)
Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20 (1995)
Feng, Y.-C., Kui, C., Cao, Z.-S.: A Multidimensional Index Structure for Fast Similarity Retrieval. Journal of Software 13(8), 1678–1685 (2002)
Filho, R.F.S., Traina, A.J.M., Traina, C., Faloutsos, C.: Similaritysearch without tears: The OMNI family of all-purpose accessmethods. In: ICDE, pp. 623–630 (2001)
Fuchs, H., Kedem, Z.M., Naylor, B.F.: On visible surface generation by a priori tree structures. In: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1980), SIGGRAPH Computer Graphics, vol. 14(3), pp. 124–133. ACM, New York (1980)
Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Transactions on Database Systems (TODS) 28(4), 517–580 (2003)
Hochbaum, D.S., Shmoys, D.B.: A best possible heuristic for the k-center problem. Mathematics of Operational Research 10(2), 180–184 (1985)
Jagadish, H.V., Ooi, B.C., Tan, K., Yu, C., Zhang, R.: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2), 364–397 (2005)
Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis, 6th edn. Prentice Hall (2007)
Karger, D., Ruhl, M.: Finding nearest neighbors in growth restricted metrics. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 741–750 (2002)
Li, J.-Z., Zhang, Z.-G.: Haperplane Tree: A Structure of Indexing Metric Spaces for Similarity Search Queries. Journal of Computer Research and Development 40(8), 1209–1215 (2003)
Lokoč, J., Skopal, T.: On applications of parameterized hyper-plane partitioning. Poster in the Proceedings of the Third International Conference on Similarity Search and Applications (SISAP2010), Istanbul, Turkey, September 18-19, pp. 131–132 (2010)
Mao, R., Miranker, W., Miranker, D.P.: Pivot Selection:Dimension Reduction for Distance-Based Indexing. Journal of Discrete Algorithms 13, 32–46 (2012)
Mao, R., Xu, W., Ramakrishnan, S., Nuckolls, G., Miranker, D.P.: On Optimizing Distance-Based Similarity Search for Biological Databases. In: The 2005 IEEE Computational Systems Bioinformatics Conference (2005)
Matousek, J.: Lectures on Discrete Geometry, p. 497. Springer-Verlag New York, Inc. (2002)
MoBIoS test suite, http://www.cs.utexas.edu/~mobios/
Navarro, G.: Searching in Metric Spaces by Spatial Approximation. In: Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware. IEEE Computer Society (1999)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan-Kaufmann (2006)
Shen, H.T., Ooi, B.C., Zhou, X.: Towards Effective Indexing for Very Large Video Sequence Database. In: SIGMOD Conference 2005, pp. 730–741 (2005)
SISAP test suite, http://sisap.org/Metric_Space_Library.html
Traina Jr., C., Traina, A.J.M., Seeger, B., Faloutsos, C.: Slim-trees: High performance metric trees minimizing overlap between nodes. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 51–65. Springer, Heidelberg (2000)
Uhlmann, J.K.: Satisfying General Proximity/Similarity Queries with Metric Trees. Information Processing Letter 40(4), 175–179 (1991)
Venkateswaran, J., Kahveci, T., Jermaine, C.M., Lachwani, D.: Reference-based indexing for metric spaces with costly distance measures. VLDB J. 17(5), 1231–1251 (2008)
Xu, W., Miranker, D.P.: A Metric Model of Amino Acid Substitution. Bioinformatics 20(8), 1214–1221 (2004)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: The Fourth Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics (1993)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: the Metric Space Approach. Springer, Heidelberg (2006)
Zhang, Z.-G., Li, J.-Z.: An Algorithm Based on RGH-Tree for Similarity Search Queries. Journal of software 13(10), 1969–1976 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mao, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P. (2014). On Data Partitioning in Tree Structure Metric-Space Indexes. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-05810-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer ScienceComputer Science (R0)