Abstract
Distributions of very high dimensional data are, in most cases, not even, but skewed. For this reason, there can be more effective dimensions than others in partitioning a high dimensional data set. Effective dimensions can be used to partition the data set in more balanced way so that data are located in more evenly distributed. In this paper, we present schemes to select dimensions by which high dimensional data sets are partitioned for efficient similarity joins. Especially, in order to efficiently reduce the number of partition dimensions, we propose a novel scheme using diagonal dimensions compared with perpendicular dimensions. The experimental results show that the proposed schemes substantially improve the performance of the partition-based similarity joins in high dimensional data spaces.
This paper was supported by Konkuk University in 2006.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using R-trees. In: Proceedings of the 1996 VLDB Conference, Bombay, India (September 1996)
Huang, Y.W., Jing, N., Rundensteiner, E.: Spatial joins using R-trees: Breadth-first traversal with global optimizations. In: Proceedings of the 1997 VLDB Conference (1997)
Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: Proceedings of the 1998 ACM-SIGMOD Conference, Seattle, WA, June 1998, pp. 237–248 (1998)
Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: Proceedings of the 2000 ACM-SIGMOD Conference, Dallas, TX, May 2000, pp. 343–354 (2000)
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM-SIGMOD Conference, Boston, MA, June 1984, pp. 47–57 (1984)
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM-SIGMOD Conference, Atlantic City, NJ, May 1990, pp. 322–331 (1990)
Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for highdimensional data. In: Proceedings of the 1996 Conference, Bombay, India (September 1996)
Patel, J.M., Dewitt, D.J.: Partition based spatial-merge join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada, June 1996, pp. 259–270 (1996)
Lo, M.-L., Ravishankar, C.V.: Spatial hash join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada pp. 247–258 (June 1996)
Shim, K., Srikant, R., Agrawal, R.: High-dimensional similarity joins. In: Proceedings of the 1997 IEEE International Conference on Data Engineering (1997)
Koudas, N., Sevcik, C.: High dimensional similarity joins: Algorithms and performance evaluation. In: Proceedings of the 1998 IEEE International Conference on Data Engineering (1998)
Bohm, C., Braunmuller, B., Krebs, F., Kriegel, H.-P.: Epsilon grid order. An algorithm for the similarity join on massive high-dimensional data. In: Proceedings of the 2001 ACM-SIGMOD Conference (2001)
Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal 3(4), 517–542 (1994)
Shin, H., Moon, B., Lee, S.: Partition-Based Similarity Join in High Dimensional Data Spaces. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 741–750. Springer, Heidelberg (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shin, H. (2006). Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_66
Download citation
DOI: https://doi.org/10.1007/11875581_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer ScienceComputer Science (R0)