Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Dimensional testing for reverse k-nearest neighbor search

Published: 01 March 2017 Publication History

Abstract

Given a query object q, reverse k-nearest neighbor (RkNN) search aims to locate those objects of the database that have q among their k-nearest neighbors. In this paper, we propose an approximation method for solving RkNN queries, where the pruning operations and termination tests are guided by a characterization of the intrinsic dimensionality of the data. The method can accommodate any index structure supporting incremental (forward) nearest-neighbor search for the generation and verification of candidates, while avoiding impractically-high preprocessing costs. We also provide experimental evidence that our method significantly outperforms its competitors in terms of the tradeoff between execution time and the quality of the approximation. Our approach thus addresses many of the scalability issues surrounding the use of previous methods in data mining.

References

[1]
E. Achtert, C. Böhm, H.-P. Kriegel, and P. Kröger. Online hierarchical clustering in a data warehouse environment. In ICDM, pages 10--17, 2005.
[2]
E. Achtert, C. Böhm, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Approximate reverse k-nearest neighbor queries in general metric spaces. In CIKM, pages 788--789, 2006.
[3]
E. Achtert, C. Böhm, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In SIGMOD, pages 515--526, 2006.
[4]
E. Achtert, C. Böhm, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Efficient reverse k-nearest neighbor estimation. In BTW, pages 344--363, 2007.
[5]
L. Amsaleg, O. Chelly, T. Furon, S. Girard, M. E. Houle, K. Kawarabayashi, and M. Nett. Estimating local intrinsic dimensionality. In KDD, pages 29--38, 2015.
[6]
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, pages 97--104, 2006.
[7]
J. A. Blackard and D. J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comp Ag, 24:131--151, 1999.
[8]
N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. LeSaux, and H. Sahbi. IKONA: Interactive specific and generic image retrieval. In MMCBIR, 2001.
[9]
M. A. Cheema, X. Lin, W. Zhang, and Y. Zhang. Influence zone: Efficiently processing reverse k nearest neighbors queries. In ICDE, pages 577--588, 2011.
[10]
P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In VLDB, pages 426--435, 1997.
[11]
K. L. Clarkson. Nearest neighbor queries in metric spaces. DCG, 22(1):63--93, 1999.
[12]
T. de Vries, S. Chawla, and M. E. Houle. Finding local anomalies in very high dimensional space. In ICDM, pages 128--137, 2010.
[13]
K. Falconer. Fractal Geometry: Mathematical Foundations and Applications. Wiley, 2003.
[14]
J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders. The Amsterdam Library of Object Images. Intern. J. Computer Vision, 61(1):103--112, 2005.
[15]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518--529, 1999.
[16]
P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. In The Theory of Chaotic Attractors, pages 170--189. Springer, 2004.
[17]
A. Guttman. R-Trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[18]
V. Hautamäki, I. Kärkkäinen, and P. Fränti. Outlier detection using k-nearest neighbour graph. In ICPR, pages 430--433, 2004.
[19]
M. Hein and J.-Y. Audibert. Intrinsic dimensionality estimation of submanifolds in R<sup>d</sup>. In ICML, pages 289--296, 2005.
[20]
M. E. Houle. Dimensionality, discriminability, density & distance distributions. In ICDMW, pages 468--473, 2013.
[21]
M. E. Houle. Inlierness, outlierness, hubness and discriminability: an extreme-value-theoretic foundation. Technical Report NII-2015-002E, 2015.
[22]
M. E. Houle, H. Kashima, and M. Nett. Generalized expansion dimension. In ICDMW, pages 587--594, 2012.
[23]
M. E. Houle, X. Ma, M. Nett, and V. Oria. Dimensional testing for multi-step similarity search. In ICDM, pages 299--308, 2012.
[24]
M. E. Houle, X. Ma, and V. Oria. Effective and efficient algorithms for flexible aggregate similarity search in high dimensional spaces. TKDE, 27(12):3258--3273, 2015.
[25]
M. E. Houle, X. Ma, V. Oria, and J. Sun. Efficient similarity search within user-specified projective subspaces. Inf. Syst., 59:2--14, 2016.
[26]
M. E. Houle and J. Sakuma. Fast approximate similarity search in extremely high-dimensional data sets. In ICDE, pages 619--630, 2005.
[27]
W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. In PAKDD, pages 577--593, 2006.
[28]
D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In STOC, pages 741--750, 2002.
[29]
F. Korn and S. Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In SIGMOD, pages 201--212, 2000.
[30]
H.-P. Kriegel, P. Kröger, M. Renz, A. Züfle, and A. Katzdobler. Incremental reverse nearest neighbor ranking. In ICDE, pages 1560--1567, 2009.
[31]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998.
[32]
M. Lichman. UCI machine learning repository, 2013. University of California, Irvine, School of Information and Computer Sciences.
[33]
A. Maheshwari, J. Vahrenhold, and N. Zeh. On reverse nearest neighbor queries. In CCCG, pages 128--132, 2002.
[34]
P. Mattila. Hausdorff dimension, orthogonal projections and intersections with planes. Ann. Acad. Sci. Fenn. A Math., 1:227--244, 1975.
[35]
I. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, and H.-P. Kriegel. Density-based projected clustering over high dimensional data streams. In SDM, pages 987--998, 2012.
[36]
D. Pokrajac, A. Lazarevic, and L. J. Latecki. Incremental local outlier detection for data streams. In CIDM, pages 504--515, 2007.
[37]
M. Radovanovic, A. Nanopoulos, and M. Ivanovic. Reverse nearest neighbors in unsupervised distance-based outlier detection. TKDE, 27(5):1369--1382, 2015.
[38]
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211--252, 2015.
[39]
E. Schubert, A. Koos, T. Emrich, A. Züfle, K. A. Schmid, and A. Zimek. A framework for clustering uncertain data. PVLDB, 8(12):1976--1987, 2015.
[40]
A. Singh, H. Ferhatosmanoglu, and A. S. Tosun. High dimensional reverse nearest neighbor queries. In CIKM, pages 91--98, 2003.
[41]
I. Stanoi, D. Agrawal, and A. El Abbadi. Reverse nearest neighbor queries for dynamic databases. In SIGMODW, pages 44--53, 2000.
[42]
F. Takens. Detecting strange attractors in turbulence. Springer, 1981.
[43]
Y. Tao, D. Papadias, and X. Lian. Reverse kNN search in arbitrary dimensionality. In VLDB, pages 744--755, 2004.
[44]
Y. Tao, M. L. Yiu, and N. Mamoulis. Reverse nearest neighbor search in metric spaces. TKDE, 18(9):1239--1252, 2006.
[45]
J. Theiler. Lacunarity in a best estimator of fractal dimension. Phys. Lett. A, 133(4):195--200, 1988.
[46]
N. Tomasev, M. Radovanovic, D. Mladenic, and M. Ivanovic. The role of hubness in clustering high-dimensional data. TKDE, 26(3):739--751, 2014.
[47]
R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194--205, 1998.
[48]
R. C.-W. Wong, M. T. Özsu, A. W.-C. Fu, P. S. Yu, L. Liu, and Y. Liu. Maximizing bichromatic reverse nearest neighbor for l<sub>p</sub> -norm in two- and three-dimensional spaces. VLDB J., 20(6):893--919, 2011.
[49]
W. Wu, F. Yang, C. Y. Chan, and K.-L. Tan. FINCH: Evaluating reverse k-nearest-neighbor queries on location data. PVLDB, 1(1):1056--1067, 2008.
[50]
T. Xia, D. Zhang, E. Kanoulas, and Y. Du. On computing top-t most influential spatial sites. In VLDB, pages 946--957, 2005.
[51]
C. Yang and K.-I. Lin. An index structure for efficient reverse nearest neighbor queries. In ICDE, pages 485--492, 2001.

Cited By

View all
  • (2023)LiteHST: A Tree Embedding based Method for Similarity SearchProceedings of the ACM on Management of Data10.1145/35887151:1(1-26)Online publication date: 30-May-2023
  • (2023)Local intrinsic dimensionality measures for graphs, with applications to graph embeddingsInformation Systems10.1016/j.is.2023.102272119:COnline publication date: 1-Oct-2023
  • (2023)Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree IndexWeb Information Systems Engineering – WISE 202310.1007/978-981-99-7254-8_57(733-747)Online publication date: 25-Oct-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 10, Issue 7
March 2017
132 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 March 2017
Published in PVLDB Volume 10, Issue 7

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)1
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)LiteHST: A Tree Embedding based Method for Similarity SearchProceedings of the ACM on Management of Data10.1145/35887151:1(1-26)Online publication date: 30-May-2023
  • (2023)Local intrinsic dimensionality measures for graphs, with applications to graph embeddingsInformation Systems10.1016/j.is.2023.102272119:COnline publication date: 1-Oct-2023
  • (2023)Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree IndexWeb Information Systems Engineering – WISE 202310.1007/978-981-99-7254-8_57(733-747)Online publication date: 25-Oct-2023
  • (2021)The Effect of Random Projection on Local Intrinsic DimensionalitySimilarity Search and Applications10.1007/978-3-030-89657-7_16(201-214)Online publication date: 29-Sep-2021
  • (2020)Spatial Information Retrieval in Digital EcosystemsProceedings of the 12th International Conference on Management of Digital EcoSystems10.1145/3415958.3433038(10-17)Online publication date: 2-Nov-2020
  • (2020)Local Intrinsic Dimensionality III: Density and SimilaritySimilarity Search and Applications10.1007/978-3-030-60936-8_19(248-260)Online publication date: 30-Sep-2020
  • (2019)Subspace Determination Through Local Intrinsic Dimensional DecompositionSimilarity Search and Applications10.1007/978-3-030-32047-8_25(281-289)Online publication date: 2-Oct-2019
  • (2019)The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor SearchSimilarity Search and Applications10.1007/978-3-030-32047-8_11(113-127)Online publication date: 2-Oct-2019
  • (2018)LIDH: An Efficient Filtering Method for Approximate k Nearest Neighbor Queries Based on Local Intrinsic DimensionWeb and Big Data10.1007/978-3-319-96890-2_22(268-276)Online publication date: 23-Jul-2018
  • (2018)Intrinsic Degree: An Estimator of the Local Growth Rate in GraphsSimilarity Search and Applications10.1007/978-3-030-02224-2_15(195-208)Online publication date: 7-Oct-2018
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media