Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index

Published: 01 September 2014 Publication History

Abstract

Nearest neighbor searches in high-dimensional space have many important applications in domains such as data mining, and multimedia databases. The problem is challenging due to the phenomenon called "curse of dimensionality". An alternative solution is to consider algorithms that returns a c-approximate nearest neighbor (c-ANN) with guaranteed probabilities. Locality Sensitive Hashing (LSH) is among the most widely adopted method, and it achieves high efficiency both in theory and practice. However, it is known to require an extremely high amount of space for indexing, hence limiting its scalability.
In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to 1 billion high-dimensional points on a single commodity PC.

References

[1]
N. Ailon and B. Chazelle. Faster dimension reduction. Commun. ACM, 53(2), 2010.
[2]
A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In SODA, 2014.
[3]
S. Arya, T. Malamatos, and D. M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. J. ACM, 57(1), 2009.
[4]
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6), 1998.
[5]
B. Bahmani, A. Goel, and R. Shinde. Efficient distributed locality sensitive hashing. In CIKM, 2012.
[6]
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, 2006.
[7]
A. Borodin, R. Ostrovsky, and Y. Rabani. Lower bounds for high dimensional nearest neighbor search and related problems. In STOC, 1999.
[8]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations (extended abstract). In STOC, 1998.
[9]
L. Cayton. Accelerating nearest neighbor search on manycore systems. In IPDPS, 2012.
[10]
A. Chakrabarti, B. Chazelle, B. Gum, and A. Lvov. A lower bound on the complexity of approximate nearest-neighbor searching on the hamming cube. In STOC, 1999.
[11]
T. M. Chan. Approximate nearest neighbor queries revisited. Discrete & Computational Geometry, 20(3), 1998.
[12]
M. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
[13]
S. Dasgupta and A. Gupta. An elementary proof of a theorem of johnson and lindenstrauss. Random Struct. Algorithms, 22(1), 2003.
[14]
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, 2004.
[15]
W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li. Modeling lsh for performance tuning. In CIKM, 2008.
[16]
R. Fagin et al. Efficient similarity search and classification via rank aggregation. In SIGMOD Conference, 2003.
[17]
J. Gan, J. Feng, Q. Fang, and W. Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD Conference, 2012.
[18]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
[19]
M. E. Houle et al. Fast approximate similarity search in extremely high-dimensional data sets. In ICDE, 2005.
[20]
P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 2006.
[21]
P. Indyk et al. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, 1998.
[22]
H. V. Jagadish et al. idistance: An adaptive b+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst., 30(2), 2005.
[23]
H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell., 33(1), 2011.
[24]
W. B. Johnson et al. Extensions of lipschitz mapping into hilbert space. Contemporary Mathematics, 26, 1984.
[25]
K. V. R. Kanth, S. Ravada, and D. Abugov. Quadtree and r-tree indexes in oracle spatial: a comparison using gis data. In SIGMOD Conference, 2002.
[26]
J. M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In STOC, 1997.
[27]
R. Krauthgamer and J. R. Lee. Navigating nets: simple algorithms for proximity search. In SODA, 2004.
[28]
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In STOC, 1998.
[29]
B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5), 2000.
[30]
T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. In NIPS, 2004.
[31]
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe lsh: Efficient indexing for high-dimensional similarity search. In VLDB, 2007.
[32]
S. Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2), 1993.
[33]
R. O'Donnell et al. Optimal lower bounds for locality sensitive hashing (except when q is tiny). In ICS, 2011.
[34]
J. Pan and D. Manocha. Bi-level locality sensitive hashing for k-nearest neighbor computation. In ICDE, 2012.
[35]
R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In SODA, 2006.
[36]
V. Pestov. Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions. Algorithmica, 66(2), 2013.
[37]
H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufman, 2006.
[38]
V. Satuluri and S. Parthasarathy. Bayesian locality sensitive hashing for fast similarity search. PVLDB, 5(5), 2012.
[39]
R. Shinde, A. Goel, P. Gupta, and D. Dutta. Similarity search and locality sensitive hashing using ternary content addressable memories. In SIGMOD Conference, 2010.
[40]
M. Slaney et al. Optimal parameters for locality-sensitive hashing. Proceedings of the IEEE, 100(9), 2012.
[41]
N. Sundaram, A. Turmukhametova, N. Satish, T. Mostak, P. Indyk, S. Madden, and P. Dubey. Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. PVLDB, 6(14), 2013.
[42]
Y. Tao, K. Yi, C. Sheng, and P. Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst., 35(3), 2010.
[43]
Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis. An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. IEEE Trans. Knowl. Data Eng., 16(10), 2004.
[44]
K. Ueno, X. Xi, E. J. Keogh, and D.-J. Lee. Anytime classification using the nearest neighbor algorithm with applications to stream mining. In ICDM, 2006.
[45]
R. Weber et al. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, 1998.
[46]
Y. Weiss et al. Spectral hashing. In NIPS, 2008.
[47]
A. C.-C. Yao and F. F. Yao. A general approach to d-dimensional geometric queries (extended abstract). In STOC, 1985.
[48]
S. Yin, M. Badr, and D. Vodislav. Dynamic multi-probe lsh: An i/o efficient index structure for approximate nearest neighbor search. In DEXA (1), 2013.

Cited By

View all
  • (2024)RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3681954.368195917:11(2735-2749)Online publication date: 1-Jul-2024
  • (2024)DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3665844.366585417:9(2241-2254)Online publication date: 1-May-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 1-May-2024
  • Show More Cited By
  1. SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 8, Issue 1
    September 2014
    100 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 September 2014
    Published in PVLDB Volume 8, Issue 1

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)22
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3681954.368195917:11(2735-2749)Online publication date: 1-Jul-2024
    • (2024)DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3665844.366585417:9(2241-2254)Online publication date: 1-May-2024
    • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 1-May-2024
    • (2024)Near-Duplicate Text Alignment with One Permutation HashingProceedings of the ACM on Management of Data10.1145/36771362:4(1-26)Online publication date: 30-Sep-2024
    • (2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
    • (2024)Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor SearchACM Transactions on Storage10.1145/363947120:2(1-30)Online publication date: 19-Feb-2024
    • (2024)Can LSH (locality-sensitive hashing) be replaced by neural network?Soft Computing - A Fusion of Foundations, Methodologies and Applications10.1007/s00500-023-09402-328:2(1041-1053)Online publication date: 1-Jan-2024
    • (2023)ARKGraph: All-Range Approximate K-Nearest-Neighbor GraphProceedings of the VLDB Endowment10.14778/3603581.360360116:10(2645-2658)Online publication date: 1-Jun-2023
    • (2023)ELPIS: Graph-Based Similarity Search for Scalable Data ScienceProceedings of the VLDB Endowment10.14778/3583140.358316616:6(1548-1559)Online publication date: 1-Feb-2023
    • (2023)SPFresh: Incremental In-Place Update for Billion-Scale Vector SearchProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613166(545-561)Online publication date: 23-Oct-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media