A robust method based on locality sensitive hashing for K-nearest neighbors searching

Dongdong Cheng ORCID: orcid.org/0000-0003-3500-5461^1,2,
Jinlong Huang¹,
Sulan Zhang¹ &
…
Quanwang Wu³

472 Accesses
4 Citations
Explore all metrics

Abstract

K-nearest neighbors searching (KNNS) is to find K-nearest neighbors for query points. It is a primary problem in clustering analysis, classification, outlier detection and pattern recognition, and has been widely used in various applications. The exact searching algorithms, like KD-tree, M-tree, are not suitable for high-dimensional data. Approximate KNNS algorithms for high-dimensional data based on locality sensitive hashing (LSH) is becoming popular. However, the existing searching strategies are sensitive to the parameters of constructing LSH index. To solve this problem, a robust strategy for KNNS, called Robust-LSH, is proposed. It makes full use of points that frequently appear together with the query points to improve the diversity of candidates, so that it can use fewer hash tables to obtain more valuable candidates for KNNS. We do experiments on synthetic and real data. The results show that in terms of searching accuracy and running time, Robust-LSH has better performance than the p-stable LSH, RLSH and KD-tree algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Approximate Nearest Neighbor Search Algorithm Using Distance-Based Hashing

Data-Dependent Locality Sensitive Hashing

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

Notes

Data will be available on reasonable request.

References

Almalawi, A. M., Fahad, A., Tari, Z., Cheema, M. A., & Khalil, I. (2016). $k$ nnvwc: An efficient $k$ -nearest neighbors approach based on various-widths clustering. IEEE Transactions on Knowledge and Data Engineering, 28(1), 68–81.
Article Google Scholar
Beygelzimer, A., Kakade, S., & Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on machine learning, ICML ’06, p. 97C104. Association for Computing Machinery, New York, NY, USA.
Broder, A. Z., Charikar, M., & Frieze, A. M. (2000). Min-wise independent permutations. Journal of computer and system sciences, 60(3), 630–659.
Article MathSciNet Google Scholar
Charikar, Moses, S. (2002). Similarity estimation techniques from rounding algorithms. In The 34th symposium on theory of computing (STOC), pp. 380–388.
Chen, Y., Hu, X., Fan, W., Shen, L., Zhang, Z., Liu, X., Du, J., Li, H., Chen, Y., & Li, H. (2020). Fast density peak clustering for large scale data based on knn. Knowledge-Based Systems, 187, 104824.
Article Google Scholar
Chen, Y., Zhou, L., Bouguila, N., Zhong, B., Wu, F., Lei, Z., Du, J., & Li, H. (2018). Semi-convex hull tree: Fast nearest neighbor queries for large scale data on gpus. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 911–916 . https://doi.org/10.1109/ICDM.2018.00110
Chen, Y., Zhou, L., Pei, S., Yu, Z., Chen, Y., Liu, X., Du, J., & Xiong, N. (2019). Knn-block dbscan: Fast clustering for large-scale data. IEEE Transactions on Systems, Man, and Cybernetics: Systems. https://doi.org/10.1109/TSMC.2019.2956527
Article Google Scholar
Cheng, D., Huang, J., Zhang, S., Zhang, X., & Luo, X. (2021). A novel approximate spectral clustering algorithm with dense cores and density peaks. IEEE Transactions on Systems, Man, and Cybernetics: Systems.https://doi.org/10.1109/TSMC.2021.3049490
Article Google Scholar
Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2019). A novel cluster validity index based on local cores. IEEE Transactions on Neural Networks and Learning Systems, 30(4), 985–999.
Article Google Scholar
Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2021). Clustering with local density peaks-based minimum spanning tree. IEEE Transactions on Knowledge and Data Engineering, 33(2), 374–387. https://doi.org/10.1109/TKDE.2019.2930056
Article Google Scholar
Ciaccia, P., Patella, M., & Zezula, P. (1997). M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB ’97, p. 426C435. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Dasgupta, A., Kumar, R., & Sarlos, T. (2011). Fast locality-sensitive hashing. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1073–1081. Association for Computing Machinery, New York, NY, USA.
Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. (2004). Locality sensitive hashing scheme based on p-stable distributions. socg ’04. In SCG’04 proceedings of the twentieth annual symposium on computational Geometry, pp. 253–262.
Gu, X., Zhang, L., Zhang, D., Zhang, Y., Li, J., & Bao, N. (2012). Query range sensitive probability guided multi-probe locality sensitive hashing.
Huang, J., Zhu, Q., Yang, L., & Ji, F. (2015). A non-parameter outlier detection algorithm based on natural neighbor. Knowledge-Based Systems, 92, 71–77.
Article Google Scholar
Indyk, P., Motwani, R., Raghavan, P., & Vempala, S. (1997). Locality-preserving hashing in multidimensional spaces. In The 29th Symposium on Theory of Computing (STOC), pp. 618–625.
Indyk, P., R., M. (1998). Approximate nearest neighbor: Towards removing the curse of dimensionality. In The 30th symposium on theory of computing (STOC), pp. 618–625.
Koga, H., Oguri, M., & Watanabe, T. (2012). Mixed-lsh: Reduction of remote accesses in distributed locality-sensitive hashing based on l1-distance. In 2012 IEEE 26th international conference on advanced information networking and applications, pp. 175–182 . https://doi.org/10.1109/AINA.2012.29
Kulis, B.K.G. (2012). Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis & Machine Intelligence, 34(6), 1092–1104.
Li, P., Wang, M., Cheng, J., Xu, C., & Lu, H. (2013). Spectral hashing with semantically consistent graph for image indexing. IEEE Transactions on Multimedia, 15(1), 141–152. https://doi.org/10.1109/TMM.2012.2199970
Article Google Scholar
Lu, Y., Ma, T., Zhong, S., Cao, J., Wang, X., & Abdullah, A. (2014). Improved locality-sensitive hashing method for the approximate nearest neighbor problem. Chinese Physics B, 23(8), 80.
Article Google Scholar
Luo, X., Zhou, M., Li, S., Hu, L., & Shang, M. (2020). Non-negativity constrained missing data estimation for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Cybernetics, 50(5), 1844–1855.
Article Google Scholar
Luo, X., Zhou, M., Li, S., & Shang, M. (2018). An inherently nonnegative latent factor model for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Industrial Informatics, 14(5), 2011–2022.
Article Google Scholar
Muja, M., & Lowe, D. G. (2014). Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis & Machine Intelligence, 36(11), 2227–2240.
Article Google Scholar
Panigrahy, R. (2005). Entropy based nearest neighbor search in high dimensions. In SODA 06: proceedings of the seventeenth annual acm-siam symposium on discrete algorithms, pp. 1186–1195.
Qin, L., Josephson, W., Zhe, W., Charikar, M., Kai, L. (2007). Multi-probe lsh: efficient indexing for high-dimensional similarity search. In International conference on very large data bases, pp. 950–961.
Shakhnarovich, G., Darrell, T., & Indyk, P. (2006). Locality-sensitive hashing using stable distributions, pp. 61–72.
Slaney, M., Lifshits, Y., & He, J. (2012). Optimal parameters for locality-sensitive hashing. Proceedings of the IEEE, 100(9), 2604–2623. https://doi.org/10.1109/JPROC.2012.2193849
Article Google Scholar
Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6(1–6), 579–589.
Article MathSciNet Google Scholar
Wang, J., Kumar, S., & Chang, S. F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12), 2393–2406. https://doi.org/10.1109/TPAMI.2012.48
Article Google Scholar
Wang, W., Zhang, H., Zhang, Z., Liu, L., & Shao, L. (2021). Sparse graph based self-supervised hashing for scalable image retrieval. Information Sciences, 547, 622–640.
Article MathSciNet Google Scholar
Wang, X. (2011). A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In The 2011 international joint conference on neural networks, pp. 1293–1299 . https://doi.org/10.1109/IJCNN.2011.6033373
Weber, R., & Blott, S. (1998). A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In 1998 the 24th international conference on very large data bases (VLDB), pp. 194–205.
Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. Advances in Neural Information Processing Systems, 282(3), 1753–1760.
Google Scholar
Xue, Z., & Wang, H. (2021). Effective density-based clustering algorithms for incomplete data. Big Data Mining and Analysis, 3, 183–194.
Article Google Scholar
Yang, L., Zhu, Q., Huang, J., & Cheng, D. (2017). Adaptive edited natural neighbor algorithm. Neurocomputing, 230, 427–433.
Article Google Scholar
Zhu, P., Zhan, X., & Qiu, W. (2015). Efficient k-nearest neighbors search in high dimensions using mapreduce. In 2015 IEEE fifth international conference on big data and cloud computing, pp. 23–30 . https://doi.org/10.1109/BDCloud.2015.51

Download references

Acknowledgements

This work is supported in part by National Natural Science Foundation of China under Grant 62006029, 62172065, 62106024, in part by Postdoctoral Innovative Talent Support Program under Grant CQBX2021024, in part by Natural Science Foundation of Chongqing (China) under Grant cstc2019jcyj-msxmX0683, cstc2019jcyj-msxmX0838, cstc2019jcyj-msxmX0871, cstc2020jscx-lyjsAX0008, cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013, and in part by Project of Chongqing Municipal Education Commission, China under Grant KJQN202001434, KJQN202001442, KJQN201901408, KJZDM20190140, HZ2021008.

Author information

Authors and Affiliations

College of Big Data and Intelligent Engineering, Yangtze Normal University, Chongqing, China
Dongdong Cheng, Jinlong Huang & Sulan Zhang
Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
Dongdong Cheng
College of Computer Science, Chongqing University, Chongqing, China
Quanwang Wu

Authors

Dongdong Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Jinlong Huang
View author publications
You can also search for this author in PubMed Google Scholar
Sulan Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Quanwang Wu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dongdong Cheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cheng, D., Huang, J., Zhang, S. et al. A robust method based on locality sensitive hashing for K-nearest neighbors searching. Wireless Netw 30, 4195–4208 (2024). https://doi.org/10.1007/s11276-022-02927-9

Download citation

Accepted: 14 February 2022
Published: 12 March 2022
Issue Date: July 2024
DOI: https://doi.org/10.1007/s11276-022-02927-9

A robust method based on locality sensitive hashing for K-nearest neighbors searching

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Approximate Nearest Neighbor Search Algorithm Using Distance-Based Hashing

Data-Dependent Locality Sensitive Hashing

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A robust method based on locality sensitive hashing for K-nearest neighbors searching

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Approximate Nearest Neighbor Search Algorithm Using Distance-Based Hashing

Data-Dependent Locality Sensitive Hashing

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation