Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

A robust method based on locality sensitive hashing for K-nearest neighbors searching

  • Published:
Wireless Networks Aims and scope Submit manuscript

Abstract

K-nearest neighbors searching (KNNS) is to find K-nearest neighbors for query points. It is a primary problem in clustering analysis, classification, outlier detection and pattern recognition, and has been widely used in various applications. The exact searching algorithms, like KD-tree, M-tree, are not suitable for high-dimensional data. Approximate KNNS algorithms for high-dimensional data based on locality sensitive hashing (LSH) is becoming popular. However, the existing searching strategies are sensitive to the parameters of constructing LSH index. To solve this problem, a robust strategy for KNNS, called Robust-LSH, is proposed. It makes full use of points that frequently appear together with the query points to improve the diversity of candidates, so that it can use fewer hash tables to obtain more valuable candidates for KNNS. We do experiments on synthetic and real data. The results show that in terms of searching accuracy and running time, Robust-LSH has better performance than the p-stable LSH, RLSH and KD-tree algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. Data will be available on reasonable request.

References

  1. Almalawi, A. M., Fahad, A., Tari, Z., Cheema, M. A., & Khalil, I. (2016). \(k\) nnvwc: An efficient \(k\) -nearest neighbors approach based on various-widths clustering. IEEE Transactions on Knowledge and Data Engineering, 28(1), 68–81.

    Article  Google Scholar 

  2. Beygelzimer, A., Kakade, S., & Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on machine learning, ICML ’06, p. 97C104. Association for Computing Machinery, New York, NY, USA.

  3. Broder, A. Z., Charikar, M., & Frieze, A. M. (2000). Min-wise independent permutations. Journal of computer and system sciences, 60(3), 630–659.

    Article  MathSciNet  Google Scholar 

  4. Charikar, Moses, S. (2002). Similarity estimation techniques from rounding algorithms. In The 34th symposium on theory of computing (STOC), pp. 380–388.

  5. Chen, Y., Hu, X., Fan, W., Shen, L., Zhang, Z., Liu, X., Du, J., Li, H., Chen, Y., & Li, H. (2020). Fast density peak clustering for large scale data based on knn. Knowledge-Based Systems, 187, 104824.

    Article  Google Scholar 

  6. Chen, Y., Zhou, L., Bouguila, N., Zhong, B., Wu, F., Lei, Z., Du, J., & Li, H. (2018). Semi-convex hull tree: Fast nearest neighbor queries for large scale data on gpus. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 911–916 . https://doi.org/10.1109/ICDM.2018.00110

  7. Chen, Y., Zhou, L., Pei, S., Yu, Z., Chen, Y., Liu, X., Du, J., & Xiong, N. (2019). Knn-block dbscan: Fast clustering for large-scale data. IEEE Transactions on Systems, Man, and Cybernetics: Systems. https://doi.org/10.1109/TSMC.2019.2956527

    Article  Google Scholar 

  8. Cheng, D., Huang, J., Zhang, S., Zhang, X., & Luo, X. (2021). A novel approximate spectral clustering algorithm with dense cores and density peaks. IEEE Transactions on Systems, Man, and Cybernetics: Systems.https://doi.org/10.1109/TSMC.2021.3049490

    Article  Google Scholar 

  9. Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2019). A novel cluster validity index based on local cores. IEEE Transactions on Neural Networks and Learning Systems, 30(4), 985–999.

    Article  Google Scholar 

  10. Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2021). Clustering with local density peaks-based minimum spanning tree. IEEE Transactions on Knowledge and Data Engineering, 33(2), 374–387. https://doi.org/10.1109/TKDE.2019.2930056

    Article  Google Scholar 

  11. Ciaccia, P., Patella, M., & Zezula, P. (1997). M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB ’97, p. 426C435. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

  12. Dasgupta, A., Kumar, R., & Sarlos, T. (2011). Fast locality-sensitive hashing. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1073–1081. Association for Computing Machinery, New York, NY, USA.

  13. Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. (2004). Locality sensitive hashing scheme based on p-stable distributions. socg ’04. In SCG’04 proceedings of the twentieth annual symposium on computational Geometry, pp. 253–262.

  14. Gu, X., Zhang, L., Zhang, D., Zhang, Y., Li, J., & Bao, N. (2012). Query range sensitive probability guided multi-probe locality sensitive hashing.

  15. Huang, J., Zhu, Q., Yang, L., & Ji, F. (2015). A non-parameter outlier detection algorithm based on natural neighbor. Knowledge-Based Systems, 92, 71–77.

    Article  Google Scholar 

  16. Indyk, P., Motwani, R., Raghavan, P., & Vempala, S. (1997). Locality-preserving hashing in multidimensional spaces. In The 29th Symposium on Theory of Computing (STOC), pp. 618–625.

  17. Indyk, P., R., M. (1998). Approximate nearest neighbor: Towards removing the curse of dimensionality. In The 30th symposium on theory of computing (STOC), pp. 618–625.

  18. Koga, H., Oguri, M., & Watanabe, T. (2012). Mixed-lsh: Reduction of remote accesses in distributed locality-sensitive hashing based on l1-distance. In 2012 IEEE 26th international conference on advanced information networking and applications, pp. 175–182 . https://doi.org/10.1109/AINA.2012.29

  19. Kulis, B.K.G. (2012). Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis & Machine Intelligence, 34(6), 1092–1104.

  20. Li, P., Wang, M., Cheng, J., Xu, C., & Lu, H. (2013). Spectral hashing with semantically consistent graph for image indexing. IEEE Transactions on Multimedia, 15(1), 141–152. https://doi.org/10.1109/TMM.2012.2199970

    Article  Google Scholar 

  21. Lu, Y., Ma, T., Zhong, S., Cao, J., Wang, X., & Abdullah, A. (2014). Improved locality-sensitive hashing method for the approximate nearest neighbor problem. Chinese Physics B, 23(8), 80.

    Article  Google Scholar 

  22. Luo, X., Zhou, M., Li, S., Hu, L., & Shang, M. (2020). Non-negativity constrained missing data estimation for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Cybernetics, 50(5), 1844–1855.

    Article  Google Scholar 

  23. Luo, X., Zhou, M., Li, S., & Shang, M. (2018). An inherently nonnegative latent factor model for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Industrial Informatics, 14(5), 2011–2022.

    Article  Google Scholar 

  24. Muja, M., & Lowe, D. G. (2014). Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis & Machine Intelligence, 36(11), 2227–2240.

    Article  Google Scholar 

  25. Panigrahy, R. (2005). Entropy based nearest neighbor search in high dimensions. In SODA 06: proceedings of the seventeenth annual acm-siam symposium on discrete algorithms, pp. 1186–1195.

  26. Qin, L., Josephson, W., Zhe, W., Charikar, M., Kai, L. (2007). Multi-probe lsh: efficient indexing for high-dimensional similarity search. In International conference on very large data bases, pp. 950–961.

  27. Shakhnarovich, G., Darrell, T., & Indyk, P. (2006). Locality-sensitive hashing using stable distributions, pp. 61–72.

  28. Slaney, M., Lifshits, Y., & He, J. (2012). Optimal parameters for locality-sensitive hashing. Proceedings of the IEEE, 100(9), 2604–2623. https://doi.org/10.1109/JPROC.2012.2193849

    Article  Google Scholar 

  29. Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6(1–6), 579–589.

    Article  MathSciNet  Google Scholar 

  30. Wang, J., Kumar, S., & Chang, S. F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12), 2393–2406. https://doi.org/10.1109/TPAMI.2012.48

    Article  Google Scholar 

  31. Wang, W., Zhang, H., Zhang, Z., Liu, L., & Shao, L. (2021). Sparse graph based self-supervised hashing for scalable image retrieval. Information Sciences, 547, 622–640.

    Article  MathSciNet  Google Scholar 

  32. Wang, X. (2011). A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In The 2011 international joint conference on neural networks, pp. 1293–1299 . https://doi.org/10.1109/IJCNN.2011.6033373

  33. Weber, R., & Blott, S. (1998). A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In 1998 the 24th international conference on very large data bases (VLDB), pp. 194–205.

  34. Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. Advances in Neural Information Processing Systems, 282(3), 1753–1760.

    Google Scholar 

  35. Xue, Z., & Wang, H. (2021). Effective density-based clustering algorithms for incomplete data. Big Data Mining and Analysis, 3, 183–194.

    Article  Google Scholar 

  36. Yang, L., Zhu, Q., Huang, J., & Cheng, D. (2017). Adaptive edited natural neighbor algorithm. Neurocomputing, 230, 427–433.

    Article  Google Scholar 

  37. Zhu, P., Zhan, X., & Qiu, W. (2015). Efficient k-nearest neighbors search in high dimensions using mapreduce. In 2015 IEEE fifth international conference on big data and cloud computing, pp. 23–30 . https://doi.org/10.1109/BDCloud.2015.51

Download references

Acknowledgements

This work is supported in part by National Natural Science Foundation of China under Grant 62006029, 62172065, 62106024, in part by Postdoctoral Innovative Talent Support Program under Grant CQBX2021024, in part by Natural Science Foundation of Chongqing (China) under Grant cstc2019jcyj-msxmX0683, cstc2019jcyj-msxmX0838, cstc2019jcyj-msxmX0871, cstc2020jscx-lyjsAX0008, cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013, and in part by Project of Chongqing Municipal Education Commission, China under Grant KJQN202001434, KJQN202001442, KJQN201901408, KJZDM20190140, HZ2021008.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dongdong Cheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cheng, D., Huang, J., Zhang, S. et al. A robust method based on locality sensitive hashing for K-nearest neighbors searching. Wireless Netw 30, 4195–4208 (2024). https://doi.org/10.1007/s11276-022-02927-9

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11276-022-02927-9

Keywords

Navigation