Abstract
K-nearest neighbors searching (KNNS) is to find K-nearest neighbors for query points. It is a primary problem in clustering analysis, classification, outlier detection and pattern recognition, and has been widely used in various applications. The exact searching algorithms, like KD-tree, M-tree, are not suitable for high-dimensional data. Approximate KNNS algorithms for high-dimensional data based on locality sensitive hashing (LSH) is becoming popular. However, the existing searching strategies are sensitive to the parameters of constructing LSH index. To solve this problem, a robust strategy for KNNS, called Robust-LSH, is proposed. It makes full use of points that frequently appear together with the query points to improve the diversity of candidates, so that it can use fewer hash tables to obtain more valuable candidates for KNNS. We do experiments on synthetic and real data. The results show that in terms of searching accuracy and running time, Robust-LSH has better performance than the p-stable LSH, RLSH and KD-tree algorithms.
Similar content being viewed by others
Notes
Data will be available on reasonable request.
References
Almalawi, A. M., Fahad, A., Tari, Z., Cheema, M. A., & Khalil, I. (2016). \(k\) nnvwc: An efficient \(k\) -nearest neighbors approach based on various-widths clustering. IEEE Transactions on Knowledge and Data Engineering, 28(1), 68–81.
Beygelzimer, A., Kakade, S., & Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on machine learning, ICML ’06, p. 97C104. Association for Computing Machinery, New York, NY, USA.
Broder, A. Z., Charikar, M., & Frieze, A. M. (2000). Min-wise independent permutations. Journal of computer and system sciences, 60(3), 630–659.
Charikar, Moses, S. (2002). Similarity estimation techniques from rounding algorithms. In The 34th symposium on theory of computing (STOC), pp. 380–388.
Chen, Y., Hu, X., Fan, W., Shen, L., Zhang, Z., Liu, X., Du, J., Li, H., Chen, Y., & Li, H. (2020). Fast density peak clustering for large scale data based on knn. Knowledge-Based Systems, 187, 104824.
Chen, Y., Zhou, L., Bouguila, N., Zhong, B., Wu, F., Lei, Z., Du, J., & Li, H. (2018). Semi-convex hull tree: Fast nearest neighbor queries for large scale data on gpus. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 911–916 . https://doi.org/10.1109/ICDM.2018.00110
Chen, Y., Zhou, L., Pei, S., Yu, Z., Chen, Y., Liu, X., Du, J., & Xiong, N. (2019). Knn-block dbscan: Fast clustering for large-scale data. IEEE Transactions on Systems, Man, and Cybernetics: Systems. https://doi.org/10.1109/TSMC.2019.2956527
Cheng, D., Huang, J., Zhang, S., Zhang, X., & Luo, X. (2021). A novel approximate spectral clustering algorithm with dense cores and density peaks. IEEE Transactions on Systems, Man, and Cybernetics: Systems.https://doi.org/10.1109/TSMC.2021.3049490
Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2019). A novel cluster validity index based on local cores. IEEE Transactions on Neural Networks and Learning Systems, 30(4), 985–999.
Cheng, D., Zhu, Q., Huang, J., Wu, Q., & Yang, L. (2021). Clustering with local density peaks-based minimum spanning tree. IEEE Transactions on Knowledge and Data Engineering, 33(2), 374–387. https://doi.org/10.1109/TKDE.2019.2930056
Ciaccia, P., Patella, M., & Zezula, P. (1997). M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB ’97, p. 426C435. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Dasgupta, A., Kumar, R., & Sarlos, T. (2011). Fast locality-sensitive hashing. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1073–1081. Association for Computing Machinery, New York, NY, USA.
Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. (2004). Locality sensitive hashing scheme based on p-stable distributions. socg ’04. In SCG’04 proceedings of the twentieth annual symposium on computational Geometry, pp. 253–262.
Gu, X., Zhang, L., Zhang, D., Zhang, Y., Li, J., & Bao, N. (2012). Query range sensitive probability guided multi-probe locality sensitive hashing.
Huang, J., Zhu, Q., Yang, L., & Ji, F. (2015). A non-parameter outlier detection algorithm based on natural neighbor. Knowledge-Based Systems, 92, 71–77.
Indyk, P., Motwani, R., Raghavan, P., & Vempala, S. (1997). Locality-preserving hashing in multidimensional spaces. In The 29th Symposium on Theory of Computing (STOC), pp. 618–625.
Indyk, P., R., M. (1998). Approximate nearest neighbor: Towards removing the curse of dimensionality. In The 30th symposium on theory of computing (STOC), pp. 618–625.
Koga, H., Oguri, M., & Watanabe, T. (2012). Mixed-lsh: Reduction of remote accesses in distributed locality-sensitive hashing based on l1-distance. In 2012 IEEE 26th international conference on advanced information networking and applications, pp. 175–182 . https://doi.org/10.1109/AINA.2012.29
Kulis, B.K.G. (2012). Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis & Machine Intelligence, 34(6), 1092–1104.
Li, P., Wang, M., Cheng, J., Xu, C., & Lu, H. (2013). Spectral hashing with semantically consistent graph for image indexing. IEEE Transactions on Multimedia, 15(1), 141–152. https://doi.org/10.1109/TMM.2012.2199970
Lu, Y., Ma, T., Zhong, S., Cao, J., Wang, X., & Abdullah, A. (2014). Improved locality-sensitive hashing method for the approximate nearest neighbor problem. Chinese Physics B, 23(8), 80.
Luo, X., Zhou, M., Li, S., Hu, L., & Shang, M. (2020). Non-negativity constrained missing data estimation for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Cybernetics, 50(5), 1844–1855.
Luo, X., Zhou, M., Li, S., & Shang, M. (2018). An inherently nonnegative latent factor model for high-dimensional and sparse matrices from industrial applications. IEEE Transactions on Industrial Informatics, 14(5), 2011–2022.
Muja, M., & Lowe, D. G. (2014). Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis & Machine Intelligence, 36(11), 2227–2240.
Panigrahy, R. (2005). Entropy based nearest neighbor search in high dimensions. In SODA 06: proceedings of the seventeenth annual acm-siam symposium on discrete algorithms, pp. 1186–1195.
Qin, L., Josephson, W., Zhe, W., Charikar, M., Kai, L. (2007). Multi-probe lsh: efficient indexing for high-dimensional similarity search. In International conference on very large data bases, pp. 950–961.
Shakhnarovich, G., Darrell, T., & Indyk, P. (2006). Locality-sensitive hashing using stable distributions, pp. 61–72.
Slaney, M., Lifshits, Y., & He, J. (2012). Optimal parameters for locality-sensitive hashing. Proceedings of the IEEE, 100(9), 2604–2623. https://doi.org/10.1109/JPROC.2012.2193849
Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6(1–6), 579–589.
Wang, J., Kumar, S., & Chang, S. F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12), 2393–2406. https://doi.org/10.1109/TPAMI.2012.48
Wang, W., Zhang, H., Zhang, Z., Liu, L., & Shao, L. (2021). Sparse graph based self-supervised hashing for scalable image retrieval. Information Sciences, 547, 622–640.
Wang, X. (2011). A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In The 2011 international joint conference on neural networks, pp. 1293–1299 . https://doi.org/10.1109/IJCNN.2011.6033373
Weber, R., & Blott, S. (1998). A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In 1998 the 24th international conference on very large data bases (VLDB), pp. 194–205.
Weiss, Y., Torralba, A., & Fergus, R. (2009). Spectral hashing. Advances in Neural Information Processing Systems, 282(3), 1753–1760.
Xue, Z., & Wang, H. (2021). Effective density-based clustering algorithms for incomplete data. Big Data Mining and Analysis, 3, 183–194.
Yang, L., Zhu, Q., Huang, J., & Cheng, D. (2017). Adaptive edited natural neighbor algorithm. Neurocomputing, 230, 427–433.
Zhu, P., Zhan, X., & Qiu, W. (2015). Efficient k-nearest neighbors search in high dimensions using mapreduce. In 2015 IEEE fifth international conference on big data and cloud computing, pp. 23–30 . https://doi.org/10.1109/BDCloud.2015.51
Acknowledgements
This work is supported in part by National Natural Science Foundation of China under Grant 62006029, 62172065, 62106024, in part by Postdoctoral Innovative Talent Support Program under Grant CQBX2021024, in part by Natural Science Foundation of Chongqing (China) under Grant cstc2019jcyj-msxmX0683, cstc2019jcyj-msxmX0838, cstc2019jcyj-msxmX0871, cstc2020jscx-lyjsAX0008, cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013, and in part by Project of Chongqing Municipal Education Commission, China under Grant KJQN202001434, KJQN202001442, KJQN201901408, KJZDM20190140, HZ2021008.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Cheng, D., Huang, J., Zhang, S. et al. A robust method based on locality sensitive hashing for K-nearest neighbors searching. Wireless Netw 30, 4195–4208 (2024). https://doi.org/10.1007/s11276-022-02927-9
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11276-022-02927-9