Survey on KNN Methods in Data Science

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13621))

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

805 Accesses
3 Citations

Abstract

The k-nearest neighbors (KNN) algorithm remains a useful and widely applied approach. In the recent years, we have seen many advances in KNN methods, but few research works give a holistic account of all aspects of KNN and the progress made. This paper is a brief survey on modern KNN methods and their role in data science. Furthermore, we survey: the challenges, how they are approached in the literature, the impact of the distance metric, several KNN variations, as well as query methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modifications

Article Open access 11 August 2024

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

Which Fast Nearest Neighbour Search Algorithm to Use?

References

Alfeilat, H.A., et al.: Effects of distance measure choice on K-nearest neighbor classifier performance: a review. Big Data, 7 (2019)
Google Scholar
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. (1991)
Google Scholar
Ando, S.: Classifying imbalanced data in distance-based feature space. Knowl. Inf. Syst. 46 (2016)
Google Scholar
Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 793–801. Association for Computing Machinery, New York, NY (2015). ISBN 9781450335362
Google Scholar
Arnaiz-González, Á., Díez-Pastor, J.-F., Rodríguez, J.J., García-Osorio, C.: Instance selection of linear complexity for big data. Knowl.-Based Syst. 107, 83–95 (2016)
Article Google Scholar
Babenko, A., Lempitsky, V.: The inverted multi-index. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1247–1260 (2014)
Article Google Scholar
Bandaragoda, T.R., Ting, K.M., Albrecht, D., Liu, F.T., Wells, J.R.: Efficient anomaly detection by isolation using nearest neighbour ensemble. In: 2014 IEEE International Conference on Data Mining Workshop, pp. 698–705. IEEE (2014)
Google Scholar
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 97–104 (2006)
Google Scholar
Chatzimilioudis, G., Costa, C., Zeinalipour-Yazti, D., Lee, W.-C., Pitoura, E.: Distributed in-memory processing of all k nearest neighbor queries. IEEE Trans. Knowl. Data Eng. 28(4), 925–938 (2015)
Article Google Scholar
Cunningham, P., Delany, S.: k-nearest neighbour classifiers. Mult Classif. Syst. 54, 04 (2007)
Google Scholar
Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers - a tutorial. ACM Comput. Surv. (CSUR) 54(6), 1–25 (2021)
Article Google Scholar
Derrac, J., García, S., Herrera, F.: Fuzzy nearest neighbor algorithms: taxonomy, experimental analysis and prospects. Inf. Sci. 260, 98–119 (2014)
Article Google Scholar
Dudani, S.A.: The distance-weighted k-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. SMC-6(4), 325–327 (1976)
Google Scholar
Fernández, A., del Río, S., Chawla, N.V., Herrera, F.: An insight into imbalanced big data classification: outcomes and challenges. Complex Intell. Syst. 3(2), 105–120 (2017)
Article Google Scholar
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. (TOMS) 3(3), 209–226 (1977)
Article MATH Google Scholar
Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
Article Google Scholar
García-Pedrajas, N., Romero del Castillo, J.A., Cerruela-García, G.: A proposal for local $k$ values for $k$ -nearest neighbor rule. IEEE Trans. Neural Netw. Learn. Syst. 28(2), 470–475 (2017)
Article Google Scholar
Goldberger, J., Hinton, G.E., Roweis, S., Salakhutdinov, R.R.: Neighbourhood components analysis. In: Saul, L., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 17. MIT Press, Cambridge (2004)
Google Scholar
Gou, J., Du, L., Zhang, Y., Xiong, T.: A new distance-weighted k-nearest neighbor classifier. J. Inf. Comput. Sci. 9, 1429–1436 (2012)
Google Scholar
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. (2009)
Google Scholar
He, J., Liu, W., Chang, S.-F.: Scalable similarity search with optimized kernel hashing. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1129–1138 (2010)
Google Scholar
He, X., Niyogi, P.: Locality preserving projections. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press, Cambridge (2003)
Google Scholar
Hu, L.-Y., Huang, M.-W., Ke, S.-W., Tsai, C.-F.: The distance function effect on k-nearest neighbor classification for medical datasets. Springerplus 5(1), 1–9 (2016). https://doi.org/10.1186/s40064-016-2941-7
Article Google Scholar
Indyk, P.: Nearest neighbors in high-dimensional spaces (2004)
Google Scholar
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Conference Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 604–613, October 2000
Google Scholar
Iwamura, M., Sato, T., Kise, K.: What is the most efficient way to select nearest neighbor candidates for fast approximate nearest neighbor search? In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3535–3542 (2013)
Google Scholar
Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2010)
Article Google Scholar
Jin, Z., Li, C., Lin, Y., Cai, D.: Density sensitive hashing. IEEE Trans. Cybern. 44(8), 1362–1371 (2013)
Article Google Scholar
Kim, W., Kim, Y., Shim, K.: Parallel computation of k-nearest neighbor joins using mapreduce. In: 2016 IEEE International Conference on Big Data (Big Data), pp. 696–705. IEEE (2016)
Google Scholar
Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2017)
Article Google Scholar
Li, S., Harner, E.J., Adjeroh, D.A.: Random KNN feature selection - a fast and stable alternative to random forests. BMC Bioinform. 12(1), 1–11 (2011)
Article Google Scholar
Liu, W., Chawla, S.: Class confidence weighted kNN algorithms for imbalanced data sets. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011. LNCS (LNAI), vol. 6635, pp. 345–356. Springer, Heidelberg (2011). ISBN 978-3-642-20847-8. https://doi.org/10.1007/978-3-642-20847-8_29
Loizou, G., Maybank, S.J.: The nearest neighbor and the Bayes error rates. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9(2), 254–262 (1987)
Google Scholar
Maillo, J., Ramírez, S., Triguero, I., Herrera, F.: KNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowl.-Based Syst. 117, 3–15 (2017)
Article Google Scholar
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2 (331–340), 2 (2009)
Google Scholar
Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
Article Google Scholar
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 2, pp. 2161–2168. IEEE (2006)
Google Scholar
Pang, G., Ting, K.M., Albrecht, D.: LeSiNN: detecting anomalies by identifying least similar nearest neighbours. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 623–630. IEEE (2015)
Google Scholar
Park, C.H., Kim, S.B.: Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Syst. Appl. 42(5), 2336–2342 (2015)
Article Google Scholar
Patwary, M.M.A., et al.: Panda: extreme scale parallel k-nearest neighbor on distributed architectures. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 494–503. IEEE (2016)
Google Scholar
Rogati, M., Yang, Y.: High-performing feature selection for text classification. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 659–661 (2002)
Google Scholar
Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, pp. 94. Association for Computing Machinery, New York (2004)
Google Scholar
Shimomura, L.C., Oyamada, R.S., Vieira, M.R., Kaster, D.S.: A survey on graph-based methods for similarity searches in metric spaces. Inf. Syst. 95, 101507 (2021)
Article Google Scholar
Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
Google Scholar
Sisodia, D., Sisodia, D.S.: Quad division prototype selection-based k-nearest neighbor classifier for click fraud detection from highly skewed user click dataset. Int. J. Eng. Sci. Technol. 28, 101011 (2022)
Google Scholar
Song, Y., Liang, J., Lu, J., Zhao, X.: An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 251, 26–34 (2017)
Article Google Scholar
Susan, S., Kumar, A.: DST-ML-EkNN: data space transformation with metric learning and elite k-nearest neighbor cluster formation for classification of imbalanced datasets. In: Chiplunkar, N.N., Fukao, T. (eds.) Advances in Artificial Intelligence and Data Engineering. AISC, vol. 1133, pp. 319–328. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-3514-7_26
Chapter Google Scholar
Tahir, M.A., Bouridane, A., Kurugollu, F.: Simultaneous feature selection and feature weighting using hybrid Tabu search/K-nearest neighbor classifier. Pattern Recogn. Lett. 28(4), 438–446 (2007)
Article Google Scholar
Tang, B., He, H.: ENN: extended nearest neighbor method for pattern recognition [research frontier]. IEEE Comput. Intell. Mag. 10(3), 52–60 (2015)
Article Google Scholar
Ting, K.M., Washio, T., Wells, J.R., Aryal, S.: Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach. Learn. 106(1), 55–91 (2017)
Article MathSciNet MATH Google Scholar
Triguero, I., Peralta, D., Bacardit, J., García, S., Herrera, F.: MRPR: a mapreduce solution for prototype reduction in big data classification. Neurocomputing 150, 331–345 (2015)
Google Scholar
Triguero, I., García-Gil, D., Maillo, J., Luengo, J., García, S., Herrera, F.: Transforming big data into smart data: an insight on the use of the k-nearest neighbors algorithm to obtain quality data. WIREs Data Min. Knowl. Discov. 9(2) (2019)
Google Scholar
Vasuki, A., Vanathi, P.: A review of vector quantization techniques. IEEE Potentials 25(4), 39–47 (2006)
Article Google Scholar
Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Dietterich, T., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14. MIT Press, Cambridge (2001)
Google Scholar
Wang, A., An, N., Chen, G., Li, L., Alterovitz, G.: Accelerating wrapper-based feature selection with k-nearest-neighbor. Knowl.-Based Syst. 83, 81–91 (2015)
Article Google Scholar
Wang, J., Neskovic, P., Cooper, L.N.: Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recogn. 39(3), 417–423 (2006)
Article MATH Google Scholar
Wang, J., Zhang, T., Song, J., Sebe, N., Shen, H.T.: A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 769–790 (2018)
Google Scholar
Wang, M., Xu, X., Yue, Q., Wang, Y.: A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. arXiv preprint arXiv:2101.12631 (2021)
Weinberger, K., Blitzer, J., Saul, L.: Distance metric learning for large margin nearest neighbor classification, January 2005
Google Scholar
Wettschereck, D., Dietterich, T.: Locally adaptive nearest neighbor algorithms. In: Cowan, J., Tesauro, G., Alspector, J. (eds.) Advances in Neural Information Processing Systems, vol. 6. Morgan-Kaufmann, Burlington (1993)
Google Scholar
Wu, Z., Yu, J.: Vector quantization: a review. Front. Inf. Technol. Electron. Eng. 20(4), 507–524 (2019). https://doi.org/10.1631/FITEE.1700833
Article Google Scholar
Xiao, C., Chaovalitwongse, W.A.: Optimization models for feature selection of decomposed nearest neighbor. IEEE Trans. Syst. Man Cybern. Syst. 46(2), 177–184 (2016)
Article Google Scholar
Xing, E., Jordan, M., Russell, S.J., Ng, A.: Distance metric learning with application to clustering with side-information. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15. MIT Press, Cambridge (2002)
Google Scholar
Xu, H., Wang, J., Li, Z., Zeng, G., Li, S., Yu, N.: Complementary hashing for approximate nearest neighbor search. In: 2011 International Conference on Computer Vision, pp. 1631–1638 (2011)
Google Scholar
Yu, Z., Chen, H., Liu, J., You, J., Leung, H., Han, G.: Hybrid $k$-nearest neighbor classifier. IEEE Trans. Cybern. 46(6), 1263–1275 (2016)
Google Scholar
Zhang, S.: Challenges in KNN classification. IEEE Trans. Knowl. Data Eng. 1 (2021)
Google Scholar
Zhang, S., Li, X., Zong, M., Zhu, X., Cheng, D.: Learning k for KNN classification. ACM Trans. Intell. Syst. Technol. (TIST) 8(3), 1–19 (2017)
Google Scholar
Zhang, S., Li, X., Zong, M., Zhu, X., Wang, R.: Efficient KNN classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1774–1785 (2017)
Article MathSciNet Google Scholar
Zhang, S., Cheng, D., Deng, Z., Zong, M., Deng, X.: A novel KNN algorithm with data-driven k parameter computation. Pattern Recogn. Lett. 109, 44–54 (2018). Special Issue on Pattern Discovery Multi-Source Data (PDMSD)
Google Scholar
Zhang, X., Li, Y., Kotagiri, R., Wu, L., Tari, Z., Cheriet, M.: KRNN: k rare-class nearest neighbour classification. Pattern Recogn. 62, 33–44 (2017)
Article Google Scholar
Zhang, X., Xiao, H., Gao, R., Zhang, H., Wang, Y.: K-nearest neighbors rule combining prototype selection and local feature weighting for classification. Knowl.-Based Syst. 243, 108451 (2022)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computational Intelligence Laboratory, Department of Mathematics, University of Patras, 26110, Patras, Greece
Panos K. Syriopoulos, Sotiris B. Kotsiantis & Michael N. Vrahatis

Authors

Panos K. Syriopoulos
View author publications
You can also search for this author in PubMed Google Scholar
Sotiris B. Kotsiantis
View author publications
You can also search for this author in PubMed Google Scholar
Michael N. Vrahatis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Panos K. Syriopoulos .

Editor information

Editors and Affiliations

SBA Research, Vienna, Austria
Dimitris E. Simos
Moscow Aviation Institute (National Research University), Moscow, Russia
Varvara A. Rasskazova
Università degli Studi di Milano-Bicocca, Milan, Italy
Francesco Archetti
Wilfrid Laurier University, Waterloo, ON, Canada
Ilias S. Kotsireas
University of Florida, Gainesville, FL, USA
Panos M. Pardalos

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Syriopoulos, P.K., Kotsiantis, S.B., Vrahatis, M.N. (2022). Survey on KNN Methods in Data Science. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds) Learning and Intelligent Optimization. LION 2022. Lecture Notes in Computer Science, vol 13621. Springer, Cham. https://doi.org/10.1007/978-3-031-24866-5_28

Download citation

DOI: https://doi.org/10.1007/978-3-031-24866-5_28
Published: 05 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24865-8
Online ISBN: 978-3-031-24866-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Survey on KNN Methods in Data Science

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modifications

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

Which Fast Nearest Neighbour Search Algorithm to Use?

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Survey on KNN Methods in Data Science

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modifications

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

Which Fast Nearest Neighbour Search Algorithm to Use?

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation