Abstract
The k nearest neighbor (k-NN) classifier is a widely used nonparametric technique in Pattern Recognition. To decide the class of a new prototype, the k-NN classifier exhaustively compares the prototype to classify (the query) against every prototype in the training set T. When T is large, this exhaustive comparison becomes expensive, so many fast k-NN algorithms have been developed. Some of them are based on Approximating-Eliminating search, in which the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences the prototypes are usually described by qualitative and quantitative features (mixed data), and the comparison function sometimes does not satisfy the triangle inequality. Therefore, in this work a fast k most similar neighbor classifier for mixed data (AEMD) is presented. The classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is computed and stored. In the second phase, new Approximating and Eliminating steps that do not rely on the triangle inequality are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data, and experiments on real datasets are reported.
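To make the setting concrete, the sketch below (Python) shows the exhaustive k most similar neighbor baseline that AEMD is designed to speed up, together with the kind of binary similarity matrix precomputed in the first phase. The comparison function (averaged per-feature similarities), the threshold beta, and all function names (mixed_similarity, binary_similarity_matrix, k_msn_exhaustive) are illustrative assumptions, not the definitions used in the paper.

```python
import numpy as np
from collections import Counter

def mixed_similarity(x, y, is_categorical, ranges):
    """Per-feature similarity in [0, 1]: exact match for qualitative features,
    range-normalised closeness for quantitative ones. Illustrative choice only;
    the paper's comparison function may differ and need not obey the triangle
    inequality."""
    sims = []
    for xi, yi, cat, r in zip(x, y, is_categorical, ranges):
        if cat:
            sims.append(1.0 if xi == yi else 0.0)
        else:
            sims.append(1.0 - min(abs(xi - yi) / r, 1.0) if r else 1.0)
    return sum(sims) / len(sims)

def binary_similarity_matrix(T, is_categorical, ranges, beta=0.8):
    """Phase 1 (sketch): store, for every pair of training prototypes, a single
    bit saying whether they are 'similar enough'; beta is a hypothetical
    threshold, not a parameter taken from the paper."""
    n = len(T)
    M = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i, n):
            s = mixed_similarity(T[i], T[j], is_categorical, ranges) >= beta
            M[i, j] = M[j, i] = s
    return M

def k_msn_exhaustive(query, T, labels, k, is_categorical, ranges):
    """Exhaustive k most similar neighbor search: the O(|T|) comparison cost
    that AEMD's Approximating and Eliminating steps are meant to avoid."""
    sims = [mixed_similarity(query, t, is_categorical, ranges) for t in T]
    top_k = np.argsort(sims)[::-1][:k]  # indices of the k largest similarities
    return Counter(labels[i] for i in top_k).most_common(1)[0][0]
```

Averaging per-feature similarities is used here only because it is a simple comparison function for mixed data that is not a metric in general, which is exactly the situation the paper addresses; the elimination rule that exploits the binary matrix is part of the paper's contribution and is not reproduced in this sketch.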
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hernández-Rodríguez, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2008). Fast k Most Similar Neighbor Classifier for Mixed Data Based on Approximating and Eliminating. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science(), vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_66
DOI: https://doi.org/10.1007/978-3-540-68125-0_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0