Abstract
The k nearest neighbor (k-NN) classifier is a widely used nonparametric technique in Pattern Recognition. To decide the class of a new prototype, the k-NN classifier exhaustively compares the prototype to classify (the query) against every prototype in the training set T. When T is large, this exhaustive comparison becomes expensive, so many fast k-NN algorithms have been developed. Some of them are based on Approximating-Eliminating search, in which the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences the prototypes are usually described by qualitative and quantitative features (mixed data), and the comparison function sometimes does not satisfy the triangle inequality. Therefore, in this work a fast k most similar neighbor classifier for mixed data (AEMD) is presented. The classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is computed and stored. In the second phase, new Approximating and Eliminating steps that do not rely on the triangle inequality are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data, and experiments on real datasets are reported.
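To make the setting concrete, the sketch below (Python) shows the exhaustive k most similar neighbor baseline that AEMD is designed to speed up, together with the kind of binary similarity matrix precomputed in the first phase. The comparison function (averaged per-feature similarities), the threshold beta, and all function names (mixed_similarity, binary_similarity_matrix, k_msn_exhaustive) are illustrative assumptions, not the definitions used in the paper.

```python
import numpy as np
from collections import Counter

def mixed_similarity(x, y, is_categorical, ranges):
    """Per-feature similarity in [0, 1]: exact match for qualitative features,
    range-normalised closeness for quantitative ones. Illustrative choice only;
    the paper's comparison function may differ and need not obey the triangle
    inequality."""
    sims = []
    for xi, yi, cat, r in zip(x, y, is_categorical, ranges):
        if cat:
            sims.append(1.0 if xi == yi else 0.0)
        else:
            sims.append(1.0 - min(abs(xi - yi) / r, 1.0) if r else 1.0)
    return sum(sims) / len(sims)

def binary_similarity_matrix(T, is_categorical, ranges, beta=0.8):
    """Phase 1 (sketch): store, for every pair of training prototypes, a single
    bit saying whether they are 'similar enough'; beta is a hypothetical
    threshold, not a parameter taken from the paper."""
    n = len(T)
    M = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i, n):
            s = mixed_similarity(T[i], T[j], is_categorical, ranges) >= beta
            M[i, j] = M[j, i] = s
    return M

def k_msn_exhaustive(query, T, labels, k, is_categorical, ranges):
    """Exhaustive k most similar neighbor search: the O(|T|) comparison cost
    that AEMD's Approximating and Eliminating steps are meant to avoid."""
    sims = [mixed_similarity(query, t, is_categorical, ranges) for t in T]
    top_k = np.argsort(sims)[::-1][:k]  # indices of the k largest similarities
    return Counter(labels[i] for i in top_k).most_common(1)[0][0]
```

Averaging per-feature similarities is used here only because it is a simple comparison function for mixed data that is not a metric in general, which is exactly the situation the paper addresses; the elimination rule that exploits the binary matrix is part of the paper's contribution and is not reproduced in this sketch.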
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hernández-Rodríguez, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2008). Fast k Most Similar Neighbor Classifier for Mixed Data Based on Approximating and Eliminating. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science(), vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_66
DOI: https://doi.org/10.1007/978-3-540-68125-0_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0