Nothing Special   »   [go: up one dir, main page]

Skip to main content

Fast k Most Similar Neighbor Classifier for Mixed Data Based on Approximating and Eliminating

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Abstract

The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify (query) and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. To avoid this problem, many fast k-NN algorithms have been developed. Some of these algorithms are based on Approximating-Eliminating search. In this case, the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences, the prototypes are usually described by qualitative and quantitative features (mixed data), and sometimes the comparison function does not satisfy the triangle inequality. Therefore, in this work, a fast k most similar neighbour classifier for mixed data (AEMD) is presented. This classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is stored. In the second phase, new Approximating and Eliminating steps, which are not based on the triangle inequality, are presented. The proposed classifier is compared against other fast k-NN algorithms, which are adapted to work with mixed data. Some experiments with real datasets are presented.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. Trans. Information Theory 13, 21–27 (1967)

    Article  MATH  Google Scholar 

  2. Vidal, E.: An algorithm for finding nearest neighbours in (approximately) constant average time complexity. Pattern Recognition Letters 4, 145–157 (1986)

    Article  Google Scholar 

  3. Micó, L., Oncina, J., Vidal, E.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing-time and memory requirements. Pattern Recognition Letters 15, 9–17 (1994)

    Article  Google Scholar 

  4. Mico, L., Oncina, J., Carrasco, R.: A fast Branch and Bound nearest neighbor classifier in metric spaces. Pattern Recognition Letters 17, 731–739 (1996)

    Article  Google Scholar 

  5. Figueroa, K., Chávez, E., Navarro, G., Paredes, R.: On the last cost for proximity searching in metric spaces. In: Àlvarez, C., Serna, M.J. (eds.) WEA 2006. LNCS, vol. 4007, pp. 279–290. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  6. Fukunaga, K., Narendra, P.: A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 24, 743–750 (1975)

    MathSciNet  Google Scholar 

  7. Gómez-Ballester, E., Mico, L., Oncina, J.: Some approaches to improve tree-based nearest neighbor search algorithms. Pattern Recognition Letters 39, 171–179 (2006)

    MATH  Google Scholar 

  8. Yong-Sheng, C., Yi-Ping, H., Chiou-Shann, F.: Fast and versatile algorithm for nearest neighbor search based on lower bound tree. Pattern Recognition Letters 40(2), 360–375 (2007)

    MATH  Google Scholar 

  9. Bustos, B., Navarro, G.: Probabilistic proximity search algorithms based on compact partitions. Journal of Discrete Algorithms (JDA) 2(1), 115–134 (2003)

    Article  MathSciNet  Google Scholar 

  10. Tokoro, K., Yamaguchi, K., Masuda, S.: Improvements of TLAESA nearest neighbor search and extension to approximation search. In: ACSC 2006: Proceedings of the 29th Australian Computer Science Conference, pp. 77–83 (2006)

    Google Scholar 

  11. García-Serrano, J.R., Martínez-Trinidad, J.F.: Extension to C-Means Algorithm for the use of Similarity Functions. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 354–359. Springer, Heidelberg (1999)

    Google Scholar 

  12. Blake, C., Merz, C.: UCI Repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine, CA (1998), http://www.uci.edu/mlearn/databases/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Takashi Washio Einoshin Suzuki Kai Ming Ting Akihiro Inokuchi

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hernández-Rodríguez, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2008). Fast k Most Similar Neighbor Classifier for Mixed Data Based on Approximating and Eliminating. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science(), vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_66

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics