Abstract
This paper presents a new self-generating prototype method based on information entropy that reduces the size of training datasets. The method accelerates classifier training without significantly degrading classification quality. Its effectiveness is compared against the k-nearest neighbour classifier (kNN) and genetic algorithm prototype selection (GA): kNN is a benchmark method for data classification tasks, while GA is a prototype selection method that offers a competitive trade-off between accuracy and data reduction ratio. Across thirty public datasets, the comparisons show that the proposed method outperforms kNN trained on the original training set as well as kNN trained on the reduced training set obtained via GA prototype selection.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.
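The full method is not reproduced on this page, but the general idea described in the abstract, using local label entropy to decide which training points can be condensed into generated prototypes before kNN classification, can be illustrated with a minimal sketch. This is an assumption-laden toy illustration, not the authors' algorithm: the functions `local_entropy`, `condense`, and the `threshold` parameter are all hypothetical names introduced here. Points in class-pure (low-entropy) regions are collapsed into a single class prototype, while boundary (high-entropy) points are kept as-is.

```python
import numpy as np

def local_entropy(X, y, i, k=5):
    # Shannon entropy of the class labels among the k nearest
    # neighbours of X[i] (the point itself is excluded).
    d = np.linalg.norm(X - X[i], axis=1)
    nn = np.argsort(d)[1:k + 1]
    _, counts = np.unique(y[nn], return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def condense(X, y, k=5, threshold=0.5):
    # Collapse each class's low-entropy points into one mean prototype;
    # keep high-entropy (boundary) points verbatim.
    H = np.array([local_entropy(X, y, i, k) for i in range(len(X))])
    protos, labels = [], []
    for c in np.unique(y):
        pure = (y == c) & (H <= threshold)
        boundary = (y == c) & (H > threshold)
        if pure.any():
            protos.append(X[pure].mean(axis=0))
            labels.append(c)
        for p in X[boundary]:
            protos.append(p)
            labels.append(c)
    return np.array(protos), np.array(labels)

def knn_predict(protos, labels, x):
    # 1-NN classification over the condensed prototype set.
    return labels[np.argmin(np.linalg.norm(protos - x, axis=1))]
```

On well-separated classes, a sketch like this reduces each class to a handful of prototypes, so the 1-NN search at prediction time scales with the prototype count rather than the original training set size, which is the speed-up the abstract refers to.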
© 2019 Springer Nature Switzerland AG
Cite this paper
Manastarla, A., Silva, L.A. (2019). A Self-generating Prototype Method Based on Information Entropy Used for Condensing Data in Classification Tasks. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science(), vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3