Abstract
Assigning tags to the words of a sentence has many applications in natural language processing (NLP) and therefore requires a fast and accurate algorithm. This paper presents a Part-of-Speech Tagger based on Global-Best Harmony Search (GBHS) that applies local optimization to the best harmony after each improvisation (iteration). The local optimizer is a Hill Climbing algorithm that uses knowledge of the problem to define the neighborhood. In the proposed algorithm, a candidate solution (harmony) is represented as a vector whose length equals the number of words in the sentence, while the fitness function considers the cumulative probability of the tag assigned to each word and its relation to the preceding and following words. The proposed algorithm obtained a precision of 95.2% and improved on the results obtained by other taggers. The experimental results were analyzed with the Friedman non-parametric statistical test at a significance level of 90%. The proposed tagger was found to deliver both quality and efficiency on the tagging problem in comparison with the other algorithms. The experiments used the Brown corpus divided into 5 folds, which allowed cross-validation to be applied.
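As a rough illustration of the approach summarized above, the following Python sketch combines a GBHS-style improvisation step with hill climbing on the best harmony. It is not the authors' implementation. The toy probability tables, the function names (`fitness`, `hill_climb`, `gbhs_tag`), the parameter values, the log-probability form of the fitness, and the choice to copy the best harmony's tag at the same position during pitch adjustment are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy model (invented values): P(tag | word) for each word's
# candidate tags, and tag-bigram probabilities P(tag | previous tag).
EMISSION = {
    "the": {"DET": 1.0},
    "dog": {"NOUN": 0.9, "VERB": 0.1},
    "barks": {"VERB": 0.6, "NOUN": 0.4},
}
TRANSITION = {
    ("DET", "NOUN"): 0.8, ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.05,
}
FLOOR = 1e-6  # assumed smoothing value for unseen events


def fitness(words, tags):
    """Cumulative log-probability of the tags and of adjacent tag pairs."""
    score = 0.0
    for i, (word, tag) in enumerate(zip(words, tags)):
        score += math.log(EMISSION[word].get(tag, FLOOR))
        if i > 0:
            score += math.log(TRANSITION.get((tags[i - 1], tag), FLOOR))
    return score


def hill_climb(words, tags):
    """Change one word's tag at a time for as long as the fitness improves."""
    best, best_fit = list(tags), fitness(words, tags)
    improved = True
    while improved:
        improved = False
        for i, word in enumerate(words):
            # Problem knowledge: only tags known for this word are neighbors.
            for tag in EMISSION[word]:
                cand = best[:i] + [tag] + best[i + 1:]
                cand_fit = fitness(words, cand)
                if cand_fit > best_fit:
                    best, best_fit, improved = cand, cand_fit, True
    return best


def gbhs_tag(words, hms=5, hmcr=0.9, par=0.3, iterations=30, seed=0):
    """GBHS-style search over tag vectors with local search on the best one."""
    rng = random.Random(seed)

    def random_tag(word):
        return rng.choice(sorted(EMISSION[word]))

    def score(harmony):
        return fitness(words, harmony)

    # Harmony memory: hms random tag vectors, one tag per word.
    memory = [[random_tag(w) for w in words] for _ in range(hms)]
    for _ in range(iterations):
        best = max(memory, key=score)
        new = []
        for i, word in enumerate(words):
            if rng.random() < hmcr:
                tag = rng.choice(memory)[i]   # memory consideration
                if rng.random() < par:
                    tag = best[i]             # pull toward the global best
            else:
                tag = random_tag(word)        # random selection
            new.append(tag)
        worst = min(range(hms), key=lambda k: score(memory[k]))
        if score(new) > score(memory[worst]):
            memory[worst] = new
        # Memetic step: locally optimize the best harmony after improvising.
        top = max(range(hms), key=lambda k: score(memory[k]))
        memory[top] = hill_climb(words, memory[top])
    return max(memory, key=score)


print(gbhs_tag(["the", "dog", "barks"]))  # prints ['DET', 'NOUN', 'VERB']
```

Restricting the hill-climbing neighborhood to the tags already observed for each word keeps the local search small, which is one plausible reading of "knowledge of the problem" in the abstract.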
Acknowledgements
Sierra, Cobos and Corrales are grateful to the University of Cauca and its research groups GTI and GIT of the Computer Science and Telematics departments. We are especially grateful to Colin McLachlan for suggestions relating to the English text.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sierra Martínez, L.M., Cobos, C.A., Corrales, J.C. (2017). Memetic Algorithm Based on Global-Best Harmony Search and Hill Climbing for Part of Speech Tagging. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_20
DOI: https://doi.org/10.1007/978-3-319-71928-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71927-6
Online ISBN: 978-3-319-71928-3
eBook Packages: Computer Science, Computer Science (R0)