Memetic Algorithm Based on Global-Best Harmony Search and Hill Climbing for Part of Speech Tagging

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2017)

Abstract

The task of assigning tags to the words of a sentence has many applications in natural language processing (NLP) and therefore requires a fast and accurate algorithm. This paper presents a part-of-speech tagger based on Global-Best Harmony Search (GBHS) that applies local optimization to the best harmony after each improvisation (iteration). The local optimizer is a Hill Climbing algorithm that uses knowledge of the problem to define the neighborhood. In the proposed algorithm, a candidate solution (harmony) is represented as a vector whose length is the number of words in the sentence, while the fitness function considers the cumulative probability of tagging each word and its relation to its predecessor and successor words. The proposed algorithm achieved a precision of 95.2% and improved on the results obtained by other taggers. The experimental results were analyzed with the Friedman non-parametric statistical test at a 90% level of significance. The proposed part-of-speech tagger was found to outperform the comparison algorithms in both tagging quality and efficiency. The Brown corpus, divided into five folds, was used to conduct the experiments, which allowed cross-validation to be applied.
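The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the probability tables, the parameter values (`hms`, `hmcr`, `par`, `iterations`) and all function names are hypothetical, and the fitness is assumed to be a sum of log-probabilities over each word's tag and the tag bigram to its left. Pitch adjustment copies the tag at the same position from the best harmony, an adaptation chosen here so that every tag stays valid for its word.

```python
import math
import random

# Toy lexicon and probabilities: hypothetical values, not taken from the paper.
CANDIDATES = {"the": ["DET"], "dog": ["NOUN", "VERB"], "barks": ["VERB", "NOUN"]}
EMISSION = {("the", "DET"): 1.0, ("dog", "NOUN"): 0.9, ("dog", "VERB"): 0.1,
            ("barks", "VERB"): 0.6, ("barks", "NOUN"): 0.4}
TRANSITION = {("<s>", "DET"): 0.8, ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
              ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
              ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}


def fitness(words, tags):
    """Assumed fitness: log-probability of each word's tag plus the bigram
    linking it to the previous tag, so each tag interacts with both neighbors."""
    score, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        score += math.log(EMISSION.get((word, tag), 1e-6))
        score += math.log(TRANSITION.get((prev, tag), 1e-6))
        prev = tag
    return score


def hill_climb(words, tags):
    """Local search: a neighbor changes one word's tag, and only to a tag the
    lexicon allows for that word (the problem knowledge)."""
    best, best_fit = list(tags), fitness(words, tags)
    improved = True
    while improved:
        improved = False
        for i, word in enumerate(words):
            for tag in CANDIDATES[word]:
                neighbor = best[:i] + [tag] + best[i + 1:]
                fit = fitness(words, neighbor)
                if fit > best_fit:
                    best, best_fit, improved = neighbor, fit, True
    return best


def gbhs_tagger(words, hms=5, hmcr=0.9, par=0.3, iterations=30, seed=0):
    rng = random.Random(seed)
    # Harmony memory: each harmony holds one tag per word of the sentence.
    memory = [[rng.choice(CANDIDATES[w]) for w in words] for _ in range(hms)]
    for _ in range(iterations):
        best = max(memory, key=lambda h: fitness(words, h))
        new = []
        for i, word in enumerate(words):
            if rng.random() < hmcr:
                tag = rng.choice(memory)[i]      # memory consideration
                if rng.random() < par:
                    tag = best[i]                # pitch adjustment toward the best harmony
            else:
                tag = rng.choice(CANDIDATES[word])  # random selection
            new.append(tag)
        # Replace the worst harmony if the improvised one is better.
        worst = min(range(hms), key=lambda k: fitness(words, memory[k]))
        if fitness(words, new) > fitness(words, memory[worst]):
            memory[worst] = new
        # Memetic step: locally optimize the best harmony after each improvisation.
        top = max(range(hms), key=lambda k: fitness(words, memory[k]))
        memory[top] = hill_climb(words, memory[top])
    return max(memory, key=lambda h: fitness(words, h))


print(gbhs_tagger(["the", "dog", "barks"]))
```

Restricting the neighborhood to tags the lexicon lists for each word keeps the local search small, which is one plausible reading of "knowledge of the problem"; the paper itself may define the neighborhood differently.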

Acknowledgements

Sierra, Cobos and Corrales are grateful to University of Cauca and its research groups GTI and GIT of the Computer Science and Telematics departments. We are especially grateful to Colin McLachlan for suggestions relating to the English text.

Author information

Corresponding author

Correspondence to Luz Marina Sierra Martínez.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sierra Martínez, L.M., Cobos, C.A., Corrales, J.C. (2017). Memetic Algorithm Based on Global-Best Harmony Search and Hill Climbing for Part of Speech Tagging. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_20

  • DOI: https://doi.org/10.1007/978-3-319-71928-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71927-6

  • Online ISBN: 978-3-319-71928-3

  • eBook Packages: Computer Science (R0)
