Abstract
The aim of textual entailment and paraphrase recognition is to determine whether the meaning of a text fragment can be inferred (is entailed) from the meaning of another text fragment. In this paper, we address the task of automatically recognizing textual entailment (RTE) and paraphrases in text written in Portuguese, employing supervised machine learning techniques. First, we formulate the task as a multi-class classification problem. We conclude that semantic-based approaches are very promising for recognizing textual entailment and that combining data from European and Brazilian Portuguese brings several challenges typical of cross-language learning. Then, we formulate the task as a binary classification problem and demonstrate the capability of the proposed classifier for RTE and paraphrases. The results reported in this work are promising, with an accuracy of 0.83 on the test data.
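The relationship between the two formulations can be illustrated with a minimal sketch. It assumes a three-way label set ("none", "entailment", "paraphrase") and assumes that the binary task merges entailment and paraphrase into a single positive class; neither detail is stated in the abstract, and the helper names `to_binary` and `accuracy` are hypothetical. The labels are toy values, not data or results from the paper.

```python
from typing import List

# Assumed label set for the multi-class formulation (not stated in the abstract).
MULTI_CLASS_LABELS = ("none", "entailment", "paraphrase")


def to_binary(label: str) -> int:
    """Collapse a multi-class label into 1 (entailment or paraphrase) or 0 (none)."""
    if label not in MULTI_CLASS_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return 0 if label == "none" else 1


def accuracy(gold: List[int], predicted: List[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels for illustration only; they are not data from the paper.
gold_multi = ["entailment", "none", "paraphrase", "none"]
pred_multi = ["paraphrase", "none", "paraphrase", "entailment"]

# Map both label sequences to the binary formulation before scoring.
gold_binary = [to_binary(label) for label in gold_multi]
pred_binary = [to_binary(label) for label in pred_multi]
binary_accuracy = accuracy(gold_binary, pred_binary)
```

Under this assumed mapping, confusing entailment with paraphrase is an error in the multi-class setting but not in the binary one, which is one reason scores under the two formulations are not directly comparable.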
Acknowledgments
The first author is partially supported by a doctoral grant from the Doctoral Program in Informatics Engineering (ProDEI) of the Faculty of Engineering of the University of Porto (FEUP).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rocha, G., Lopes Cardoso, H. (2017). Recognizing Textual Entailment and Paraphrases in Portuguese. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_70
DOI: https://doi.org/10.1007/978-3-319-65340-2_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer Science (R0)