Abstract
Coreference resolution aims to automatically identify all linguistic expressions in a text that refer to the same entity. Following the mention-pair model, we apply machine learning techniques to coreference resolution in text written in Portuguese. Using a modest annotated corpus, we highlight the impact that different training-set creation strategies have on the quality of the system's predictions. We conclude that enriching the system with semantic-based features significantly improves its overall performance.
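To make the idea of a training-set creation strategy concrete, the following minimal sketch contrasts two ways of turning annotated mentions into mention-pair instances. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say which strategies were compared, so the two shown here (exhaustive pairing, and a closest-antecedent scheme common in the mention-pair literature) are assumed, and the names `MENTIONS`, `all_pairs`, and `closest_antecedent_pairs` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# A mention is (surface text, gold entity id). Toy data, invented for illustration.
Mention = Tuple[str, int]
# A training instance is (antecedent text, anaphor text, label); label 1 = coreferent.
Pair = Tuple[str, str, int]

MENTIONS: List[Mention] = [
    ("Maria", 1),
    ("o livro", 2),
    ("ela", 1),
    ("Porto", 3),
    ("a obra", 2),
]


def all_pairs(mentions: List[Mention]) -> List[Pair]:
    """Assumed strategy A: pair every mention with every earlier mention."""
    return [
        (a_text, b_text, int(a_ent == b_ent))
        for (a_text, a_ent), (b_text, b_ent) in combinations(mentions, 2)
    ]


def closest_antecedent_pairs(mentions: List[Mention]) -> List[Pair]:
    """Assumed strategy B: one positive pair with the closest preceding
    coreferent mention, plus negatives for the mentions in between."""
    pairs: List[Pair] = []
    for j, (anaphor, entity) in enumerate(mentions):
        # Scan backwards for the nearest mention of the same entity.
        for i in range(j - 1, -1, -1):
            if mentions[i][1] == entity:
                pairs.append((mentions[i][0], anaphor, 1))
                # Every mention strictly between antecedent and anaphor is a negative.
                for k in range(i + 1, j):
                    pairs.append((mentions[k][0], anaphor, 0))
                break
    return pairs


if __name__ == "__main__":
    for name, strategy in [("all pairs", all_pairs),
                           ("closest antecedent", closest_antecedent_pairs)]:
        instances = strategy(MENTIONS)
        positives = sum(label for _, _, label in instances)
        print(f"{name}: {len(instances)} instances, {positives} positive")
```

On this toy input both strategies produce the same positive pairs, but exhaustive pairing yields many more negative instances. That class imbalance is one reason the choice of strategy can affect a classifier trained on the resulting pairs.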
Acknowledgments
The first author is partially supported by a doctoral grant from the Doctoral Program in Informatics Engineering (ProDEI) of the Faculty of Engineering of the University of Porto (FEUP).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rocha, G., Lopes Cardoso, H. (2017). Towards a Mention-Pair Model for Coreference Resolution in Portuguese. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_69
DOI: https://doi.org/10.1007/978-3-319-65340-2_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer Science, Computer Science (R0)