Towards a Mention-Pair Model for Coreference Resolution in Portuguese

  • Conference paper
Progress in Artificial Intelligence (EPIA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10423)

Abstract

The aim of coreference resolution is to automatically determine all linguistic expressions in a piece of text that refer to the same entity. Following the mention-pair model, we employ machine learning techniques to address coreference resolution in text written in Portuguese. Based on a modest annotated corpus, we highlight the impact that different training-set creation strategies have on the quality of the system's predictions. We conclude that enriching the system with semantic-based features significantly improves its overall performance.
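To make the idea of a training-set creation strategy concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes mentions arrive in document order with gold entity identifiers, and it contrasts two generic strategies from the mention-pair literature: pairing each anaphor with its closest antecedent (intervening mentions become negatives) versus enumerating every ordered pair. The function names, the tuple layout, and the example mentions are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (mention text, gold entity id), in document order.
Mention = Tuple[str, int]
# A training instance: (candidate antecedent, anaphor, label), label 1 = coreferent.
Pair = Tuple[str, str, int]


def closest_antecedent_pairs(mentions: List[Mention]) -> List[Pair]:
    """One positive per anaphor (its closest antecedent); mentions in between are negatives."""
    pairs: List[Pair] = []
    for j, (anaphor, entity) in enumerate(mentions):
        # Walk backwards to find the closest preceding mention of the same entity.
        for i in range(j - 1, -1, -1):
            if mentions[i][1] == entity:
                # Mentions strictly between antecedent and anaphor yield negative pairs.
                for k in range(i + 1, j):
                    pairs.append((mentions[k][0], anaphor, 0))
                pairs.append((mentions[i][0], anaphor, 1))
                break
    return pairs


def all_pairs(mentions: List[Mention]) -> List[Pair]:
    """Every ordered pair of mentions, labelled by whether the gold entities match."""
    pairs: List[Pair] = []
    for j in range(len(mentions)):
        for i in range(j):
            label = int(mentions[i][1] == mentions[j][1])
            pairs.append((mentions[i][0], mentions[j][0], label))
    return pairs


# Hypothetical four-mention document with two entities.
example = [("Maria", 1), ("Porto", 2), ("ela", 1), ("a cidade", 2)]
selective = closest_antecedent_pairs(example)  # 4 pairs, 2 of them positive
exhaustive = all_pairs(example)                # 6 pairs, 2 of them positive
```

Even on this toy input the exhaustive strategy produces a larger share of negative pairs, which illustrates why the choice of strategy can affect the class balance seen by the classifier.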

Notes

  1. http://polyglot.readthedocs.io/en/latest/index.html.

Acknowledgments

The first author is partially supported by a doctoral grant from the Doctoral Program in Informatics Engineering (ProDEI) of the Faculty of Engineering of the University of Porto (FEUP).

Author information

Corresponding author

Correspondence to Gil Rocha.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Rocha, G., Lopes Cardoso, H. (2017). Towards a Mention-Pair Model for Coreference Resolution in Portuguese. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_69

  • DOI: https://doi.org/10.1007/978-3-319-65340-2_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65339-6

  • Online ISBN: 978-3-319-65340-2

  • eBook Packages: Computer Science (R0)
