A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese

Carlos Ramisch, Renata Ramisch, Leonardo Zilio, Aline Villavicencio & Silvio Cordeiro

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Verbal multiword expressions (VMWEs) such as "to make ends meet" require special attention in NLP and linguistic research, and annotated corpora are valuable resources for studying them. Corpora annotated with VMWEs in several languages, including Brazilian Portuguese, were made freely available in the PARSEME shared task. The goal of this paper is to describe and analyze the Brazilian Portuguese corpus in terms of the characteristics of its annotated VMWEs. First, we summarize and exemplify the criteria used to annotate VMWEs. Then, we analyze their frequency, average length, discontinuities and variability. We further discuss challenging constructions and borderline cases. We believe that this analysis can help improve the annotated corpus and that its results can be used to develop systems for automatic VMWE identification.


Notes

1. Editions 1.0 (2017) and 1.1 (2018): http://multiword.sourceforge.net/sharedtask2018.
2. http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1.
3. Boldface indicates lexicalized components for all examples throughout this paper.
4. A flexibility test verifies to what extent a change usually allowed by a language’s grammar also applies to the candidate to annotate.
5. A word that does not co-occur with any other word outside the VMWE.
6. http://hdl.handle.net/11372/LRT-2842.
7. In number of intervening tokens.
8. The normalized form of a VMWE is its sequence of lemmatized lexicalized components in lexicographic order, whereas its surface form is the textual sequence [8].
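
Notes 7 and 8 can be illustrated with a short Python sketch. This is not the authors' code: it assumes each lexicalized component is given as a hypothetical (surface form, lemma) pair together with its token position, and it uses invented example sentences rather than corpus data. Lexicographic order is approximated here by Python's default string comparison, which is an assumption about how accented characters are ordered.

```python
from typing import List, Tuple

# Hypothetical representation: one (surface form, lemma) pair per lexicalized component.
Component = Tuple[str, str]


def surface_form(components: List[Component]) -> Tuple[str, ...]:
    """Lexicalized components exactly as they appear in the text, in textual order."""
    return tuple(form for form, _ in components)


def normalized_form(components: List[Component]) -> Tuple[str, ...]:
    """Lemmas of the lexicalized components, sorted lexicographically (note 8)."""
    return tuple(sorted(lemma for _, lemma in components))


def gap_length(positions: List[int]) -> int:
    """Discontinuity length: intervening tokens between the first and last
    lexicalized component that are not themselves lexicalized (note 7)."""
    ordered = sorted(positions)
    span = ordered[-1] - ordered[0] + 1
    return span - len(ordered)


# Invented example 1: "Ela tomou uma difícil decisão" (token positions 0..4).
occurrence_a = [("tomou", "tomar"), ("decisão", "decisão")]
positions_a = [1, 4]

# Invented example 2: "decisões tomadas" (token positions 0..1).
occurrence_b = [("decisões", "decisão"), ("tomadas", "tomar")]
positions_b = [0, 1]

print(surface_form(occurrence_a), normalized_form(occurrence_a), gap_length(positions_a))
print(surface_form(occurrence_b), normalized_form(occurrence_b), gap_length(positions_b))
```

The two invented occurrences have different surface forms but the same normalized form, which is what makes normalized forms useful for grouping variants of one expression when measuring variability.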

References

1. Baldwin, T., Kim, S.N.: Multiword expressions. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn, pp. 267–292. CRC Press, Boca Raton (2010)
2. Bocorny Finatto, M.J., Scarton, C.E., Rocha, A., Aluísio, S.M.: Características do jornalismo popular: avaliação da inteligibilidade e auxílio à descrição do gênero. In: VIII Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana, pp. 30–39. Sociedade Brasileira de Computação, Cuiabá, MT, Brazil (2011)
3. Constant, M., et al.: Multiword expression processing: a survey. Comput. Linguistics 43(4), 837–892 (2017). https://doi.org/10.1162/COLI_a_00302
4. Constant, M., Nivre, J.: A transition-based system for joint lexical and syntactic analysis. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 161–171. Association for Computational Linguistics, August 2016. http://www.aclweb.org/anthology/P16-1016
5. Fotopoulou, A., Markantonatou, S., Giouli, V.: Encoding MWEs in a conceptual lexicon. In: Proceedings of the 10th Workshop on Multiword Expressions, MWE 2014, pp. 43–47. Association for Computational Linguistics (2014)
6. Nissim, M., Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method. ACM TSLP Special Issue MWEs 10(2) (2013)
7. Nivre, J., et al.: Universal dependencies v1: A multilingual treebank collection. In: Calzolari, N., et al. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation, LREC 2016, pp. 1659–1666. European Language Resources Association (ELRA), May 2016
8. Pasquer, C.: Expressions polylexicales verbales: étude de la variabilité en corpus. In: Actes de la 18e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (TALN-RÉCITAL 2017) (2017)
9. Riedl, M., Biemann, C.: Impact of MWE resources on multiword recognition. In: Proceedings of the 12th Workshop on Multiword Expressions, MWE 2016, pp. 107–111. Association for Computational Linguistics (2016). http://anthology.aclweb.org/W16-1816
10. Rosén, V., et al.: A survey of multiword expressions in treebanks. In: Proceedings of the 14th International Workshop on Treebanks & Linguistic Theories Conference, December 2015. https://hal.archives-ouvertes.fr/hal-01226001
11. Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword expressions: a pain in the neck for NLP. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 1–15. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45715-1_1
12. Sanches Duran, M., Scarton, C.E., Aluísio, S.M., Ramisch, C.: Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic ’se’ in Portuguese. In: Proceedings of the 9th Workshop on Multiword Expressions, pp. 93–100. Association for Computational Linguistics, Atlanta, June 2013. http://www.aclweb.org/anthology/W13-1014
13. Savary, A., Cordeiro, S.R.: Literal readings of multiword expressions: as scarce as hen’s teeth. In: Proceedings of the 16th Workshop on Treebanks and Linguistic Theories (TLT 2016), Prague, Czech Republic (2018)
14. Savary, A., Jacquemin, C.: Reducing information variation in text. In: Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access. LNCS (LNAI), vol. 2705, pp. 145–181. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45115-0_6
15. Savary, A., et al.: The PARSEME Shared Task on automatic identification of verbal multiword expressions. In: Proceedings of the 13th Workshop on Multiword Expressions, MWE 2017, pp. 31–47. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/W17-1704, http://aclanthology.coli.uni-saarland.de/pdf/W/W17/W17-1704.pdf
16. Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99. Association for Computational Linguistics, Vancouver, August 2017
17. Tutin, A.: Comparing morphological and syntactic variations of support verb constructions and verbal full phrasemes in French: a corpus based study. In: PARSEME COST Action. Relieving the Pain in the Neck in Natural Language Processing: 7th Final General Meeting, Dubrovnik, Croatia (2016)
18. van Gompel, M., van der Sloot, K., Reynaert, M., van den Bosch, A.: FoLiA in practice: the infrastructure of a linguistic annotation format, pp. 71–81 (2017). https://doi.org/10.5334/bbi.6
19. Zeman, D., et al.: CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1–19. Association for Computational Linguistics, Vancouver, Canada, August 2017. http://www.aclweb.org/anthology/K/K17/K17-3001.pdf

Acknowledgement

We would like to thank Helena Caseli for her participation as an annotator. We would also like to thank the PARSEME shared task organizers, especially Agata Savary and Veronika Vincze. This work was supported by the IC1207 PARSEME COST action (http://www.parseme.eu) and by the PARSEME-FR project (ANR-14-CERA-0001, http://parsemefr.lif.univ-mrs.fr/).

Author information

Authors and Affiliations

Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
Carlos Ramisch & Silvio Cordeiro
Interinstitutional Center for Computational Linguistics, São Carlos, Brazil
Renata Ramisch
Université catholique de Louvain, Louvain-la-Neuve, Belgium
Leonardo Zilio
University of Essex, Colchester, UK
Aline Villavicencio

Corresponding author

Correspondence to Carlos Ramisch.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

About this paper

Cite this paper

Ramisch, C., Ramisch, R., Zilio, L., Villavicencio, A., Cordeiro, S. (2018). A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_3

DOI: https://doi.org/10.1007/978-3-319-99722-3_3
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)
