2018
The RST Spanish-Chinese Treebank
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Discourse analysis is necessary for different tasks of Natural Language Processing (NLP). Because Spanish and Chinese are two of the most spoken languages in the world, discourse analysis across the two languages is important for NLP research. This paper presents the first open Spanish-Chinese parallel corpus annotated with discourse information, using Rhetorical Structure Theory (RST) as its theoretical framework. We have evaluated and harmonized each part of the annotation to obtain a corpus with high annotation quality. The corpus is already available to the public.
2017
The arText prototype: An automatic system for writing specialized texts
Iria da Cunha | M. Amor Montané | Luis Hysa
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
This article describes an automatic system for writing specialized texts in Spanish. The arText prototype is a free online text editor that includes different types of linguistic information. It is designed for a variety of end users and domains, including specialists and university students working in the fields of medicine and tourism, and laypersons writing to the public administration. ArText provides guidance on how to structure a text, prompts users to include all necessary contents in each section, and detects lexical and discourse problems in the text.
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
M. Taboada | I. da Cunha | E.G. Maziero | P. Cardoso | J.D. Antonio | M. Iruskieta
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
Discourse Segmentation for Building a RST Chinese Treebank
Shuyuan Cao | Nianwen Xue | Iria da Cunha | Mikel Iruskieta | Chuan Wang
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
2016
CobaltF: A Fluent Metric for MT Evaluation
Marina Fomicheva | Núria Bel | Lucia Specia | Iria da Cunha | Anton Malinovskiy
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
A Corpus-based Approach for Spanish-Chinese Language Learning
Shuyuan Cao | Iria da Cunha | Mikel Iruskieta
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
Because Spanish and Chinese are each spoken by a huge population, these languages occupy an important position in language learning studies. Although some automatic translation systems benefit the learning of both languages, there is still room to create resources to help language learners. As a quick and effective resource that can provide a large amount of language information, corpus-based learning is becoming more and more popular. In this paper we enrich a Spanish-Chinese parallel corpus automatically with part-of-speech (POS) information and manually with discourse segmentation, following Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). Two search tools allow Spanish-Chinese language learners to carry out different queries based on tokens and lemmas. The parallel corpus and the research tools are available to the academic community. We propose some examples to illustrate how learners can use the corpus to learn Spanish and Chinese.
2015
UPF-Cobalt Submission to WMT15 Metrics Task
Marina Fomicheva | Núria Bel | Iria da Cunha | Anton Malinovskiy
Proceedings of the Tenth Workshop on Statistical Machine Translation
2011
On the Development of the RST Spanish Treebank
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra
Proceedings of the 5th Linguistic Annotation Workshop
The RST Spanish Treebank On-line Interface
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra | Luis-Adrián Cabrera-Diego | Brenda-Gabriela Castro-Rolón | Juan-Miguel Rolland Bartilotti
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
2010
Évaluation automatique de résumés avec et sans référence
Juan-Manuel Torres-Moreno | Horacio Saggion | Iria da Cunha | Patricia Velázquez-Morales | Eric Sanjuan
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
We study different content-based methods for evaluating document summaries. We are particularly interested in the correlation between evaluation measures with and without human references. We have developed FRESA, a new content-based evaluation system that computes divergences between probability distributions. We apply our comparison system to several well-known evaluation measures in text summarization, such as Coverage, Responsiveness, Pyramids, and Rouge, studying their associations in generic multi-document summarization (French/English), focused summarization (English), and generic single-document summarization (French/Spanish).
Multilingual Summarization Evaluation without Human Models
Horacio Saggion | Juan-Manuel Torres-Moreno | Iria da Cunha | Eric SanJuan | Patricia Velázquez-Morales
Coling 2010: Posters
Automatic Summarization Using Terminological and Semantic Resources
Jorge Vivaldi | Iria da Cunha | Juan-Manuel Torres-Moreno | Patricia Velázquez-Morales
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper presents a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of the terms that are present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity between the terms found in the main body and those present in the document title. The general idea is to obtain a relevance score for each sentence, taking into account both the termhood of the terms found in that sentence and the similarity between those terms and the terms present in the title of the document. The sentences with the highest scores are chosen to form part of the final summary. We evaluate the algorithm with Rouge, comparing the resulting summaries with the summaries of other summarizers. The sentence selection algorithm was also tested as part of a standalone summarizer. In both cases it obtains quite good results, although the perception is that there is room for improvement.
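The sentence-scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how termhood and title similarity are combined, so the additive combination below is an assumption, and the termhood values, the toy ontology (PARENT), and every function name are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical toy data: real termhood values would come from a term extractor,
# and real similarities would be computed over an ontology.
TERMHOOD: Dict[str, float] = {"insulin": 0.9, "glucose": 0.8, "patient": 0.3}
PARENT: Dict[str, str] = {"insulin": "hormone", "glucagon": "hormone", "glucose": "sugar"}


def toy_similarity(a: str, b: str) -> float:
    """Assumed similarity: 1.0 for identical terms, 0.5 for a shared parent, else 0.0."""
    if a == b:
        return 1.0
    if a in PARENT and PARENT.get(a) == PARENT.get(b):
        return 0.5
    return 0.0


def sentence_score(
    sentence_terms: List[str],
    title_terms: List[str],
    termhood: Dict[str, float],
    similarity: Callable[[str, str], float],
) -> float:
    """Sum, over the sentence's terms, termhood plus best similarity to any title term."""
    score = 0.0
    for term in sentence_terms:
        best = max((similarity(term, t) for t in title_terms), default=0.0)
        score += termhood.get(term, 0.0) + best  # additive combination is an assumption
    return score


def summarize(
    sentences: List[List[str]],
    title_terms: List[str],
    termhood: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
) -> List[int]:
    """Return the indices of the k highest-scoring sentences, in document order."""
    scores = [sentence_score(s, title_terms, termhood, similarity) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Each sentence is represented only by the terms found in it.
TITLE_TERMS = ["insulin"]
SENTENCES = [["patient"], ["glucagon", "glucose"], ["insulin", "glucose"]]
SELECTED = summarize(SENTENCES, TITLE_TERMS, TERMHOOD, toy_similarity, k=2)
```

Returning the selected indices in document order keeps the summary readable as an extract. A real system would replace toy_similarity with an ontology-based measure and TERMHOOD with the term extractor's output.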