2018
Towards an Automatic Classification of Illustrative Examples in a Large Japanese-French Dictionary Obtained by OCR
Christian Boitet, Mathieu Mangeot, Mutsuko Tomokiyo
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
We work on improving the Cesselin, a large, open-source Japanese-French bilingual dictionary digitized by OCR, available on the web, and open to collaborative online improvement. Labelling its examples (about 226,000) would significantly enhance their usefulness for language learners. Examples are proverbs, idiomatic constructions, normal usage examples, and, for nouns, phrases containing a quantifier. Proverbs are easy to spot, but examples of the other types are not. To find a method for annotating them automatically, or at least semi-automatically, we studied many entries and hypothesized that the degree of lexical similarity between the results of MT into a third language might give good cues. To test that hypothesis, we sampled 500 examples and used Google Translate to translate both their Japanese expressions and their French translations into English. The hypothesis holds well, in particular for distinguishing examples of normal usage from idiomatic examples. Finally, we propose a detailed annotation procedure and discuss its future automation.
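The cue described in the abstract above, lexical similarity between two machine translations into a third language, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which similarity measure was used, so Jaccard overlap of word sets is swapped in here; the function names, the 0.5 threshold, and the two output labels are hypothetical; and the two English strings are expected to come from whatever MT system is used (no MT API is called).

```python
import re


def tokens(text):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_similarity(english_from_japanese, english_from_french):
    """Jaccard overlap between the word sets of two English translations.

    Jaccard is an assumed stand-in for the paper's similarity measure.
    """
    a = tokens(english_from_japanese)
    b = tokens(english_from_french)
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return len(a & b) / len(a | b)


def guess_label(similarity, threshold=0.5):
    """Hypothetical decision rule: high overlap suggests normal usage."""
    return "normal usage" if similarity >= threshold else "possibly idiomatic"
```

A call such as `guess_label(lexical_similarity("He eats an apple", "He eats the apple"))` compares two invented translations that share three of five distinct words, so the similarity is 0.6. In practice the threshold would have to be tuned on manually labelled examples.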
2017
Development of a classifiers/quantifiers dictionary towards French-Japanese MT
Mutsuko Tomokiyo, Mathieu Mangeot, Christian Boitet
Proceedings of Machine Translation Summit XVI: Research Track
2016
Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation
Mutsuko Tomokiyo, Christian Boitet
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)
Although quantifier/classifier expressions occur frequently in everyday communication and in written documents, they are not described in classical bilingual paper dictionaries or in machine-readable dictionaries. The paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their use in the framework of French-Japanese machine translation (MT). These expressions often cause problems of lexical ambiguity and of set phrase recognition during analysis, in particular for a distant language pair like French and Japanese. To develop a dictionary aimed at resolving ambiguity in expressions that include quantifiers and classifiers which may be ambiguous with common nouns, we have annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus by the UNLexplorer web service. Keywords: classifiers, quantifiers, phraseology study, corpus annotation, UNL (Universal Networking Language), UWs dictionary, Tori Bank, French-Japanese machine translation (MT).
2015
Post-editing a chapter of a specialized textbook into 7 languages: importance of terminological proximity with English for productivity
Ritesh Shah, Christian Boitet, Pushpak Bhattacharyya, Mithun Padmakumar, Leonardo Zilio, Ruslan Kalitvianski, Mohammad Nasiruddin, Mutsuko Tomokiyo, Sandra Castellanos Páez
Proceedings of the 12th International Conference on Natural Language Processing
2004
Towards fairer evaluations of commercial MT systems on basic travel expressions corpora
Herve Blanchon, Christian Boitet, Francis Brunet-Manquat, Mutsuko Tomokiyo, Agnes Hamon, Vo Trung Hung, Youcef Bey
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign
1996
Theory and practice of ambiguity labelling with a view to interactive disambiguation in text and speech MT
Christian Boitet, Mutsuko Tomokiyo
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics
1992
A Spoken Language Translation System: SL-TRANS2
Tsuyoshi Morimoto, Masami Suzuki, Toshiyuki Takezawa, Gen’ichiro Kikui, Masaaki Nagata, Mutsuko Tomokiyo
COLING 1992 Volume 3: The 14th International Conference on Computational Linguistics