Mutsuko Tomokiyo

2018

Towards an Automatic Classification of Illustrative Examples in a Large Japanese-French Dictionary Obtained by OCR
Christian Boitet | Mathieu Mangeot | Mutsuko Tomokiyo
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We work on improving the Cesselin, a large, open-source Japanese-French bilingual dictionary digitized by OCR, available on the web, and improvable online by contributors. Labelling its examples (about 226,000) would significantly enhance their usefulness for language learners. The examples are proverbs, idiomatic constructions, normal usage examples, and, for nouns, phrases containing a quantifier. Proverbs are easy to spot, but examples of the other types are not. To find a method for annotating them automatically, or at least semi-automatically, we studied many entries and hypothesized that the degree of lexical similarity between the results of MT into a third language might provide good cues. To test this hypothesis, we sampled 500 examples and used Google Translate to translate both their Japanese expressions and their French translations into English. The hypothesis holds well, in particular for distinguishing normal usage examples from idiomatic ones. Finally, we propose a detailed annotation procedure and discuss its future automation.
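The cue described in this abstract can be illustrated with a minimal sketch. It assumes that the two English MT outputs (one from the Japanese expression, one from its French translation) are already available as strings, and it measures their lexical similarity with Jaccard overlap of word sets. The similarity measure, the 0.5 threshold, the direction of the rule (high overlap suggests normal usage, low overlap suggests an idiom), and all function names are illustrative assumptions, not the procedure from the paper.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately crude tokenizer for English MT output."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two strings (assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def guess_label(en_from_ja: str, en_from_fr: str, threshold: float = 0.5) -> str:
    """Hypothetical rule: when both English translations share most of their words,
    the example is treated as normal usage; otherwise it is flagged as a candidate idiom."""
    if jaccard(en_from_ja, en_from_fr) >= threshold:
        return "normal usage"
    return "idiom candidate"


if __name__ == "__main__":
    # Invented strings standing in for MT output, not data from the paper.
    print(guess_label("I drink water every morning", "Every morning I drink water"))
    print(guess_label("I want to borrow a cat's hand", "to be overwhelmed with work"))
```

In the second call, a literal rendering of a Japanese idiom shares only the word "to" with a free rendering of its meaning, so the overlap is low (1/11). Any real threshold would have to be tuned on annotated samples.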

2017

Development of a classifiers/quantifiers dictionary towards French-Japanese MT
Mutsuko Tomokiyo | Mathieu Mangeot | Christian Boitet
Proceedings of Machine Translation Summit XVI: Research Track

2016

Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation
Mutsuko Tomokiyo | Christian Boitet
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Although quantifier/classifier expressions occur frequently in everyday communication and in written documents, they are not described in classical bilingual paper dictionaries or in machine-readable dictionaries. The paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their use in the framework of French-Japanese machine translation (MT). Such expressions often cause problems of lexical ambiguity and of set-phrase recognition during analysis, in particular for a distant language pair like French and Japanese. To develop a dictionary aimed at resolving ambiguity in expressions containing quantifiers and classifiers, which may be ambiguous with common nouns, we annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus using the UNLexplorer web service. Keywords: classifiers, quantifiers, phraseology study, corpus annotation, UNL (Universal Networking Language), UWs dictionary, Tori Bank, French-Japanese machine translation (MT).