Abstract
This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure brings together precise grammars for parsing and generation, a semantic-transfer-based translation engine, and stochastic controllers. We provide both a qualitative and a quantitative experience report from instantiating our general architecture for Japanese–English MT using only open-source components, including HPSG-based grammars of English and Japanese.
Cite this article
Bond, F., Oepen, S., Nichols, E. et al. Deep open-source machine translation. Machine Translation 25, 87–105 (2011). https://doi.org/10.1007/s10590-011-9099-4
DOI: https://doi.org/10.1007/s10590-011-9099-4