Abstract
The word-aligned bilingual corpus is an important knowledge source for many NLP tasks, particularly machine translation. In existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are important factors affecting the recall and precision of alignment results. In this paper, we propose a word alignment model between Chinese and Japanese. The model measures word similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and it matches words by maximum weight matching on a bipartite graph. The model partly solves the problems mentioned above. Experiments show that it is effective: it achieved an F-score of 80%, compared with 72% for GIZA++.
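The abstract does not give the exact scoring formula or the matching algorithm, so the following is only a sketch under stated assumptions. The four similarity signals are combined linearly with hypothetical weights (`WEIGHTS`). Morphological similarity is approximated by Dice overlap of character sets, and co-occurrence by a Dice coefficient over corpus counts. The semantic score is assumed to be precomputed elsewhere. The matching step uses the Hungarian method on a zero-padded square matrix, which is one standard way to compute a maximum weight bipartite matching. The names `combined_similarity`, `max_weight_matching`, `align` and the `threshold` cut-off are illustrative, not taken from the paper.

```python
# Hypothetical feature weights; the paper's actual weighting is not stated here.
WEIGHTS = {"morph": 0.4, "sem": 0.3, "pos": 0.1, "cooc": 0.2}


def char_dice(a: str, b: str) -> float:
    """Morphological similarity as Dice overlap of character sets.

    Assumes hanzi and kanji were already normalized to a common form
    (e.g. 楽 vs 乐); that mapping is not implemented here.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cooc_dice(pair_count: int, count_a: int, count_b: int) -> float:
    """Co-occurrence association as the Dice coefficient of corpus counts."""
    total = count_a + count_b
    return 2 * pair_count / total if total else 0.0


def combined_similarity(morph, sem, pos_match, cooc, weights=WEIGHTS):
    """Linear combination of the four signals (an assumption, not the paper's formula)."""
    return (weights["morph"] * morph + weights["sem"] * sem
            + weights["pos"] * (1.0 if pos_match else 0.0)
            + weights["cooc"] * cooc)


def max_weight_matching(sim):
    """Maximum weight bipartite matching via the Hungarian method.

    sim[i][j] >= 0 scores source word i against target word j. The matrix is
    zero-padded to square and each weight w becomes the cost (top - w), so the
    minimum-cost assignment is the maximum-weight one. Pairs that involve a
    padding row or column are dropped from the result.
    """
    rows, cols = len(sim), len(sim[0]) if sim else 0
    n = max(rows, cols)
    if n == 0:
        return []
    w = [[sim[i][j] if i < rows and j < cols else 0.0 for j in range(n)]
         for i in range(n)]
    top = max(max(r) for r in w)
    cost = [[top - x for x in r] for r in w]
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j]: row matched to column j (0 = free)
    way = [0] * (n + 1)   # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path back to column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, n + 1)
            if p[j] - 1 < rows and j - 1 < cols]


def align(sim, threshold=0.2):
    """Keep matched pairs whose similarity reaches a hypothetical cut-off."""
    return sorted((i, j) for i, j in max_weight_matching(sim)
                  if sim[i][j] >= threshold)
```

For the matrix `[[0.9, 0.8], [0.7, 0.1]]`, a greedy aligner that first takes the highest-scoring pair (0, 0) is left with (1, 1), for a total of 1.0. `align` instead returns `[(0, 1), (1, 0)]`, for a total of 1.5. This illustrates why a global matching can outperform pairwise greedy choices.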
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, H., Liu, S. (2006). Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_8
DOI: https://doi.org/10.1007/11940098_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)