Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph

Honglin Wu &
Shaoming Liu

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages


Abstract

A word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. For existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are important factors affecting the recall and precision of alignment results. In this paper, we propose a word alignment model between Chinese and Japanese that measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on a bipartite graph. The model can partly solve the problems mentioned above. Experiments show that the model is effective: it achieves an F-score of 80%, compared with 72% for GIZA++.
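
To make the matching step concrete, here is a minimal sketch of maximum weight matching on a bipartite graph whose two sides are the Chinese and the Japanese words of a sentence pair. It is not the authors' implementation. It assumes that the four similarity measures have already been combined into one weight per word pair; the weight values, the function names and the `threshold` parameter are hypothetical and chosen only for illustration. The solver is a standard O(n^3) Hungarian method applied to negated weights.

```python
def _min_cost_assignment(cost):
    """Hungarian method for a rectangular cost matrix with rows <= columns.

    Returns (row, col) pairs that assign every row to a distinct column
    with minimum total cost.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def align_max_weight(weights, threshold=0.0):
    """Return sorted (source_index, target_index) pairs of maximum total weight.

    weights[i][j] is the (hypothetical) combined similarity of source word i
    and target word j. Pairs whose weight is not above `threshold` are left
    unaligned.
    """
    transposed = len(weights) > len(weights[0])
    w = [list(col) for col in zip(*weights)] if transposed else weights
    cost = [[-x for x in row] for row in w]   # maximise weight = minimise -weight
    pairs = [(i, j) for i, j in _min_cost_assignment(cost) if w[i][j] > threshold]
    if transposed:
        pairs = [(j, i) for i, j in pairs]
    return sorted(pairs)


# Illustrative sentence pair with invented similarity weights.
chinese = ["我", "喜欢", "音乐"]
japanese = ["私", "は", "音楽", "が", "好き"]
similarity = [
    [0.90, 0.10, 0.00, 0.00, 0.10],
    [0.10, 0.00, 0.30, 0.10, 0.80],
    [0.00, 0.00, 0.95, 0.10, 0.20],
]
for i, j in align_max_weight(similarity):
    print(chinese[i], "<->", japanese[j])
```

The matching is chosen for its total weight rather than pair by pair, which is one way to read the "global optimization problem" named in the abstract. For the weights [[0.9, 0.8], [0.85, 0.1]], a greedy aligner that takes the best pair first (0.9) is left with 0.1 for the second word, a total of 1.0, whereas the matching above pairs the words crosswise for a total of 1.65.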

Author information

Authors and Affiliations

Natural Language Processing Lab, Institute of Software and Theory, Northeastern University, Shenyang, 110004, China
Honglin Wu
Corporate Research Group, Fuji Xerox Co., Ltd., Kanagawa, Japan
Shaoming Liu

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

About this paper

Cite this paper

Wu, H., Liu, S. (2006). Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_8

DOI: https://doi.org/10.1007/11940098_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
