Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

The word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. In existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are major factors affecting the recall and precision of alignment results. In this paper, we propose a word alignment model between Chinese and Japanese that measures word similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on a bipartite graph. The model partly solves the problems mentioned above. Experiments show that it is effective: it achieved an F-score of 80%, compared with 72% for GIZA++.
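The abstract does not give the paper's scoring formula or implementation, so the following is only a minimal sketch of the matching step under stated assumptions. Each Chinese word is a left vertex and each Japanese word a right vertex; the edge weight is a non-negative similarity score. `combined_similarity` is a hypothetical linear mix of the four features with made-up weights, and `threshold` is a hypothetical cutoff below which an edge is treated as absent. The maximum weight matching is solved with the Hungarian method (Kuhn, 1955) on negated weights, which is one standard way to do it, not necessarily the authors' choice.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def combined_similarity(features: Sequence[float],
                        weights: Sequence[float] = (0.4, 0.3, 0.1, 0.2)) -> float:
    """Hypothetical linear mix of (morphological, semantic, POS, co-occurrence)
    scores. The weights are illustrative placeholders, not values from the paper."""
    return sum(w * f for w, f in zip(weights, features))


def hungarian_min(cost: Matrix) -> List[int]:
    """Hungarian method for an n x m cost matrix with n <= m.
    Returns assign[i] = column matched to row i, minimising the total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (m + 1)          # column potentials (1-indexed)
    p = [0] * (m + 1)            # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)          # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:              # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def align(sim: Matrix, threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Maximum weight matching on the bipartite graph whose edge (i, j) has weight
    sim[i][j] >= 0; edges with weight <= threshold are treated as absent."""
    n, m = len(sim), len(sim[0])
    w = [[s if s > threshold else 0.0 for s in row] for row in sim]
    transposed = n > m           # the solver needs rows <= columns
    if transposed:
        w = [list(col) for col in zip(*w)]
    cost = [[-x for x in row] for row in w]   # max weight == min negated cost
    pairs = []
    for i, j in enumerate(hungarian_min(cost)):
        a, b = (j, i) if transposed else (i, j)
        if sim[a][b] > threshold:             # drop absent (zero-weight) edges
            pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    zh = ["我", "喜欢", "苹果"]
    ja = ["私", "は", "りんご", "が", "好き", "です"]
    # Illustrative scores; in practice each entry could come from combined_similarity.
    sim = [
        [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.0, 0.8, 0.1],
        [0.0, 0.0, 0.7, 0.1, 0.0, 0.0],
    ]
    for i, j in align(sim, threshold=0.2):
        print(zh[i], "->", ja[j])   # 我 -> 私, 喜欢 -> 好き, 苹果 -> りんご
```

The matching is global rather than greedy: with scores `[[0.9, 0.8], [0.85, 0.1]]`, greedily taking the best pair (0.9) forces a total of 1.0, whereas `align` returns `[(0, 1), (1, 0)]` with total 1.65. This illustrates the "global optimization problem" the abstract mentions.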

References

  1. Och, F., Ney, H.: A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1), 19–51 (2003)

  2. Brown, P.F., Cocke, J., Pietra, S.A.D., Pietra, V.J.D., Jelinek, F., Lafferty, J.D., Mercer, R.L., Roossin, P.S.: A statistical approach to machine translation. Computational Linguistics 16(2), 79–85 (1990)

  3. Toutanova, K., Ilhan, H.T., Manning, C.D.: Extensions to HMM-based statistical word alignment models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, pp. 87–94 (2002)

  4. Gale, W., Church, K.: Identifying Word Correspondences in Parallel Texts. In: Proceedings of DARPA Workshop on Speech and Natural Language, Pacific Grove, CA, pp. 152–157 (1991)

  5. Ker, S.J., Chang, J.S.: A Class-based Approach to Word Alignment. Computational Linguistics 23(2), 313–343 (1997)

  6. Simard, M., Foster, G., Isabelle, P.: Using Cognates to Align Sentences in Bilingual Corpora. In: Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 1992), Montreal, pp. 67–81 (1992)

  7. Zhang, Y., Ma, Q., Isahara, H.: Use of Kanji Information in Constructing a Japanese-Chinese Bilingual Lexicon. In: Proceedings of The 4th workshop on ALR, Hainan, China (2004)

  8. Wu, D.: Bracketing and aligning words and constituents in parallel text using Stochastic Inversion Transduction Grammars. In: Parallel Text Processing: Alignment and Use of Translation Corpora. Kluwer, Dordrecht (2000)

  9. Zhang, Y., Ma, Q., Isahara, H.: Automatic acquisition of a Japanese-Chinese Bilingual lexicon Using English as an Intermediary. In: IEEE NLPKE 2003, Beijing (2003)

  10. Ma, Q., Zhang, Y., Masaki, M., Isahara, H.: Semantic Maps for Word Alignment in Bilingual Parallel Corpora. In: ACL 2003 Workshop: Second SIGHAN Workshop on Chinese Language Processing, Sapporo, pp. 98–103 (2003)

  11. Budanitsky, A., Hirst, G.: Semantic Distance in WordNet: An Experimental, Application-oriented Evaluation of Five Measures. In: The North American Chapter of the Association for Computational Linguistics, Pittsburgh, PA (2001)

  12. Jiang, J., Conrath, D.: Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of International Conference on Research in Computational Linguistics, Taiwan (1997)

  13. Kuhn, H.W.: The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly 2, 83–97 (1955)

  14. Melamed, I.: Automatic construction of clean broad-coverage lexicons. In: Proceedings of the 2nd Conf. AMTA, Montreal, CA, pp. 125–134 (1996)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, H., Liu, S. (2006). Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_8

  • DOI: https://doi.org/10.1007/11940098_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science (R0)
