Abstract
An integrated method for bilingual chunk partition and alignment, called “Interactional Matching”, is proposed in this paper. Unlike earlier work, our method tries to obtain as much of the necessary information as possible from the bilingual corpora themselves, and through bilingual constraints it automatically builds one-to-one chunk-pairs, each associated with a chunk-pair confidence coefficient. Unlike collocation extraction methods, our method also partitions bilingual sentences entirely into chunks, with no fragments left over. Furthermore, by using Probabilistic Latent Semantic Indexing (PLSI), the method can handle not only compositional chunks but also non-compositional ones. The experiments show that, for the overall process (including partition and alignment), our method obtains 85% precision with 57% recall for written-language chunk-pairs and 78% precision with 53% recall for spoken-language chunk-pairs.
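To make the evaluation figures concrete, the following is a minimal sketch of how precision and recall over chunk-pairs could be computed. It assumes that a proposed chunk-pair counts as correct only when it exactly matches a pair in a hand-made reference set; the abstract does not state the matching criterion, so this is an assumption. The function name `chunk_pair_precision_recall` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

# A chunk-pair is (source-language chunk, target-language chunk).
ChunkPair = Tuple[str, str]


def chunk_pair_precision_recall(
    proposed: Iterable[ChunkPair],
    reference: Iterable[ChunkPair],
) -> Tuple[float, float]:
    """Return (precision, recall) under exact-match scoring (an assumption)."""
    proposed_set = set(proposed)
    reference_set = set(reference)
    correct = len(proposed_set & reference_set)
    # Guard against empty sets to avoid division by zero.
    precision = correct / len(proposed_set) if proposed_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
toy_reference = [
    ("s1", "t1"), ("s2", "t2"), ("s3", "t3"),
    ("s4", "t4"), ("s5", "t5"), ("s6", "t6"),
]
toy_proposed = [("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t9")]

precision, recall = chunk_pair_precision_recall(toy_proposed, toy_reference)
print(precision, recall)  # 0.75 0.5
```

In this toy case three of the four proposed pairs are correct, while half of the six reference pairs are recovered. That is the same pattern as the reported results, where precision is noticeably higher than recall.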
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, F., Jin, Q., Zhao, J., Xu, B. (2005). Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_44
DOI: https://doi.org/10.1007/978-3-540-30211-7_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)