Abstract
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system that combines query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords based on its special phonogram. Consequently, existing dictionaries struggle to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance.
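As a rough illustration of the word-by-word compound translation described above, the following Python sketch looks up each base word in a small dictionary and picks the combination of candidates with the highest score. It is not the system's actual implementation: the dictionary entries, the romanized Japanese keys, the bigram scores, and the names `BASE_DICT`, `BIGRAM_PROB`, `SMOOTHING`, and `translate_compound` are all hypothetical, and scoring by target-language bigram probabilities is only one assumed way to realize "a probabilistic method".

```python
from itertools import product

# Hypothetical toy base-word dictionary: romanized Japanese -> English candidates.
BASE_DICT = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical target-language bigram scores (assumed to come from an English corpus).
BIGRAM_PROB = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.3,
    ("intelligence", "retrieval"): 0.05,
    ("intelligence", "search"): 0.05,
}

# Small score given to bigrams that are missing from BIGRAM_PROB.
SMOOTHING = 1e-6


def translate_compound(base_words):
    """Return the best word-by-word translation, or None if any base word is unlisted."""
    candidate_lists = []
    for word in base_words:
        if word not in BASE_DICT:
            # Unlisted word: a caller could fall back to transliteration here.
            return None
        candidate_lists.append(BASE_DICT[word])

    best, best_score = None, -1.0
    # Enumerate every combination of candidate translations.
    for combo in product(*candidate_lists):
        score = 1.0
        # Multiply the scores of adjacent target-word pairs.
        for pair in zip(combo, combo[1:]):
            score *= BIGRAM_PROB.get(pair, SMOOTHING)
        if score > best_score:
            best, best_score = combo, score
    return " ".join(best)
```

In this sketch, a return value of `None` marks a compound containing a word that is missing from the base word dictionary, which is the case the transliteration method is meant to handle.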
Cite this article
Fujii, A., Ishikawa, T. Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration. Computers and the Humanities 35, 389–420 (2001). https://doi.org/10.1023/A:1011856202986
DOI: https://doi.org/10.1023/A:1011856202986