DOI: 10.1145/996350.996378
Article

Translating unknown cross-lingual queries in digital libraries using a web-based approach

Published: 07 June 2004

Abstract

Users' cross-lingual queries to a digital library system might be short and not included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the corpus source to translate unknown query terms for cross-language information retrieval (CLIR) in digital libraries. We propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and benefit CLIR services in a digital library that has only monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms.

Information & Contributors

Information

Published In

JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
June 2004
440 pages
ISBN: 1581138326
DOI: 10.1145/996350
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-language information retrieval
  2. digital library
  3. term extraction
  4. term translation
  5. web mining

Qualifiers

  • Article

Conference

JCDL04

Acceptance Rates

JCDL '04 Paper Acceptance Rate: 61 of 249 submissions, 24%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%

Cited By

  • (2017) An Approach for Chinese-Japanese Named Entity Equivalents Extraction Using Inductive Learning and Hanzi-Kanji Mapping Table. IEICE Transactions on Information and Systems E100.D:8 (1882-1892). DOI: 10.1587/transinf.2016EDP7425. Online publication date: 2017
  • (2014) Multilingual Digital Libraries: A review of issues in system-centered and user-centered studies, information retrieval and user behavior. International Information & Library Review 45:1-2 (3-19). DOI: 10.1080/10572317.2013.10766367. Online publication date: 8-Jan-2014
  • (2014) Chinese-English OOV Term Translation with Web Mining, Multiple Feature Fusion and Supervised Learning. Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (234-246). DOI: 10.1007/978-3-319-12277-9_21. Online publication date: 2014
  • (2013) Bibliography. Natural Language Processing (311-329). DOI: 10.1201/b15472-19. Online publication date: 30-Oct-2013
  • (2012) Multilingual needs and expectations in digital libraries. The Electronic Library 30:2 (182-197). DOI: 10.1108/02640471211221322. Online publication date: 6-Apr-2012
  • (2011) Web-Based Verification on the Representativeness of Terms Extracted from Single Short Documents. Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03 (114-117). DOI: 10.1109/WI-IAT.2011.258. Online publication date: 22-Aug-2011
  • (2010) Fusion of multiple features and ranking SVM for web-based English-Chinese OOV term translation. Proceedings of the 23rd International Conference on Computational Linguistics: Posters (1435-1443). DOI: 10.5555/1944566.1944730. Online publication date: 23-Aug-2010
  • (2010) Mining large-scale comparable corpora from Chinese-English news collections. Proceedings of the 23rd International Conference on Computational Linguistics: Posters (472-480). DOI: 10.5555/1944566.1944620. Online publication date: 23-Aug-2010
  • (2010) Transferring structural markup across translations using multilingual alignment and projection. Proceedings of the 10th annual joint conference on Digital libraries (11-20). DOI: 10.1145/1816123.1816126. Online publication date: 21-Jun-2010
  • (2010) Unsupervised multilingual concept discovery from daily online news extracts. 2010 IEEE International Conference on Intelligence and Security Informatics (132-134). DOI: 10.1109/ISI.2010.5484763. Online publication date: May-2010
