DOI: 10.1145/860435.860519

Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

Published: 28 July 2003

Abstract

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluates it on Cross-Language Information Retrieval. We explore bi-directional extraction of bilingual terminology, primarily from comparable corpora. We propose a combined statistics-based and linguistics-based model for selecting the best translation candidates for phrasal translation. Evaluations on a large Japanese-English test collection showed that the proposed combination of bi-directional comparable corpora, bilingual dictionaries, and transliteration, augmented with linguistics-based pruning, is highly effective in Cross-Language Information Retrieval.
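
The abstract does not spell out the scoring model, so the following is only a minimal sketch of one common way to extract translation candidates from comparable corpora: build co-occurrence context vectors in each language, project the source vector through a seed dictionary, rank target words by cosine similarity, and keep a candidate only if the reverse direction points back to the source word. Every function name, the window size, the `top_k` cut-off, and the toy romanized-Japanese data are assumptions made for illustration and are not taken from the paper. The linguistics-based (part-of-speech) pruning and transliteration steps are omitted.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within +/- `window` tokens."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def project(vector, seed_dict):
    """Translate a context vector into the other language via a seed dictionary."""
    projected = Counter()
    for word, count in vector.items():
        for translation in seed_dict.get(word, ()):
            projected[translation] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed_dict):
    """Rank all target words by similarity to the projected context of `word`."""
    projected = project(src_vecs.get(word, Counter()), seed_dict)
    scores = {t: cosine(projected, v) for t, v in tgt_vecs.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def bidirectional_candidates(word, src_vecs, tgt_vecs, src2tgt, tgt2src, top_k=1):
    """Keep forward candidates whose own top-k reverse candidates include `word`."""
    kept = []
    for cand, score in rank_candidates(word, src_vecs, tgt_vecs, src2tgt)[:top_k]:
        backward = rank_candidates(cand, tgt_vecs, src_vecs, tgt2src)[:top_k]
        if score > 0 and word in [w for w, _ in backward]:
            kept.append((cand, score))
    return kept


# Toy comparable (not parallel) corpora; "kinri" is missing from the seed dictionary.
ja = [["ginkou", "kinri", "yokin"], ["ginkou", "kinri", "yokin"], ["inu", "neko", "sanpo"]]
en = [["bank", "interest", "deposit"], ["dog", "cat", "walk"]]
ja2en = {"ginkou": ["bank"], "yokin": ["deposit"], "inu": ["dog"],
         "neko": ["cat"], "sanpo": ["walk"]}
en2ja = {e: [j] for j, es in ja2en.items() for e in es}

ja_vecs, en_vecs = context_vectors(ja), context_vectors(en)
result = bidirectional_candidates("kinri", ja_vecs, en_vecs, ja2en, en2ja)
print([cand for cand, _ in result])
```

On this toy data the only surviving candidate for "kinri" is "interest", because it is the top-ranked word in the forward direction and "kinri" is in turn the top-ranked word when starting from "interest". Raising `top_k` loosens the filter and lets weaker candidates through, which is where an additional pruning step would matter.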

    Published In

    SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2003
    490 pages
    ISBN: 1581136463
    DOI: 10.1145/860435
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bilingual lexicon extraction
    2. comparable corpora
    3. cross-language information retrieval
    4. disambiguation
    5. part-of-speech
    6. transliteration

    Qualifiers

    • Article

    Conference

    SIGIR03
    Acceptance Rates

    SIGIR '03 Paper Acceptance Rate 46 of 266 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 21 Nov 2024

    Cited By

    • (2014) Mining a Persian-English comparable corpus for cross-language information retrieval. Information Processing and Management: an International Journal, 50:2 (384-398). DOI: 10.1016/j.ipm.2013.10.002. Online publication date: 1-Mar-2014
    • (2013) A language modeling approach for extracting translation knowledge from comparable corpora. Proceedings of the 35th European conference on Advances in Information Retrieval (606-617). DOI: 10.1007/978-3-642-36973-5_51. Online publication date: 24-Mar-2013
    • (2010) Using comparable corpora to improve the effectiveness of cross-language information retrieval. Proceedings of the 7th international conference on Advances in natural language processing (320-331). DOI: 10.5555/1884371.1884409. Online publication date: 16-Aug-2010
    • (2010) Exploiting comparable corpora for cross-language information retrieval. Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence (662-667). DOI: 10.5555/1884293.1884363. Online publication date: 30-Aug-2010
    • (2010) Exploiting Comparable Corpora for Cross-Language Information Retrieval. PRICAI 2010: Trends in Artificial Intelligence (662-667). DOI: 10.1007/978-3-642-15246-7_66. Online publication date: 2010
    • (2010) Using Comparable Corpora to Improve the Effectiveness of Cross-Language Information Retrieval. Advances in Natural Language Processing (320-331). DOI: 10.1007/978-3-642-14770-8_36. Online publication date: 2010
    • (2008) A cost-effective lexical acquisition process for large-scale thesaurus translation. Language Resources and Evaluation, 43:1 (27-40). DOI: 10.1007/s10579-008-9074-8. Online publication date: 4-Nov-2008
    • (2006) Leveraging reusability. Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (945-952). DOI: 10.3115/1220175.1220294. Online publication date: 17-Jul-2006
    • (2004) Knowledge acquisition from collections of news articles to cross-language information retrieval. Coupling approaches, coupling media and coupling languages for information retrieval (504-513). DOI: 10.5555/2816272.2816318. Online publication date: 26-Apr-2004
    • (2003) Learning bilingual translations from comparable corpora to cross-language information retrieval. Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11 (57-64). DOI: 10.3115/1118935.1118943. Online publication date: 7-Jul-2003
