Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure

Published: 18 November 2016

Abstract

Search-engine queries in resource-scarce languages often return no results, which frustrates users. In such cases, at least partially relevant documents should be retrieved. We propose a novel multilingual framework, MultiStructPRF, which expands the query with related terms by (i) using a resource-rich assisting language and (ii) weighting the expansion terms according to where they occur in the document. The assisting language is used to expand the query in order to improve recall. We propose a systematic expansion model for weighting the expansion terms that come from different parts of the document. To combine the expansion terms from the query language and the assisting language, we propose a heuristics-based fusion model. Our experimental results show an improvement over other PRF techniques in both precision and recall for multiple resource-scarce languages, including Marathi, Bengali, Odia, and Finnish. We also study the effect of different assisting languages on precision and recall for multiple query languages. Our experiments reveal an interesting pattern: precision is positively correlated with the typological closeness of the query language and the assisting language, whereas recall is positively correlated with the resource richness of the assisting language.
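
The two ideas in the abstract, position-dependent term weighting and fusion of terms from two languages, can be illustrated with a minimal sketch. This is not the MultiStructPRF implementation. The field names, the field weights, the linear-interpolation fusion rule (a simple stand-in for the paper's heuristics-based fusion model), and all function names are assumptions made for illustration. The assisting-language terms are assumed to have already been translated back into the query language.

```python
from collections import Counter

# Hypothetical field weights: the abstract says only that a term's position
# in the document matters, not what the actual weights are.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_term_scores(feedback_docs, field_weights=FIELD_WEIGHTS):
    """Score candidate expansion terms from pseudo-relevant documents.

    Each document is a dict mapping a field name to a list of tokens.
    A term's score is its field-weighted frequency, normalised so that
    all scores sum to 1.
    """
    scores = Counter()
    for doc in feedback_docs:
        for field, tokens in doc.items():
            weight = field_weights.get(field, 1.0)
            for token in tokens:
                scores[token] += weight
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else {}


def fuse(query_lang_scores, assist_lang_scores, alpha=0.5):
    """Linearly interpolate two term-score dicts (an assumed fusion rule)."""
    terms = set(query_lang_scores) | set(assist_lang_scores)
    return {
        t: alpha * query_lang_scores.get(t, 0.0)
        + (1 - alpha) * assist_lang_scores.get(t, 0.0)
        for t in terms
    }


def top_terms(scores, original_query, k=3):
    """Return the k best-scoring terms that are not already in the query."""
    candidates = [(t, s) for t, s in scores.items() if t not in original_query]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in candidates[:k]]


# Toy data: one feedback document per language, already tokenised.
query = ["cricket"]
native_docs = [{"title": ["cricket", "match"], "body": ["india", "match", "score"]}]
assist_docs = [{"title": ["cricket", "tournament"], "body": ["match", "india"]}]

fused = fuse(weighted_term_scores(native_docs), weighted_term_scores(assist_docs))
print(top_terms(fused, query))
```

In this sketch, `alpha` controls how much the query-language feedback is trusted relative to the assisting language. A term that appears only in assisting-language documents, such as "tournament" here, can still be selected for expansion, which is one way an assisting language could raise recall.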

Cited By

  • (2024) A Hybrid Query Expansion Method for Effective Bengali Information Retrieval. Proceedings of 4th International Conference on Frontiers in Computing and Systems, 377-397. DOI: 10.1007/978-981-97-2611-0_26. Online publication date: 29-Jun-2024

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 16, Issue 2
    TALLIP Notes and Regular Papers
    June 2017
    136 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3008658

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 November 2016
    Accepted: 01 September 2016
    Revised: 01 July 2016
    Received: 01 March 2016
    Published in TALLIP Volume 16, Issue 2

    Author Tags

    1. Query expansion
    2. multilingual retrieval
    3. resource scarce languages

    Qualifiers

    • Research-article
    • Research
    • Refereed
