Utilizing inter-passage and inter-document similarities for reranking search results

Published: 27 December 2010

Abstract

We present a novel language-model-based approach to reranking search results; that is, reordering the documents in an initially retrieved list so as to improve precision at top ranks. Our model integrates whole-document information with that induced from passages. Specifically, inter-passage, inter-document, and query-based similarities, which constitute a rich source of information, are combined in our model. Empirical evaluation shows that the precision-at-top-ranks performance of our model is substantially better than that of the initial ranking upon which reranking is performed. Furthermore, the performance is substantially better than that of a commonly used passage-based document ranking method that does not exploit inter-item similarities. Our model also generalizes and outperforms a recently proposed reranking method that utilizes inter-document similarities, but which does not exploit passage-based information. Finally, the model's performance is superior to that of a state-of-the-art pseudo-feedback-based retrieval approach.
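The idea of combining query-based, passage-based, and inter-document evidence when reranking can be illustrated with a toy sketch. This is not the authors' model: it swaps the paper's language-model estimates for cosine similarity over raw term counts, and the helper names (`cosine`, `passages`, `rerank`), the fixed-length passage splitter, and the weights `alpha` and `beta` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two texts over raw term counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def passages(doc, size=5):
    """Split a document into fixed-length word windows (hypothetical passage definition)."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)] or [""]


def rerank(query, docs, alpha=0.5, beta=0.3):
    """Return document indices ordered by a weighted mix of three signals.

    alpha weights document-query similarity, beta weights the best
    passage-query similarity, and the remainder weights support from
    other documents that are similar to this one and to the query.
    """
    scores = []
    for i, d in enumerate(docs):
        doc_q = cosine(query, d)
        best_passage_q = max(cosine(query, p) for p in passages(d))
        others = [cosine(d, o) * cosine(query, o)
                  for j, o in enumerate(docs) if j != i]
        support = sum(others) / len(others) if others else 0.0
        scores.append(alpha * doc_q + beta * best_passage_q
                      + (1 - alpha - beta) * support)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


initial_list = [
    "cooking recipes for pasta",
    "passage retrieval with language models",
    "weather report for monday",
]
order = rerank("passage retrieval", initial_list)
print(order)
```

In this toy list only the second document shares terms with the query, so it moves to the top rank; the other two keep their initial relative order because their scores tie.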




    Reviews

    Fazli Can

    With search engines, users usually examine the top ten search results. In this paper, the authors present a language-model-based approach to re-ranking search results to improve precision at the top five and top ten rank positions. They do this by re-ranking the top 50 search results. The study utilizes inter-passage and inter-document similarities. The authors base the study on the fact that a long or heterogeneous relevant document may contain parts (passages) that are not relevant to the query. Earlier passage-based studies addressed this issue, but they did not consider similarity relationships between documents or between passages. The authors' model integrates document-query, passage-query, inter-document, and inter-passage similarities. In the experiments that this paper discusses, the authors use five different Text Retrieval Conference (TREC) test collections. The collections vary in the number of documents they contain, the nature of the documents (such as news or Web documents), and the length of the documents. Such variety makes it possible to see how effective a model is under different conditions. In this detailed study, the authors show that, in several cases, their model outperforms many other methods by a statistically significant margin.

    The TREC information retrieval test collections are an excellent research tool: they provide a lab environment where researchers can repeat experiments and compare the results of different studies. While reading this paper, and while conducting similar activities in my own research, a few questions came to mind: What would happen if we applied this experiment in real life? Would the users feel or appreciate the difference at a statistically significant level (such as a top five precision improvement from 33.9 to 37.1, as shown in Table 1 with the TREC WT10G collection) while using the test collection? Incidentally, WT10G is the most challenging test collection used in the experiments. The paper provides much better improvements for some of the other collections. While it would be beneficial to add an actual user dimension to the experiments, and eventually to the test collections, doing so is easier said than done.

    Online Computing Reviews Service
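The measures the review refers to can be made concrete with a short sketch. It computes precision at a rank cutoff from a list of binary relevance judgments, and the relative change between the two top-five precision figures the review quotes (33.9 and 37.1). The function names and the sample judgment list are hypothetical; only the two precision values come from the review.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k results that are relevant (1 = relevant, 0 = not)."""
    return sum(ranked_relevance[:k]) / k


def relative_improvement(before, after):
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


# Hypothetical judgments for one ranked list: three relevant documents in the top five.
judgments = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p5 = precision_at_k(judgments, 5)
p10 = precision_at_k(judgments, 10)

# Top-five precision figures quoted in the review for WT10G.
gain = relative_improvement(33.9, 37.1)
print(f"P@5={p5:.2f}  P@10={p10:.2f}  relative gain={gain:.1f}%")
```

The quoted change of 3.2 points corresponds to a relative improvement of roughly 9.4% over the initial ranking, which gives a sense of the scale behind the reviewer's question about whether users would notice the difference.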


    Information & Contributors

    Information

    Published In

    ACM Transactions on Information Systems  Volume 29, Issue 1
    December 2010
    232 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/1877766
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2010
    Accepted: 01 September 2010
    Revised: 01 September 2010
    Received: 01 September 2009
    Published in TOIS Volume 29, Issue 1


    Author Tags

    1. Ad hoc retrieval
    2. document centrality
    3. inter-document similarities
    4. inter-passage similarities
    5. passage centrality
    6. passage-based retrieval
    7. reranking

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)13
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 26 Sep 2024


    Cited By

    • (2024) Passage-aware Search Result Diversification. ACM Transactions on Information Systems 42:5 (1-29). DOI: 10.1145/3653672. Online publication date: 13-May-2024
    • (2023) Heterogeneous graph attention networks for passage retrieval. Information Retrieval 26:1-2. DOI: 10.1007/s10791-023-09424-3. Online publication date: 16-Nov-2023
    • (2022) Passage Retrieval on Structured Documents Using Graph Attention Networks. Advances in Information Retrieval (13-21). DOI: 10.1007/978-3-030-99739-7_2. Online publication date: 10-Apr-2022
    • (2021) Recommending Search Queries in Documents Using Inter N-Gram Similarities. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (211-220). DOI: 10.1145/3471158.3472252. Online publication date: 11-Jul-2021
    • (2021) A Principled Approach Using Fuzzy Set Theory for Passage-Based Document Retrieval. IEEE Transactions on Fuzzy Systems 29:7 (1967-1977). DOI: 10.1109/TFUZZ.2020.2990110. Online publication date: Jul-2021
    • (2020) A passage-based approach to learning to rank documents. Information Retrieval Journal 23:2 (159-186). DOI: 10.1007/s10791-020-09369-x. Online publication date: 6-Mar-2020
    • (2019) Investigation of Passage Based Ranking Models to Improve Document Retrieval. Knowledge Discovery, Knowledge Engineering and Knowledge Management (100-117). DOI: 10.1007/978-3-030-15640-4_6. Online publication date: 15-Mar-2019
    • (2017) A Domain-Specific Web Document Re-ranking Algorithm. 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) (385-390). DOI: 10.1109/IIAI-AAI.2017.125. Online publication date: Jul-2017
    • (2015) Learning Asymmetric Co-Relevance. Proceedings of the 2015 International Conference on The Theory of Information Retrieval (281-290). DOI: 10.1145/2808194.2809454. Online publication date: 27-Sep-2015
    • (2015) Two-Stage Document Length Normalization for Information Retrieval. ACM Transactions on Information Systems 33:2 (1-40). DOI: 10.1145/2699669. Online publication date: 17-Feb-2015
