research-article

Utilizing inter-passage and inter-document similarities for reranking search results

Authors:

Eyal Krikon,

Oren Kurland,

Michael Bendersky

ACM Transactions on Information Systems (TOIS), Volume 29, Issue 1

Article No.: 3, Pages 1 - 28

https://doi.org/10.1145/1877766.1877769

Published: 27 December 2010

Abstract

We present a novel language-model-based approach to reranking search results; that is, reordering the documents in an initially retrieved list so as to improve precision at top ranks. Our model integrates whole-document information with that induced from passages. Specifically, inter-passage, inter-document, and query-based similarities, which constitute a rich source of information, are combined in our model. Empirical evaluation shows that the precision-at-top-ranks performance of our model is substantially better than that of the initial ranking upon which reranking is performed. Furthermore, the performance is substantially better than that of a commonly used passage-based document ranking method that does not exploit inter-item similarities. Our model also generalizes and outperforms a recently proposed reranking method that utilizes inter-document similarities, but which does not exploit passage-based information. Finally, the model's performance is superior to that of a state-of-the-art pseudo-feedback-based retrieval approach.
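The abstract describes combining query-based, inter-document, and inter-passage similarities to reorder an initially retrieved list. As a rough illustration of that general idea only, the Python sketch below scores each document by multiplying (a) an interpolation of its whole-document and best-passage query similarity by (b) a PageRank-style centrality computed over a graph whose nodes are the documents and their passages. This is not the model from the paper: it swaps in cosine similarity over raw term counts for the language-model-based similarities, and the fixed 5-word passages, the multiplicative combination, and the parameters `lam` and `damping` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (stand-in for LM similarity)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_passages(text: str, size: int = 5) -> list[Counter]:
    """Hypothetical passage model: consecutive fixed windows of `size` words."""
    words = text.lower().split()
    return [Counter(words[i:i + size]) for i in range(0, max(len(words), 1), size)]


def centrality(vectors: list[Counter], damping: float = 0.85, iters: int = 50) -> list[float]:
    """PageRank-style power iteration over a similarity-weighted graph without self-loops."""
    n = len(vectors)
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                # Nodes with no similar neighbours spread their mass uniformly.
                share = w[i][j] / out[i] if out[i] > 0 else 1.0 / n
                new[j] += damping * score[i] * share
        score = new
    return score  # sums to 1 (up to rounding)


def rerank(query: str, docs: list[str], lam: float = 0.5, size: int = 5):
    """Return (order, scores) for the documents of an initially retrieved list."""
    q = Counter(query.lower().split())
    doc_vecs = [Counter(d.lower().split()) for d in docs]
    psg_vecs = [split_passages(d, size) for d in docs]
    # Graph nodes: every document, followed by every passage of every document.
    cent = centrality(doc_vecs + [p for ps in psg_vecs for p in ps])
    offset = len(docs)
    scores = []
    for i, (dv, ps) in enumerate(zip(doc_vecs, psg_vecs)):
        psg_cent = sum(cent[offset:offset + len(ps)])
        offset += len(ps)
        # Query evidence: interpolate whole-document and best-passage similarity.
        query_ev = lam * cosine(q, dv) + (1 - lam) * max(cosine(q, p) for p in ps)
        # Graph evidence: centrality of the document plus that of its passages.
        scores.append(query_ev * (cent[i] + psg_cent))
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Because the two kinds of evidence are multiplied in this sketch, a document with no query-term overlap ends up last no matter how central it is in the graph; an additive combination would let central but off-topic items rise, which is an equally plausible design choice.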

References

[1] Abdul-Jaleel, N., Allan, J., Croft, W. B., Diaz, F., Larkey, L., Li, X., Smucker, M. D., and Wade, C. 2004. UMASS at TREC 2004: Novelty and HARD. In Proceedings of the 13th Text Retrieval Conference (TREC-13). 715--725.

[2] Balinski, J. and Danilowicz, C. 2005. Re-Ranking method based on inter-document distances. Inform. Process. Manag. 41, 4, 759--775.

[3] Bendersky, M. and Kurland, O. 2008a. Re-Ranking search results using document-passage graphs. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 853--854. Poster.

[4] Bendersky, M. and Kurland, O. 2008b. Utilizing passage-based language models for document retrieval. In Proceedings of the European Conference on IR Research (ECIR). 162--174.

[5] Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the World Wide Web Conference. 107--117.

[6] Buckley, C., Salton, G., Allan, J., and Singhal, A. 1994. Automatic query expansion using SMART: TREC3. In Proceedings of the Text Retrieval Conference (TREC-3). 69--80.

[7] Cai, D., Yu, S., Wen, J., and Ma, W. 2004. Block-Based web search. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 456--463.

[8] Callan, J. P. 1994. Passage-Level evidence in document retrieval. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 302--310.

[9] Croft, W. B. and Lafferty, J., Eds. 2003. Language Modeling for Information Retrieval. Information Retrieval Book Series, No. 13. Kluwer.

[10] Denoyer, L., Zaragoza, H., and Gallinari, P. 2001. HMM-Based passage models for document classification and ranking. In Proceedings of the European Conference on IR Research (ECIR). 126--135.

[11] Diaz, F. 2005. Regularizing ad hoc retrieval scores. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 672--679.

[12] Diaz, F. and Metzler, D. 2006. Improving the estimation of relevance models using large external corpora. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 154--161.

[13] Erkan, G. 2006. Language model based document clustering using random walks. In Proceedings of the Annual Conference on Human Language Technologies and North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

[14] Erkan, G. and Radev, D. R. 2004. LexPageRank: Prestige in multi-document text summarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 365--371. Poster.

[15] Golub, G. H. and van Loan, C. F. 1996. Matrix Computations 3rd Ed. The Johns Hopkins University Press.

[16] Hearst, M. A. and Plaunt, C. 1993. Subtopic structuring for full-length document access. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 56--89.

[17] Hussain, M. 2004. Language modeling based passage retrieval for question answering systems. M.S. thesis, Saarland University.

[18] Jiang, J. and Zhai, C. 2004. UIUC in HARD 2004: Passage retrieval using HMMs. In Proceedings of the Text Retrieval Conference (TREC-13).

[19] Kaszkiel, M. and Zobel, J. 1997. Passage retrieval revisited. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 178--185.

[20] Kaszkiel, M. and Zobel, J. 2001. Effective ranking with arbitrary passages. J. Amer. Soc. Inform. Sci. 52, 4, 344--364.

[21] Kleinberg, J. 1997. Authoritative sources in a hyperlinked environment. Tech. rep., RJ 10076, IBM.

[22] Krikon, E., Kurland, O., and Bendersky, M. 2009. Utilizing inter-passage and inter-document similarities for re-ranking search results. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). (To appear).

[23] Kurland, O. 2006. Inter-Document similarities, language models, and ad hoc retrieval. Ph.D. thesis, Cornell University.

[24] Kurland, O. 2008. The opposite of smoothing: A language model approach to ranking query-specific document clusters. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval.

[25] Kurland, O. and Lee, L. 2005. PageRank without hyperlinks: Structural re-ranking using links induced by language models. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 306--313.

[26] Kurland, O. and Lee, L. 2006. Respect my authority! HITS without hyperlinks utilizing cluster-based language models. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 83--90.

[27] Lafferty, J. D. and Zhai, C. 2001. Document language models, query models, and risk minimization for information retrieval. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 111--119.

[28] Lavrenko, V. 2004. A generative theory of relevance. Ph.D. thesis, University of Massachusetts Amherst.

[29] Lavrenko, V., Allan, J., DeGuzman, E., LaFlamme, D., Pollard, V., and Thomas, S. 2002. Relevance models for topic detection and tracking. In Proceedings of the Annual Meeting of the Association for Computational Linguistics on Human Language Technologies (HLT). 104--110.

[30] Lavrenko, V. and Croft, W. B. 2001. Relevance-Based language models. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 120--127.

[31] Lavrenko, V. and Croft, W. B. 2003. Relevance models in information retrieval. In Language Modeling for Information Retrieval, W. B. Croft and J. Lafferty, Eds. Kluwer, 11--56.

[32] Liu, X. and Croft, W. B. 2002. Passage retrieval based on language models. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 375--382.

[33] Liu, X. and Croft, W. B. 2004. Cluster-Based retrieval using language models. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 186--193.

[34] Liu, X. and Croft, W. B. 2006. Experiments on retrieval of optimal clusters. Tech. rep. IR-478, Center for Intelligent Information Retrieval (CIIR), University of Massachusetts.

[35] Liu, X. and Croft, W. B. 2008. Evaluating text representations for retrieval of the best group of documents. In Proceedings of the European Conference on IR Research (ECIR). 454--462.

[36] Mei, Q., Zhang, D., and Zhai, C. 2008. A general optimization framework for smoothing language models on graph structures. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 611--618.

[37] Mihalcea, R. 2004. Graph-Based ranking algorithms for sentence extraction, applied to text summarization. In The Companion Volume to the Proceedings of the Meeting of the Association of Computational Linguistics (ACL). 170--173.

[38] Mihalcea, R. and Tarau, P. 2004. TextRank: Bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 404--411. Poster.

[39] Mittendorf, E. and Schauble, P. 1994. Document and passage retrieval based on hidden Markov models. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 318--327.

[40] Murdock, V. and Croft, W. B. 2005. A translation model for sentence retrieval. In Proceedings of the Annual Meeting of the Association for Computational Linguistics on Human Language Technologies and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP). 684--695.

[41] Na, S., Kang, I., Lee, Y., and Lee, J. 2008. Completely-arbitrary passage retrieval in language modeling approach. In Proceedings of the AIRS Conference. 22--33.

[42] Otterbacher, J., Erkan, G., and Radev, D. R. 2005. Using random walks for question-focused sentence retrieval. In Proceedings of the Annual Meeting of the Association for Computational Linguistics on Human Language Technologies and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP). 915--922.

[43] Ponte, J. M. and Croft, W. B. 1997. Text segmentation by topic. In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. 113--125.

[44] Ponte, J. M. and Croft, W. B. 1998. A language modeling approach to information retrieval. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 275--281.

[45] Salton, G., Allan, J., and Buckley, C. 1993. Approaches to passage retrieval in full text information systems. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 49--58.

[46] Voorhees, E. M. 2005. Overview of the TREC 2005 robust retrieval task. In Proceedings of the 14th Text Retrieval Conference (TREC).

[47] Voorhees, E. M. and Harman, D. K., Eds. 2000. The 8th Text Retrieval Conference (TREC-8). NIST.

[48] Voorhees, E. M. and Harman, D. K. 2005. TREC: Experiment and Evaluation in Information Retrieval. The MIT Press.

[49] Wade, C. and Allan, J. 2005. Passage retrieval and evaluation. Tech. rep. IR-396, Center for Intelligent Information Retrieval (CIIR), University of Massachusetts.

[50] Wan, X., Yang, J., and Xiao, J. 2008. Towards a unified approach to document similarity search using manifold-ranking of blocks. Inform. Process. Manag. 44, 3, 1032--1048.

[51] Wang, M. and Si, L. 2008. Discriminative probabilistic models for passage based retrieval. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 419--426.

[52] Wilkinson, R. 1994. Effective retrieval of structured documents. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 311--317.

[53] Willett, P. 1985. Query specific automatic document classification. Int. Forum Inform. Document. 10, 2, 28--32.

[54] Xu, J. and Croft, W. B. 1996. Query expansion using local and global document analysis. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 4--11.

[55] Yang, L., Ji, D., Zhou, G., Nie, Y., and Xiao, G. 2006. Document re-ranking using cluster validation and label propagation. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 690--697.

[56] Zhai, C. and Lafferty, J. D. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 334--342.

[57] Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., Chen, Z., and Ma, W. 2005. Improving web search results using affinity graph. In Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 504--511.

Cited By

Su Z, Dou Z, Zhu Y, Wen J (2024). Passage-aware Search Result Diversification. ACM Transactions on Information Systems 42:5, 1-29. Online publication date: 13-May-2024. https://dl.acm.org/doi/10.1145/3653672

Albarede L, Mulhem P, Goeuriot L, Marié S, Le Pape-Gardeux C, Chardin-Segui T (2023). Heterogeneous graph attention networks for passage retrieval. Information Retrieval 26:1-2. Online publication date: 16-Nov-2023. https://dl.acm.org/doi/10.1007/s10791-023-09424-3

Albarede L, Mulhem P, Goeuriot L, Le Pape-Gardeux C, Marie S, Chardin-Segui T (2022). Passage Retrieval on Structured Documents Using Graph Attention Networks. Advances in Information Retrieval, 13-21. Online publication date: 10-Apr-2022. https://dl.acm.org/doi/10.1007/978-3-030-99739-7_2

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

Re-ranking search results using document-passage graphs
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

We present a novel passage-based approach to re-ranking documents in an initially retrieved list so as to improve precision at top ranks. While most work on passage-based document retrieval ranks a document based on the query similarity of its ...

Utilizing Inter-Passage Similarities for Focused Retrieval
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Our main goal is studying the merits of using inter-passage similarities for the task of focused retrieval; i.e., ranking passages in documents by their relevance to an information need expressed by a query. As an initial research direction we study the ...

Utilizing inter-document similarities in federated search
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

We demonstrate the merits of using inter-document similarities for federated search. Specifically, we study a results merging method that utilizes information induced from clusters of similar documents created across the lists retrieved from the ...

Reviews

Reviewer: Fazli Can

With search engines, users usually examine the top ten search results. In this paper, the authors present a language-model-based approach to re-ranking search results to improve precision at the top five and top ten rank positions. They do this by re-ranking the top 50 search results. The study utilizes inter-passage and inter-document similarities. The authors base the study on the fact that a long or heterogeneous relevant document may contain parts (passages) that are not relevant to the query. Earlier passage-based studies addressed this issue, but they did not consider similarity relationships between documents or between passages. Their model integrates document-query, passage-query, inter-document, and inter-passage similarities. In the experiments that this paper discusses, the authors use five different Text Retrieval Conference (TREC) test collections. The collections vary in the number of documents they contain, the nature of the documents (such as news or Web documents), and the length of the documents. Such variety is advantageous for assessing the effectiveness of a model under different conditions. In this detailed study, the authors show that, in several cases, their model statistically significantly outperforms many other methods. The TREC information retrieval test collections are an excellent research tool: they provide a lab environment where researchers can repeat experiments and compare the results of different studies. While reading this paper, and while conducting similar activities in my own research, a few questions came to mind: What would happen if we applied this experiment in real life? Would users notice or appreciate a difference that is statistically significant on the test collection (such as a top five precision improvement from 33.9 to 37.1, as shown in Table 1 with the TREC WT10G collection)? Incidentally, WT10G is the most challenging test collection used in the experiments.
The paper reports much larger improvements for some of the other collections. While it would be beneficial to add an actual user dimension to the experiments, and eventually to the test collections, doing so is easier said than done.

Online Computing Reviews Service
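The top-five and top-ten precision figures discussed in the review are precision-at-k values: the fraction of the top k documents in a ranking that are judged relevant. The short Python sketch below computes them; the document IDs and relevance judgments are hypothetical, and the last line only works through the arithmetic for the WT10G p@5 numbers quoted in the review (33.9 to 37.1 is a relative gain of roughly 9.4%).

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked document IDs judged relevant.

    Divides by k even if fewer than k documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


ranking = ["d3", "d7", "d1", "d9", "d4", "d2"]  # hypothetical reranked list
relevant = {"d1", "d4", "d8"}                   # hypothetical relevance judgments
print(precision_at_k(ranking, relevant, 5))   # 0.4
print(precision_at_k(ranking, relevant, 10))  # 0.2
print(round(relative_gain(33.9, 37.1), 3))    # 0.094
```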

Published In

ACM Transactions on Information Systems Volume 29, Issue 1

December 2010

232 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1877766

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2010

Accepted: 01 September 2010

Revised: 01 September 2010

Received: 01 September 2009

Published in TOIS Volume 29, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Israel Science Foundation

Bibliometrics & Citations

Article Metrics

Total Citations: 18
Total Downloads: 559
Downloads (last 12 months): 13
Downloads (last 6 weeks): 2

Reflects downloads up to 26 Sep 2024

Citations

Sheetrit E, Fyodorov Y, Raiber F, Kurland O, Hasibi F, Fang Y, Aizawa A (2021). Recommending Search Queries in Documents Using Inter N-Gram Similarities. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, 211-220. Online publication date: 11-Jul-2021. https://dl.acm.org/doi/10.1145/3471158.3472252

Dang E, Luk R, Allan J (2021). A Principled Approach Using Fuzzy Set Theory for Passage-Based Document Retrieval. IEEE Transactions on Fuzzy Systems 29:7, 1967-1977. Online publication date: Jul-2021. https://doi.org/10.1109/TFUZZ.2020.2990110

Sheetrit E, Shtok A, Kurland O (2020). A passage-based approach to learning to rank documents. Information Retrieval Journal 23:2, 159-186. Online publication date: 6-Mar-2020. https://doi.org/10.1007/s10791-020-09369-x

Sarwar G, O’Riordan C, Newell J (2019). Investigation of Passage Based Ranking Models to Improve Document Retrieval. Knowledge Discovery, Knowledge Engineering and Knowledge Management, 100-117. Online publication date: 15-Mar-2019. https://doi.org/10.1007/978-3-030-15640-4_6

Zhao G, Zhang X (2017). A Domain-Specific Web Document Re-ranking Algorithm. 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), 385-390. Online publication date: Jul-2017. https://doi.org/10.1109/IIAI-AAI.2017.125

Raiber F, Kurland O, Radlinski F, Shokouhi M, Allan J, Croft B, de Vries A, Zhai C (2015). Learning Asymmetric Co-Relevance. Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 281-290. Online publication date: 27-Sep-2015. https://dl.acm.org/doi/10.1145/2808194.2809454

Na S (2015). Two-Stage Document Length Normalization for Information Retrieval. ACM Transactions on Information Systems 33:2, 1-40. Online publication date: 17-Feb-2015. https://dl.acm.org/doi/10.1145/2699669
