Abstract
We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating document-query and passage-query similarity information for document retrieval.
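As a rough illustration of the idea described above, a passage language model can be sketched as a mixture of passage, document, and corpus estimates in which the document's weight grows with a homogeneity score. This is a minimal sketch under stated assumptions, not the paper's actual estimator: the abstract does not give the mixture weights or define the homogeneity measures, so the linear weighting, the `homogeneity` input in [0, 1], the `lambda_corpus` parameter, and all function names here are hypothetical.

```python
from collections import Counter
import math


def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def passage_model(passage, document, corpus, homogeneity, lambda_corpus=0.1):
    """Return a function w -> p(w) mixing passage, document, and corpus models.

    ASSUMPTION: the document weight is (1 - lambda_corpus) * homogeneity, so a
    homogeneous document (score near 1) contributes more of its own statistics
    and a heterogeneous one (score near 0) leaves the passage on its own.
    """
    if not 0.0 <= homogeneity <= 1.0:
        raise ValueError("homogeneity must lie in [0, 1]")
    if not 0.0 <= lambda_corpus <= 1.0:
        raise ValueError("lambda_corpus must lie in [0, 1]")

    p_psg, p_doc, p_corpus = mle(passage), mle(document), mle(corpus)
    lambda_doc = (1.0 - lambda_corpus) * homogeneity
    lambda_psg = 1.0 - lambda_corpus - lambda_doc

    def prob(w):
        return (lambda_psg * p_psg.get(w, 0.0)
                + lambda_doc * p_doc.get(w, 0.0)
                + lambda_corpus * p_corpus.get(w, 0.0))

    return prob


def query_log_likelihood(query, prob):
    """Sum of log p(w) over query terms; -inf if any term has zero probability."""
    total = 0.0
    for w in query:
        p = prob(w)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def score_document(query, passages, corpus, homogeneity, lambda_corpus=0.1):
    """Rank a document by its best-scoring passage (one possible principle)."""
    document = [w for psg in passages for w in psg]
    return max(
        query_log_likelihood(
            query, passage_model(psg, document, corpus, homogeneity, lambda_corpus)
        )
        for psg in passages
    )
```

Scoring a document by its highest-scoring passage is only one of several passage-based ranking principles; it is used here because it is the simplest to state. With `homogeneity=0` the sketch reduces to a corpus-smoothed passage model, and with `homogeneity=1` the passage's own statistics are ignored in favor of the document's.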
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bendersky, M., Kurland, O. (2008). Utilizing Passage-Based Language Models for Document Retrieval. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_17
DOI: https://doi.org/10.1007/978-3-540-78646-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science (R0)