“Is this document relevant?…probably”: a survey of probabilistic models in information retrieval

Published: 01 December 1998

Abstract

This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.



Reviews

Karen Sparck-Jones

This useful review provides a competent, clear, and accessible account of retrieval models that take probability as their grounding notion in defining the relevance relation between queries and documents. These models are divided into two broad classes: those treating the probability of relevance of a document to a query in a direct way (statistical models) and those allowing indirection through reasoning (inference models). After presenting the pertinent generic information retrieval notions, the paper describes eight approaches in the first class and two in the second. As it covers the source material well and uses a common formalism that makes comparisons between the different approaches easier, the paper will be a valuable reference for students and beginning researchers. The paper has a major weakness, however. There is no detail on, assessment of, or serious literature reference to experiments where the models have been applied. The paper's primary emphasis on models is legitimate. However, there is no proper indication of the extent to which the different models have been implemented, of the range of retrieval tests conducted, or, especially, of direct performance comparisons between them. Much model development has been promoted by actual retrieval experiments, and more than one model from the first class and one from the second have been heavily applied. The NIST/DARPA Text Retrieval Conferences (TREC) [1] have, moreover, allowed direct comparisons between these models, which are important in providing evidence both for their relative merits and for their performance when compared with other nonprobabilistic models or even with wholly informal approaches. Even at the time the paper was completed (1997), there was a substantial TREC literature.


Information & Contributors

Information

Published In

ACM Computing Surveys, Volume 30, Issue 4
Dec. 1998
142 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/299917

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 1998
Published in CSUR Volume 30, Issue 4

Author Tags

  1. information retrieval
  2. probabilistic indexing
  3. probabilistic modeling
  4. probabilistic retrieval
  5. uncertain inference modeling

Qualifiers

  • Article

Article Metrics

  • Downloads (Last 12 months): 155
  • Downloads (Last 6 weeks): 16

Reflects downloads up to 18 Nov 2024

Cited By

  • (2024) Semantic Ranking for Automated Adversarial Technique Annotation in Security Text. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security (49-62). DOI: 10.1145/3634737.3645000. Online publication date: 1-Jul-2024
  • (2024) Improving Performance of Neural IR Models by Using a Keyword-Extraction-Based Weak-Supervision Method. IEEE Access 12 (46851-46863). DOI: 10.1109/ACCESS.2024.3382190. Online publication date: 2024
  • (2024) An Inverse Retrieval Method via Query Generation for Xiaohongshu’s Search Engine. Advanced Intelligent Computing Technology and Applications (362-373). DOI: 10.1007/978-981-97-5675-9_31. Online publication date: 1-Aug-2024
  • (2023) A typology of research discovery tools. Journal of Information Science 49:4 (1086-1095). DOI: 10.1177/01655515211040654. Online publication date: 1-Aug-2023
  • (2023) Automated Keyphrase Generation for Brazilian Legal Information Retrieval. 2023 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN54540.2023.10191598. Online publication date: 18-Jun-2023
  • (2023) Diachronic Named Entity Disambiguation for Ancient Chinese Historical Records. Neural Information Processing (305-319). DOI: 10.1007/978-981-99-8145-8_24. Online publication date: 27-Nov-2023
  • (2022) Mitigating Bias in Search Results Through Contextual Document Reranking and Neutrality Regularization. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2532-2538). DOI: 10.1145/3477495.3531891. Online publication date: 6-Jul-2022
  • (2022) Generating Personalized Phishing Emails for Social Engineering Training Based on Neural Language Models. Advances on Broad-Band Wireless Computing, Communication and Applications (270-281). DOI: 10.1007/978-3-031-20029-8_26. Online publication date: 18-Oct-2022
  • (2021) 4 KidRec - what does good look like. ACM SIGIR Forum 54:2 (1-7). DOI: 10.1145/3483382.3483391. Online publication date: 20-Aug-2021
  • (2019) A Natural-language-based Visual Query Approach of Uncertain Human Trajectories. IEEE Transactions on Visualization and Computer Graphics (1-1). DOI: 10.1109/TVCG.2019.2934671. Online publication date: 2019
