
On modeling information retrieval with probabilistic inference

Published: 02 January 1995

Abstract

This article examines and extends the logical models of information retrieval in the context of probability theory. The fundamental notions of term weights and relevance are given probabilistic interpretations. A unified framework is developed for modeling the retrieval process with probabilistic inference. This new approach provides a common conceptual and mathematical basis for many retrieval models, such as the Boolean, fuzzy set, vector space, and conventional probabilistic models. Within this framework, the underlying assumptions employed by each model are identified, and the inherent relationships between these models are analyzed. Although this article is mainly a theoretical analysis of probabilistic inference for information retrieval, practical methods for estimating the required probabilities are illustrated with simple examples.
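The central quantity of this framework, as summarized in Buell's review of the article, is the degree to which a document d supports a query q, $P(q \mid d) = P(q \cap d) / P(d)$. The following minimal sketch evaluates it on a toy universe. The choice of elementary events (individual index terms), the four terms, and their probabilities are hypothetical assumptions made for illustration, loosely echoing the abstract's reading of term weights as probabilities; this is not the authors' estimation method.

```python
from fractions import Fraction

# Hypothetical toy universe: elementary events are index terms, and each
# term carries an assumed probability (a term weight read as a probability).
P = {
    "retrieval": Fraction(4, 10),
    "probability": Fraction(3, 10),
    "logic": Fraction(2, 10),
    "fuzzy": Fraction(1, 10),
}

def prob(event):
    """P(event) for a set of elementary events (terms)."""
    return sum((P[t] for t in event), Fraction(0))

def support(query, doc):
    """Degree of support for query q provided by document d: P(q | d) = P(q ∩ d) / P(d)."""
    p_d = prob(doc)
    if p_d == 0:
        raise ValueError("P(d) must be positive for P(q | d) to be defined")
    return prob(query & doc) / p_d

d = {"retrieval", "probability", "logic"}  # hypothetical document
q = {"probability", "fuzzy"}               # hypothetical query

print(support(q, d))  # 1/3, i.e. (3/10) / (9/10)
print(support(d, q))  # 3/4, i.e. (3/10) / (4/10): the measure is not symmetric
```

Exact fractions are used so the ratios can be checked by hand. Swapping the arguments computes $P(d \mid q)$ rather than $P(q \mid d)$, which shows that this degree of support, unlike a symmetric similarity score, depends on which of d and q is conditioned on.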

Reviews

Duncan A. Buell

The problem of a mathematical framework for the documents-to-queries matching portion of an information retrieval system is examined. Assuming some universe in which probabilities can be computed, the fundamental notion is that the degree of support for a query q provided by a document d is $P(q \mid d) = P(q \cap d) / P(d)$. Through a series of examples, the authors then show that this probabilistic inference model is similar to other models for information retrieval (Boolean, fuzzy-set, vector space, and so on).

Although this paper is interesting and easy to read, what is most striking about it is the discrepancy between the broad claim that this method “provides a common conceptual and mathematical basis” for other retrieval models, which are called “special cases,” and the many caveats that appear when the individual models are examined. For example, the authors show that different variations of the new model yield approximations of the vector space model under a given similarity measure, but some well-known similarity measures (like the cosine measure) cannot be interpreted in this way. Similarly, the new model coincides with the standard Boolean model or the fuzzy-set model, provided one views the latter two in a particularly narrow way.
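The review's remark that the Boolean model is recovered only under a narrow view can be made concrete with a small sketch. For illustration only, it assumes that the universe consists of all truth assignments ("worlds") over a hypothetical three-term vocabulary, that a query is the set of worlds satisfying its Boolean formula, and that a document is the single world containing exactly its index terms. Under any strictly positive distribution (a uniform one is assumed here), $P(q \mid d)$ is then 1 when the document satisfies the query and 0 otherwise, which is plain Boolean matching. The vocabulary, the helper names (`worlds_where`, `boolean_doc`, `support`), and the distribution are assumptions of this sketch, not the authors' construction.

```python
from fractions import Fraction
from itertools import product

# Hypothetical three-term vocabulary.
VOCAB = ("a", "b", "c")

# Universe: every truth assignment over VOCAB, stored as the frozenset of true terms.
WORLDS = [
    frozenset(t for t, present in zip(VOCAB, bits) if present)
    for bits in product((False, True), repeat=len(VOCAB))
]

# Assumed uniform distribution; any strictly positive one gives the same 0/1 results.
P = {w: Fraction(1, len(WORLDS)) for w in WORLDS}

def prob(event):
    """P(event) for a set of worlds."""
    return sum((P[w] for w in event), Fraction(0))

def worlds_where(formula):
    """Event for a Boolean query: the set of worlds in which the formula is true."""
    return {w for w in WORLDS if formula(w)}

def boolean_doc(index_terms):
    """Narrow Boolean view: the document is the one world containing exactly its index terms."""
    return {frozenset(index_terms)}

def support(q_event, d_event):
    """P(q | d) = P(q ∩ d) / P(d)."""
    return prob(q_event & d_event) / prob(d_event)

# Query: a AND (b OR NOT c)
q = worlds_where(lambda w: "a" in w and ("b" in w or "c" not in w))

print(support(q, boolean_doc({"a"})))        # 1 (a present, c absent)
print(support(q, boolean_doc({"a", "c"})))   # 0 (b absent, c present)
print(support(q, boolean_doc({"b", "c"})))   # 0 (a absent)
```

Because each document here is a single world, the conditional probability can only be 0 or 1. Graded scores of the kind produced by the fuzzy-set or vector space models require documents (or the distribution) to spread probability over more than one world.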

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published in ACM Transactions on Information Systems (TOIS), Volume 13, Issue 1 (02 January 1995)

Bibliometrics

  • Downloads (last 12 months): 109
  • Downloads (last 6 weeks): 14
Reflects downloads up to 24 Sep 2024

Cited By

  • (2023) Bengali document retrieval using a language modeling approach enhanced by improved cluster-based smoothing. Sādhanā 48:4. DOI: 10.1007/s12046-023-02258-1. Online publication date: 4-Oct-2023
  • (2021) Truncated Models for Probabilistic Weighted Retrieval. ACM Transactions on Information Systems 40:3, 1-24. DOI: 10.1145/3476837. Online publication date: 8-Dec-2021
  • (2018) Information Retrieval: Concepts, Models, and Systems. In Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications, 331-401. DOI: 10.1016/bs.host.2018.07.009. Online publication date: 2018
  • (2017) A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval. ACM SIGIR Forum 51:2, 268-276. DOI: 10.1145/3130348.3130377. Online publication date: 2-Aug-2017
  • (2016) Logics, Lattices and Probability: The Missing Links to Information Retrieval. The Computer Journal. DOI: 10.1093/comjnl/bxw034. Online publication date: 29-Jul-2016
  • (2016) A probabilistic inference model for recommender systems. Applied Intelligence 45:3, 686-694. DOI: 10.1007/s10489-016-0783-1. Online publication date: 1-Oct-2016
  • (2015) Modeling Tag-Aware Recommendations Based on User Preferences. International Journal of Information Technology & Decision Making 14:05, 947-970. DOI: 10.1142/S0219622015500194. Online publication date: Sep-2015
  • (2015) Fuzzy Formal Concept Analysis Approach for Information Retrieval. In Proceedings of the Fifth International Conference on Fuzzy and Neuro Computing (FANCCO - 2015), 255-271. DOI: 10.1007/978-3-319-27212-2_20. Online publication date: 25-Nov-2015
  • (2014) Exploiting Inference from Semantic Annotations for Information Retrieval. In Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval, 43-45. DOI: 10.1145/2663712.2666197. Online publication date: 7-Nov-2014
  • (2014) Computing paper similarity based on latent dirichlet allocation. In Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication, 1-6. DOI: 10.1145/2557977.2558028. Online publication date: 9-Jan-2014