A framework for effective retrieval

Authors: C. T. Yu, W. Meng, S. Park

ACM Transactions on Database Systems (TODS), Volume 14, Issue 2

Pages 147 - 167

https://doi.org/10.1145/63500.63519

Published: 01 June 1989

Abstract

The aim of an effective retrieval system is to yield high recall and precision (retrieval effectiveness). The nonbinary independence model, which takes into consideration the number of occurrences of terms in documents, is introduced. It is shown to be optimal under the assumption that terms are independent. It is verified by experiments to yield significant improvement over the binary independence model. The nonbinary model is extended to normalized vectors and is applicable to more general queries.

Various ways to alleviate the consequences of the term independence assumption are discussed. Estimation of parameters required for the nonbinary independence model is provided, taking into consideration that a term may have different meanings.
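
As a rough illustration of the idea described in the abstract, the sketch below ranks documents by a sum of per-term log-likelihood ratios in which each term's contribution depends on its occurrence count rather than only on its presence. The function names, the add-one smoothing, the `cap` on counts, and the toy data are all assumptions made for this example; the paper's own parameter estimation method is not reproduced here.

```python
import math
from collections import Counter

def estimate(docs, vocab, cap=3):
    """Estimate P(count of term = k) for k in 0..cap from a list of documents.

    Each document is a dict mapping term -> occurrence count. Counts above
    `cap` are clipped to `cap`. Add-one smoothing (an assumption of this
    sketch) keeps every probability strictly positive.
    """
    probs = {}
    for term in vocab:
        hist = Counter(min(d.get(term, 0), cap) for d in docs)
        total = len(docs) + cap + 1
        probs[term] = [(hist[k] + 1) / total for k in range(cap + 1)]
    return probs

def score(doc, rel_probs, nonrel_probs, cap=3):
    """Sum over terms of log P(count | relevant) / P(count | irrelevant).

    If terms are independent within each class, the joint likelihood
    ratio factorizes into this per-term sum.
    """
    s = 0.0
    for term in rel_probs:
        k = min(doc.get(term, 0), cap)
        s += math.log(rel_probs[term][k] / nonrel_probs[term][k])
    return s

# Hypothetical relevance judgments from earlier searches (toy data).
vocab = ["retrieval", "database"]
relevant = [{"retrieval": 3, "database": 1}, {"retrieval": 2}]
irrelevant = [{"database": 2}, {"retrieval": 1, "database": 3}, {}]

rel_p = estimate(relevant, vocab)
non_p = estimate(irrelevant, vocab)

candidates = {
    "d1": {"retrieval": 3},
    "d2": {"retrieval": 1},
    "d3": {"database": 3},
}
ranking = sorted(
    candidates,
    key=lambda name: score(candidates[name], rel_p, non_p),
    reverse=True,
)
print(ranking)
```

In this toy data, `d1` and `d2` contain the same set of terms and differ only in how often "retrieval" occurs, so a presence/absence model would score them identically, while the count-based score separates them.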

Index Terms

A framework for effective retrieval
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
      1. Clustering

Reviews

Reviewer: Robert G Crawford

In their conclusion, the authors note four reasons for the mediocre retrieval performance of existing systems: “(i) Terms are not independent. (ii) Some relevant documents may not have any terms in common with a given query and therefore they cannot be retrieved. (iii) Term weights in documents are not assigned optimally. (iv) Previous parameter estimation techniques are incapable of estimating the parameter values correctly. . . . ” They propose a framework to address these four problems. The nonbinary independence model is introduced as a way of taking into consideration the frequency of occurrences of terms in documents. The authors also suggest methods to alleviate the term independence assumption, present an approach to estimating the parameters required by the model, and give a method to collect statistics, allowing for different usages of any particular term. The paper notes, “As in all other probabilistic models, parameters . . . need to be estimated from previously retrieved relevant and irrelevant documents” and “it is assumed that the document collection has been used for some time by a given user.” It is critical that these assumptions be understood in reading the work, and the authors have been careful in this regard. With these underlying requirements, the authors analyze results for small collections that have known queries and relevance assessments. The paper is clearly written. The mathematics is carefully done and is presented in a way that the reader can follow. Those interested in retrieval models, whether theoretically or for practical implementation, should read this solid piece of work.

Published In

ACM Transactions on Database Systems Volume 14, Issue 2

June 1989

144 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/63500

Editor:
Gio Wiederhold
Computer Science Department, Stanford University, Stanford, CA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Total Citations: 15
Total Downloads: 494
Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 5

Reflects downloads up to 01 Oct 2024

Cited By

Lin S, Chen M, Ho J, Huang Y (2002). ACIRD. IEEE Transactions on Knowledge and Data Engineering 14:3 (599-614). Online publication date: 1-May-2002.
https://dl.acm.org/doi/10.1109/TKDE.2002.1000345

Dominich S (2000). A unified mathematical definition of classical information retrieval. Journal of the American Society for Information Science 51:7 (614-624). Online publication date: 2000.
https://doi.org/10.1002/(SICI)1097-4571(2000)51:7<614::AID-ASI4>3.0.CO;2-S

Iliev O, Nikolovska S, Dojcinoski K (1999). Integrated information system of the archive of Macedonia. IFAC Proceedings Volumes 32:2 (5979-5984). Online publication date: Jul-1999.
https://doi.org/10.1016/S1474-6670(17)57020-2

Turtle H, Croft W (1997). Uncertainty in Information Retrieval Systems. Uncertainty Management in Information Systems (189-224). Online publication date: 1997.
https://doi.org/10.1007/978-1-4615-6245-0_7

Carpineto C, Romano G (1996). A lattice conceptual clustering system and its application to browsing retrieval. Machine Learning 24:2 (95-122). Online publication date: Aug-1996.
https://doi.org/10.1007/BF00058654

Tresch M, Palmer N, Luniewski A (1995). Type Classification of Semi-Structured Documents. Proceedings of the 21th International Conference on Very Large Data Bases (263-274). Online publication date: 11-Sep-1995.
https://dl.acm.org/doi/10.5555/645921.673309

Jing Y, Croft W, Brentano J, Seitz F (1994). An association thesaurus for information retrieval. Intelligent Multimedia Information Retrieval Systems and Management - Volume 1 (146-160). Online publication date: 11-Oct-1994.
https://dl.acm.org/doi/10.5555/2856823.2856838

Iwayama M, Tokunaga T, Jacobs P (1994). A probabilistic model for text categorization. Proceedings of the fourth conference on Applied natural language processing (162-167). Online publication date: 13-Oct-1994.
https://dl.acm.org/doi/10.3115/974358.974395

Fuhr N, Pfeifer U (1994). Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions. ACM Transactions on Information Systems 12:1 (92-115). Online publication date: 2-Jan-1994.
https://dl.acm.org/doi/10.1145/174608.174612

Dominich S (1994). Interaction Information Retrieval. Journal of Documentation 50:3 (197-212). Online publication date: Mar-1994.
https://doi.org/10.1108/eb026930
