A unified approach for artificial intelligence and information retrieval

Published: 01 May 1986

Abstract

In the past, several mathematical models for document retrieval systems have been developed [C82, S83, S83a, T76, WO84]. These models formally represent the basic characteristics, functional components, and retrieval processes of document retrieval systems. The two basic categories of models employed in information retrieval are the vector processing models and the Boolean retrieval models.

In the conventional vector space model (VSM), proposed by Salton [S71, S83], index terms are basic vectors in a vector space. Each document or query is represented as a linear combination of these basic term vectors. The retrieval operation consists of computing the cosine similarity function between a given query vector and the set of document vectors and then ranking the documents accordingly. In this approach, the occurrence frequency of a term in a document is interpreted as the component of the document vector along the corresponding basic term vector.

The advantages of this model are that it is simple and yet powerful. The vector operations can be performed efficiently enough to handle very large collections. Furthermore, it has been shown that its retrieval effectiveness is significantly higher than that of the Boolean retrieval models. However, this vector model has been incorporated into very few commercial systems.

In strict Boolean retrieval systems [BU81, P84], the user query normally consists of index terms connected by the Boolean operators AND, OR and NOT. The advantage of Boolean connectives is that they provide a better structure for formulating the user query. The major problem in such a system is that there is no provision for associating weights of importance with the terms assigned either to the documents or to the queries. In other words, the representation is binary, indicating either the presence or the absence of the various index terms. The output obtained in response to a query is not ranked in any order of presumed importance to the user. In most cases, the AND connectives tend to be too restrictive [BU81]. Most commercially available retrieval systems essentially conform to this model.

One of the challenges for researchers in information retrieval has been to achieve greater acceptance of the vector processing models in commercial systems. The main difficulty is the inability of vector processing systems to handle Boolean queries. In recent years some progress has been made in expressing Boolean queries as vectors [S83a, S83b]. If attractive ways to achieve this are advanced, it would then be possible to modify existing systems to use vector processing techniques without a great deal of cost and effort.

Another problem in the conventional vector space model is that it assumes that term vectors are orthogonal. It is generally agreed that terms are correlated, so the model must be generalized to incorporate term correlations. A vector processing model termed the GVSM [WO84a, WO85] was proposed in response to this need. In the GVSM, queries are assumed to be presented as a list of terms and corresponding weights. Thus, no provision is made for processing Boolean queries. However, the premises of the model naturally lead to a scheme for handling Boolean queries. In this paper we present the details of this scheme. This result will help achieve the aim of integrating vector processing capabilities into existing systems which use Boolean retrieval models.
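
The contrast drawn above between cosine ranking in the conventional VSM and strict Boolean matching can be illustrated with a minimal Python sketch. It is not the scheme presented in the paper. It assumes whitespace tokenization, raw term frequencies as vector components, and orthogonal term vectors (the conventional VSM assumption, not the GVSM). The function names and the toy collection `DOCS` are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector: maps each term to its occurrence count."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity, assuming orthogonal term vectors (conventional VSM)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Score every document against the query and sort by decreasing cosine."""
    q = tf_vector(query)
    scored = [(name, cosine(q, tf_vector(text))) for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def boolean_and(terms, docs):
    """Strict Boolean AND: the unranked set of documents containing every term."""
    return {
        name
        for name, text in docs.items()
        if all(t in set(text.lower().split()) for t in terms)
    }

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "vector space model for document retrieval",
    "d2": "boolean retrieval with boolean operators",
    "d3": "vector processing of boolean queries",
}

if __name__ == "__main__":
    for name, score in rank("boolean vector", DOCS):
        print(name, round(score, 4))
    print(boolean_and(["boolean", "vector"], DOCS))
```

With this toy data the vector ranking orders all three documents (d3, d2, d1), whereas the strict AND query returns only d3 and gives no ordering, which mirrors the restrictiveness of AND connectives noted above.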

References

[1] {BU81} Buell, D., "A General Model of Query Processing in Information Retrieval Systems," Information Processing & Management, 17 (1981), pp. 249--262.
[2] {C82} Croft, W., "Experiments with Representation in a Document Retrieval System," COINS Technical Report 82--21, University of Massachusetts (1982).
[3] {P84} Paice, C. D., "Soft evaluation of Boolean search queries in information retrieval systems," Information Technology, Vol. 3, No. 1 (1984), pp. 33--41.
[4] {S71} Salton, G., "The SMART Retrieval System - Experiment in Automatic Document Processing," Prentice-Hall, New Jersey (1971).
[5] {S83} Salton, G. and McGill, M. H., "Introduction To Modern Information Retrieval," McGraw-Hill, Inc., New York (1983), pp. 146--151.
[6] {S83a} Salton, G., Fox, E. A. and Wu, H., "Extended Boolean Information Retrieval," Communications of the ACM, Vol. 26, No. 11 (1983), pp. 1022--1036.
[7] {S83b} Salton, G., Fox, E. A. and Wu, H., "An Automatic Environment for Boolean Information Retrieval," Information Processing & Management (1983), pp. 755--762.
[8] {T76} Tahani, V., "A Fuzzy Model of Document Retrieval Systems," Information Processing & Management, Vol. 12 (1976), pp. 177--187.
[9] {WO84} Wong, S. K. M. and Ziarko, W., "A Unified Model in Information Retrieval," Fundamenta Informaticae, Vol. X (1987).
[10] {WO85} Wong, S. K. M., Ziarko, W. and Wong, P. C. N., "Generalized Vector Space Model in Information Retrieval," Proceedings of the Seventh International Conference on Information Storage and Retrieval (1985), pp. 18--25.

Published In

ACM SIGIR Forum, Volume 20, Issue 1-4
Spring-Summer 1986
36 pages
ISSN:0163-5840
DOI:10.1145/15497

Publisher

Association for Computing Machinery

New York, NY, United States
