A corpus analysis approach for automatic query expansion and its extension to multiple databases

Published: 01 July 1999

Abstract

Searching online text collections can be both rewarding and frustrating. Although valuable information can be found, many irrelevant documents are typically retrieved as well, and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the databases, which are not tagged with part-of-speech labels. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. We are able to achieve a 7.6% improvement for TREC 5 queries and up to a 28.5% improvement on the narrow-domain Cystic Fibrosis collection. This work has been extended to multidatabase collections where each subdatabase has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. Various techniques to select the appropriate matrix for a particular query are analyzed, and a 4.8% improvement in the results is validated.
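
The abstract does not say how word similarities are computed, so the following is only a minimal sketch of the general idea of corpus-based query expansion, not the authors' method. It assumes that two words are similar when they occur in similar contexts: each word gets a vector of counts of its neighbours within a small window, and vectors are compared by cosine similarity. The function names (`context_vectors`, `cosine`, `expand_query`) and the parameters `window`, `top_k`, and `threshold` are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(docs, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def expand_query(query, vectors, top_k=2, threshold=0.3):
    """Append up to `top_k` of the most similar corpus words for each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # term never seen in the corpus: nothing to expand with
        scored = sorted(
            ((cosine(vectors[term], vec), other)
             for other, vec in vectors.items() if other != term),
            reverse=True,
        )
        for score, other in scored[:top_k]:
            if score >= threshold and other not in expanded:
                expanded.append(other)
    return expanded


# Toy corpus: "doctor" and "physician" never co-occur but share contexts.
docs = [
    "the doctor treated the patient",
    "the physician treated the patient",
]
vectors = context_vectors(docs)
print(expand_query("doctor", vectors, top_k=1))
```

In this toy setup the expanded query can match documents that use a different word for the same concept, which is the terminology-mismatch problem the abstract describes. For the multidatabase case, one such set of vectors would be built per subdatabase; how to choose among them for a given query is the selection problem the abstract mentions and is not sketched here.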

Published In

ACM Transactions on Information Systems, Volume 17, Issue 3
July 1999
113 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/314516

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. query expansion

Qualifiers

  • Article
