DOI: 10.1145/956863.956952
Article

Exploiting syntactic structure of queries in a language modeling approach to IR

Published: 03 November 2003

Abstract

Natural Language Processing (NLP) techniques have been explored to enhance the performance of Information Retrieval (IR) methods, with varied results. Most efforts to use NLP techniques have aimed to identify better index terms for representing documents. This use in the indexing phase of IR has an implicit effect on retrieval performance. However, the explicit use of NLP techniques during the retrieval or information-seeking phase has been restricted to interactive or dialogue systems. Recent advances in IR use Statistical Language Models (SLM) to represent documents and rank them by the likelihood that each document's model generates a given user query. This paper presents a novel method for using NLP techniques on user queries, specifically a syntactic parse of a query, in the statistical language modeling approach to IR. In the proposed method, named Concept Language Models, a query is viewed as a sequence of concepts and a concept as a sequence of terms. The paper presents different approximations to estimate the concept and term probabilities and to compute the query likelihood estimate for documents. Some empirical results on TREC test collections comparing Concept Language Models with smoothed N-gram language models are presented.
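The abstract does not spell out the approximations the paper uses, so the following is only a rough sketch of the general idea: score a document by the log-likelihood of a query that has already been split into concepts, each concept being a list of terms. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `smoothed_unigram` and `concept_query_log_likelihood`, the Jelinek-Mercer style interpolation with weight `lam`, the bigram weight `mu`, and the choice to let term dependencies apply only inside a concept (the first term of each concept falls back to the smoothed unigram estimate).

```python
import math
from collections import Counter


def smoothed_unigram(term, doc_uni, doc_len, coll_uni, coll_len, lam):
    """Interpolate the document and collection estimates of P(term)."""
    p_doc = doc_uni[term] / doc_len if doc_len else 0.0
    p_coll = coll_uni[term] / coll_len if coll_len else 0.0
    return (1.0 - lam) * p_doc + lam * p_coll


def concept_query_log_likelihood(concepts, doc_tokens, coll_tokens,
                                 lam=0.5, mu=0.5):
    """Log-likelihood of a query given as a list of concepts (lists of terms).

    Hypothetical approximation: within a concept, each term after the first
    is conditioned on the previous term through a document bigram estimate
    mixed with the smoothed unigram estimate. Concepts are treated as
    independent of one another.
    """
    doc_uni = Counter(doc_tokens)
    doc_bi = Counter(zip(doc_tokens, doc_tokens[1:]))
    coll_uni = Counter(coll_tokens)
    doc_len, coll_len = len(doc_tokens), len(coll_tokens)

    score = 0.0
    for concept in concepts:
        prev = None  # no dependency across concept boundaries
        for term in concept:
            p_uni = smoothed_unigram(term, doc_uni, doc_len,
                                     coll_uni, coll_len, lam)
            if prev is None or doc_uni[prev] == 0:
                p = p_uni
            else:
                p_bi = doc_bi[(prev, term)] / doc_uni[prev]
                p = mu * p_bi + (1.0 - mu) * p_uni
            if p == 0.0:
                # Term unseen in both the document and the collection.
                return float("-inf")
            score += math.log(p)
            prev = term
    return score


# Toy usage: the two-concept query favors the document that keeps each
# concept's terms adjacent.
d1 = "statistical language models for information retrieval".split()
d2 = "retrieval of information using language and statistical models".split()
collection = d1 + d2
query = [["language", "models"], ["information", "retrieval"]]
score_d1 = concept_query_log_likelihood(query, d1, collection)
score_d2 = concept_query_log_likelihood(query, d2, collection)
```

In this toy setup, concept boundaries decide where term dependencies are allowed. A real system would obtain those boundaries from a syntactic parse of the query, which this sketch takes as given.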

Published In

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
November 2003
592 pages
ISBN: 1581137230
DOI: 10.1145/956863
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. information retrieval
2. language models
3. natural language processing
4. query processing

Qualifiers

• Article

Conference

CIKM03

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
