DOI: 10.1145/2348283.2348354
research-article

Automatic term mismatch diagnosis for selective query expansion

Published: 12 August 2012

Abstract

People are seldom aware that their search queries frequently mismatch a majority of the relevant documents. This may not be a big problem for topics with a large and diverse set of relevant documents, but it greatly increases the chance of search failure for less popular search needs. We aim to address the mismatch problem by developing accurate and simple queries that require minimal effort to construct. This is achieved by targeting retrieval interventions at the query terms that are likely to mismatch relevant documents. For a given topic, the proportion of relevant documents that do not contain a term measures the probability that the term mismatches relevant documents, or the term mismatch probability. Recent research demonstrates that this probability can be estimated reliably prior to retrieval. Typically, it is used in probabilistic retrieval models to provide query-dependent term weights. This paper develops a new use: automatic diagnosis of term mismatch. A search engine can use the diagnosis to suggest manual query reformulation, guide interactive query expansion, guide automatic query expansion, or motivate other responses. The research described here uses the diagnosis to guide interactive query expansion and to create Boolean conjunctive normal form (CNF) structured queries that selectively expand 'problem' query terms while leaving the rest of the query untouched. Experiments with TREC Ad-hoc and Legal Track datasets demonstrate that, with high-quality manual expansion, this diagnostic approach can reduce user effort by 33% and produce simple and effective structured queries that surpass their bag-of-words counterparts.
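
The definition of term mismatch probability and the idea of selective CNF expansion can be illustrated with a minimal Python sketch. This is not the paper's implementation: the paper predicts the probability before retrieval, whereas this sketch computes it directly from a toy set of known relevant documents. The function names, the toy documents, the expansion terms, and the 0.4 threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def mismatch_probability(term: str, relevant_docs: List[Set[str]]) -> float:
    """Proportion of relevant documents that do NOT contain `term`."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    missing = sum(1 for doc in relevant_docs if term not in doc)
    return missing / len(relevant_docs)


def selective_cnf(
    query_terms: List[str],
    mismatch: Dict[str, float],
    expansions: Dict[str, List[str]],
    threshold: float,
) -> str:
    """Build a Boolean CNF query string.

    Only terms whose mismatch probability exceeds `threshold` (a
    hypothetical cut-off) are expanded into an OR clause; every other
    term is left untouched.
    """
    clauses = []
    for term in query_terms:
        if mismatch.get(term, 0.0) > threshold and expansions.get(term):
            alternatives = [term] + expansions[term]
            clauses.append("(" + " OR ".join(alternatives) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


# Toy relevant set (assumed data): each document is a set of its terms.
relevant = [
    {"car", "price"},
    {"automobile", "price"},
    {"auto", "cost", "price"},
    {"car", "cost"},
]
query = ["car", "price"]
probs = {t: mismatch_probability(t, relevant) for t in query}

# Hypothetical manual expansion terms supplied by a user for each query term.
manual_expansions = {"car": ["automobile", "auto"], "price": ["cost"]}
cnf_query = selective_cnf(query, probs, manual_expansions, threshold=0.4)
print(probs)
print(cnf_query)
```

In this toy example "car" is absent from half of the relevant documents, so it is diagnosed as the problem term and expanded, while "price" stays as a single-term clause even though an expansion term is available for it.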

Published In

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. boolean conjunctive normal form queries
2. query term diagnosis
3. simulated user interactions
4. term expansion
5. term mismatch

Qualifiers

• Research-article

Conference

SIGIR '12

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Cited By

• (2024) Query Expansion Using Proposed Location-Based Algorithm for Hindi–English CLIR: Analyzing Three Test Collections. International Journal of Pattern Recognition and Artificial Intelligence 38:05. DOI: 10.1142/S0218001424590018. Online publication date: 11-May-2024
• (2023) Selective Query Processing: A Risk-Sensitive Selection of Search Configurations. ACM Transactions on Information Systems 42:1 (1-35). DOI: 10.1145/3608474. Online publication date: 21-Aug-2023
• (2023) Bengali Document Retrieval Using Model Combination. Proceedings of International Conference on Frontiers in Computing and Systems (91-101). DOI: 10.1007/978-981-99-2680-0_9. Online publication date: 1-Aug-2023
• (2022) Analytics Methods to Understand Information Retrieval Effectiveness—A Survey. Mathematics 10:12 (2135). DOI: 10.3390/math10122135. Online publication date: 19-Jun-2022
• (2022) An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects. Empirical Software Engineering 27:2. DOI: 10.1007/s10664-021-10082-6. Online publication date: 28-Jan-2022
• (2021) Defining an Optimal Configuration Set for Selective Search Strategy - A Risk-Sensitive Approach. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (1335-1345). DOI: 10.1145/3459637.3482422. Online publication date: 26-Oct-2021
• (2021) A Comparative Studies of Automatic Query Formulation in Full-Text Database Search of Chinese Digital Humanities. Diversity, Divergence, Dialogue (457-468). DOI: 10.1007/978-3-030-71292-1_35. Online publication date: 19-Mar-2021
• (2020) Translation and Expansion: Enabling Laypeople Access to the COVID-19 Academic Collection. Data and Information Management 4:3 (177-190). DOI: 10.2478/dim-2020-0011. Online publication date: Sep-2020
• (2020) How Graduate Computing Students Search When Using an Unfamiliar Programming Language. Proceedings of the 28th International Conference on Program Comprehension (160-171). DOI: 10.1145/3387904.3389274. Online publication date: 13-Jul-2020
• (2019) An Advanced User Intent Model Based On User Learning Process. International Journal of Pattern Recognition and Artificial Intelligence 34:09 (2050024). DOI: 10.1142/S021800142050024X. Online publication date: 13-Dec-2019
