DOI: 10.1145/3471158.3472252
Recommending Search Queries in Documents Using Inter N-Gram Similarities

Published: 31 August 2021

Abstract

Reading a document can often trigger a need for additional information. For example, a reader of a news article might be interested in information about the persons and events mentioned in the article. Accordingly, there is a line of work on recommending search-engine queries given a document read by a user. Often, the recommended queries are selected from a query log independently of each other, and are presented to the user without any context. We address a novel query recommendation task where the recommended queries must be n-grams (sequences of consecutive terms) in the document. Furthermore, inspired by work on using inter-document similarities for document retrieval, we explore the merits of using inter n-gram similarities for query recommendation. Specifically, we use a supervised approach to learn an inter n-gram similarity measure where the goal is that n-grams that are likely to serve as queries will be deemed more similar to each other than to other n-grams. We use the similarity measure in a wide variety of query recommendation approaches which we devise as adaptations of ad hoc document retrieval techniques. Empirical evaluation performed using data gathered from Yahoo!'s search engine logs attests to the effectiveness of the resultant recommendation methods.
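The pipeline the abstract outlines — extract candidate n-grams from the document, score them with an inter n-gram similarity measure, and rank them — can be sketched as follows. This is a minimal illustration, not the paper's method: a simple token-overlap (Jaccard) similarity stands in for the learned supervised measure, and the centrality-style scoring is just one plausible adaptation of the cluster-based retrieval ideas the abstract mentions.

```python
def ngrams(tokens, n):
    """All sequences of n consecutive terms in the document."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def jaccard(a, b):
    """Placeholder similarity over term sets; the paper instead
    learns an inter n-gram similarity measure with supervision."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def recommend(text, n=2, k=3):
    """Rank candidate n-grams by aggregate similarity to the other
    candidates (a centrality-style scoring, loosely analogous to
    reranking with inter-document similarities)."""
    # Deduplicate while preserving order of first occurrence.
    cands = list(dict.fromkeys(ngrams(text.lower().split(), n)))
    scores = {c: sum(jaccard(c, o) for o in cands if o != c) for c in cands}
    return sorted(cands, key=scores.get, reverse=True)[:k]
```

A learned measure would replace `jaccard` with a model trained so that n-grams likely to serve as queries score as more similar to each other than to other n-grams.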



    Published In

    ICTIR '21: Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval
    July 2021
    334 pages
    ISBN:9781450386111
    DOI:10.1145/3471158

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. inter N-gram similarity
    2. query recommendation

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall acceptance rate: 235 of 527 submissions (45%)
