DOI: 10.1145/3471158.3472252
Recommending Search Queries in Documents Using Inter N-Gram Similarities

Published: 31 August 2021

Abstract

Reading a document can often trigger a need for additional information. For example, a reader of a news article might be interested in information about the persons and events mentioned in the article. Accordingly, there is a line of work on recommending search-engine queries given a document read by a user. Often, the recommended queries are selected from a query log independently of each other, and are presented to the user without any context. We address a novel query recommendation task where the recommended queries must be n-grams (sequences of consecutive terms) in the document. Furthermore, inspired by work on using inter-document similarities for document retrieval, we explore the merits of using inter n-gram similarities for query recommendation. Specifically, we use a supervised approach to learn an inter n-gram similarity measure where the goal is that n-grams that are likely to serve as queries will be deemed more similar to each other than to other n-grams. We use the similarity measure in a wide variety of query recommendation approaches which we devise as adaptations of ad hoc document retrieval techniques. Empirical evaluation performed using data gathered from Yahoo!'s search engine logs attests to the effectiveness of the resultant recommendation methods.
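The pipeline the abstract outlines — extract candidate n-grams from the document, score them with an inter n-gram similarity measure, and rank them — can be sketched as follows. This is a minimal illustration, not the paper's method: a simple token-overlap (Jaccard) similarity stands in for the learned supervised measure, and the centrality-style scoring is just one plausible adaptation of the cluster-based retrieval ideas the abstract mentions.

```python
def ngrams(tokens, n):
    """All sequences of n consecutive terms in the document."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def jaccard(a, b):
    """Placeholder similarity over term sets; the paper instead
    learns an inter n-gram similarity measure with supervision."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def recommend(text, n=2, k=3):
    """Rank candidate n-grams by aggregate similarity to the other
    candidates (a centrality-style scoring, loosely analogous to
    reranking with inter-document similarities)."""
    # Deduplicate while preserving order of first occurrence.
    cands = list(dict.fromkeys(ngrams(text.lower().split(), n)))
    scores = {c: sum(jaccard(c, o) for o in cands if o != c) for c in cands}
    return sorted(cands, key=scores.get, reverse=True)[:k]
```

A learned measure would replace `jaccard` with a model trained so that n-grams likely to serve as queries score as more similar to each other than to other n-grams.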



    Published In

    ICTIR '21: Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval
    July 2021
    334 pages
    ISBN:9781450386111
    DOI:10.1145/3471158

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. inter N-gram similarity
    2. query recommendation

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall acceptance rate: 235 of 527 submissions (45%)
