Abstract
Query auto-completion (QAC) aims at suggesting plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest most popular completions. In this context, there are two specific scenarios that are difficult to handle for any QAC system: short prefixes (which are inherently ambiguous) and unseen prefixes. Recently, personalized Natural Language Generation (NLG) models have been proposed to leverage previous session queries as context for addressing these two challenges. However, such NLG models suffer from two drawbacks: (1) some of the previous session queries could be noisy and irrelevant to the user intent for the current prefix, and (2) NLG models cannot directly incorporate historical query popularity. This motivates us to propose a novel NLG model for QAC, Trie-NLG, which jointly leverages popularity signals from trie and personalization signals from previous session queries. We train the Trie-NLG model by augmenting the prefix with rich context comprising of recent session queries and top trie completions. This simple modeling approach overcomes the limitations of trie-based and NLG-based approaches, and leads to state-of-the-art performance. We evaluate the Trie-NLG model using two large QAC datasets. On average, our model achieves huge \(\sim\)57% and \(\sim\)14% boost in MRR over the popular trie-based lookup and the strong BART-based baseline methods, respectively. We make our code publicly available at https://github.com/kaushal0494/Trie-NLG.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Availability of data and materials
We make our code available as supplementary information. It is in line with our claims in the paper and can be trusted to reproduce the results. We use one public dataset (AOL) and a Bing dataset. Since Bing data is proprietary, it cannot be made publicly available.
Code availability
We make our code publicly available and it is attached as supplementary data.
Notes
A small percent of suggestions are not prefix preserving; in this work, we focus on prefix-preserving suggestions only.
References
Bar-Yossef Z, Kraus N (2011) Context-sensitive query auto-completion. In: Proceedings of the 20th international conference on world wide web (pp. 107–116)
Bhatia S,Majumdar D, Mitra P (2011) Query suggestions in the absence of query logs. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval (pp. 795–804)
Bonchi F, Perego R, Silvestri F, Vahabi H, Venturini R (2012) Efficient query recommendations in the long tail via center-piece subgraphs. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (pp. 345–354)
Burges CJ (2010) From ranknet to lambdarank to lambdamart: an overview. Learning 11:23–581
Cai F, De Rijke M (2016) Query auto completion in information retrieval. Universiteit van Amsterdam [Host]
Cai F , Liang S , De Rijke, M (2014) Time-sensitive personalized query auto-completion. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 1599–1608)
Dehghani M , Rothe S , Alfonseca E , Fleury P (2017) Learning to attend, copy, and generate for session-based query suggestion. Proceedings of the 2017 ACM on conference on information and knowledge management (pp. 1747–1756)
Gehman S , Gururangan S , Sap M , Choi Y , Smith NA (2020) Real Toxicity Prompts: evaluating neural toxic degeneration in language models. Findings of the association for computational linguistics: Emnlp 2020 (pp. 3356–3369). Online Association for Computational Linguistics
Hofmann K, Mitra B, Radlinski F, Shokouhi M (2014) An eye-tracking study of user interactions with query auto completion. Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 549–558)
Hsu B-J, Ottaviano G (2013) Space-efficient data structures for top-k completion. In: Proceedings of the 22nd international conference on world wide web (pp. 583–594)
Huang C-K, Chien L-F, Oyang Y-J (2003) Relevant term suggestion in interactive web search based on contextual information in query session logs. J Am Soc Inf Sci Technol 54(7):638–649
Jiang J-Y , Ke Y-Y , Chien P-Y , Cheng P-J (2014) Learning user reformulation behavior for query auto-completion. In: Proceedings of the 37th international ACM SIGIIR conference on research & development in information retrieval (pp. 445–454)
Lewis M , Liu Y , Goyal N , Ghazvininejad M , Mohamed A , Levy O, Zettlemoyer, L (2020) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880)
Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N et al (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459–9474
Maxwell D , Bailey P , Hawking D (2017) Large-scale generative query autocompletion. In: Proceedings of the 22nd australasian document computing symposium (pp. 1–8)
Mei Q , Zhou D , Church K (2008) Query suggestion using hitting time. In:: Proceedings of the 17th acm conference on information and knowledge management (pp. 469–478)
Mitra B, Craswell N (2015) Query auto-completion for rare prefixes. In: Proceedings of the 24th acm international on conference on information and knowledge management (pp. 1755–1758)
Mitra B , Shokouhi M , Radlinski F, Hofmann K (2014) On user interactions with query auto-completion. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval (pp. 1055–1058)
Mustar A, Lamprier S, Piwowarski B (2020) Using bert and bart for query suggestion. Circle
Park DH, Chiba R(2017) A neural language model for query auto-completion. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval (pp. 1189–1192)
Pass G , Chowdhury A , Torgeson C (2006) A picture of search. In: Proceedings of the 1st international conference on scalable information systems (pp. 1–es)
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M et al (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67
Rosset C , Jose D , Ghosh G , Mitra B Tiwary S (2018) Optimizing query evaluations using reinforcement learning for web search. In: The 41st international ACM SIGIR conference on research & development in information retrieval (pp. 1193–1196)
Sadikov E , Madhavan J , Wang L , Halevy A (2010) Clustering query refinements by user intent. In: Proceedings of the 19th international conference on world wide web (pp. 841–850)
Shokouhi M (2013) Learning to personalize query auto-completion. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval (pp. 103–112)
Shokouhi M, Radinsky K (2012) Time-sensitive query auto-completion. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (pp. 601–610)
Sordoni A , Bengio Y , Vahabi H , Lioma C , Grue Simonsen J , Nie J-Y (2015) A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In: Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 553–562)
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al (2017) Attention is all you need. Adv Neural Inf Process Syst. Vol 30
Wang P-W, Zhang H, Mohan V, Dhillon IS, Kolter JZ (2018) Realtime query completion via deep language models. ecom@ sigir
Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retrieval 13(3):254–270
Yadav N, Sen R, Hill DN, Mazumdar A, Dhillon IS (2021) Session-aware query auto-completion using extreme multi-label ranking. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 3835–3844)
Yin D, Tan J, Zhang Z, Deng H, Huang S, Chen J (2020) Learning to generate personalized query auto-completions via a multi-view multi-task attentive approach. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2998–3007)
Zhou G, Zhu X, Song C, Fan Y, Zhu H , Ma X, et al (2018) Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1059–1068)
Funding
This research was partially funded by Microsoft Academic Partnership Grant 2022.
Author information
Authors and Affiliations
Contributions
All authors contributed to the design of the project. Specifically, Conceptualization, Methodology, Data preparation: [KKM]; Writing—original draft preparation: [KKM, MG, MSD]; Writing—review and editing: [everyone]; Funding acquisition, Resources, Supervision: [MG, MSD, PA].
Corresponding author
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare relevant to this article’s content.
Consent for publication
All the authors of this work have been informed of the submission, and everyone supports it.
Additional information
Responsible editor: Charalampos Tsourakakis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Maurya, K.K., Desarkar, M.S., Gupta, M. et al. trie-nlg: trie context augmentation to improve personalized query auto-completion for short and unseen prefixes. Data Min Knowl Disc 37, 2306–2329 (2023). https://doi.org/10.1007/s10618-023-00966-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-023-00966-0