DOI: 10.1145/3583780.3615189
Short paper

Neural Disentanglement of Query Difficulty and Semantics

Published: 21 October 2023

Abstract

Prior research has shown that the retrieval effectiveness of a query may depend on factors beyond its semantics. In other words, several queries that express the same intent, even with overlapping keywords, may exhibit very different degrees of retrieval effectiveness. The objective of this paper is therefore to propose a neural disentanglement method that separates query semantics from query difficulty. The disentangled query semantics representation provides the means to determine the semantic association between queries, whereas the disentangled query difficulty representation allows query effectiveness to be estimated. Our experiments on two tasks, query performance prediction and query similarity calculation, show that the proposed disentanglement method outperforms the state of the art.
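
The abstract does not give the model architecture, so the following is only a toy sketch of how two disentangled representations might be used downstream. Everything in it is an assumption made for illustration: the fixed split point `SEM_DIM`, the hand-written query vectors, and the linear scoring head with its weights. None of it is taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout (an assumption, not the paper's design): the first
# SEM_DIM components hold semantics, the remaining ones hold difficulty.
SEM_DIM = 4


def split_representation(vec: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one query vector into (semantics, difficulty) parts."""
    return list(vec[:SEM_DIM]), list(vec[SEM_DIM:])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_effectiveness(
    difficulty: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Toy linear head with a sigmoid, mapping difficulty to a score in (0, 1)."""
    z = sum(w * d for w, d in zip(weights, difficulty)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up query vectors: identical semantics part, different difficulty part.
q_easy = [0.9, 0.1, 0.3, 0.2, 1.0, 0.5]
q_hard = [0.9, 0.1, 0.3, 0.2, -1.0, -0.5]

sem_easy, diff_easy = split_representation(q_easy)
sem_hard, diff_hard = split_representation(q_hard)

# Query similarity uses only the semantics part.
similarity = cosine(sem_easy, sem_hard)

# Performance prediction uses only the difficulty part (weights are made up).
score_easy = predict_effectiveness(diff_easy, weights=[1.0, 1.0], bias=0.0)
score_hard = predict_effectiveness(diff_hard, weights=[1.0, 1.0], bias=0.0)
```

In this sketch the two queries are judged semantically equivalent while receiving different predicted effectiveness, which is the separation the abstract describes. A trained model would learn both representations rather than slice a fixed vector.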

    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. disentanglement
    2. information retrieval
    3. query performance prediction

    Qualifiers

    • Short-paper

    Conference

    CIKM '23
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
