short-paper

Neural Disentanglement of Query Difficulty and Semantics

Authors:

Negar Arabzadeh,

Shirin Seyedsalehi,

Morteza Zihayat,

Ebrahim Bagheri

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 4264 - 4268

https://doi.org/10.1145/3583780.3615189

Published: 21 October 2023

Abstract

Researchers have shown that the retrieval effectiveness of a query may depend on factors beyond its semantics. In other words, several queries that express the same intent, even with overlapping keywords, may exhibit completely different degrees of retrieval effectiveness. The objective of this paper is therefore to propose a neural disentanglement method that separates query semantics from query difficulty. The disentangled query semantics representation provides the means to determine the semantic association between queries, whereas the disentangled query difficulty representation allows for the estimation of query effectiveness. Our experiments on two tasks, query performance prediction and query similarity calculation, show that the proposed disentanglement method outperforms the state of the art.
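To make the two roles of the disentangled representations concrete, the following is a minimal sketch and not the authors' model. It assumes, purely for illustration, that an encoded query vector can be split into a semantics part and a difficulty part at a fixed index, that semantic association is measured with cosine similarity, and that effectiveness is estimated with a linear head. The names `split_representation`, `cosine`, `predict_effectiveness`, and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def split_representation(encoded: Sequence[float], semantic_dim: int) -> Tuple[Vector, Vector]:
    """Split one encoded query vector into (semantics, difficulty) parts.

    Assumption for illustration: the first `semantic_dim` values carry
    semantics and the remaining values carry difficulty.
    """
    if not 0 < semantic_dim < len(encoded):
        raise ValueError("semantic_dim must leave both parts non-empty")
    return list(encoded[:semantic_dim]), list(encoded[semantic_dim:])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_effectiveness(difficulty: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear head mapping the difficulty part to an effectiveness score."""
    return sum(w * d for w, d in zip(weights, difficulty)) + bias


# Two made-up query encodings: same intent (identical semantics part),
# different difficulty part.
q1 = [0.9, 0.1, 0.4, 0.8, 0.2]
q2 = [0.9, 0.1, 0.4, 0.1, 0.9]

sem1, dif1 = split_representation(q1, semantic_dim=3)
sem2, dif2 = split_representation(q2, semantic_dim=3)

# Query similarity uses only the semantics part.
semantic_similarity = cosine(sem1, sem2)

# Query performance prediction uses only the difficulty part.
head = [0.5, -0.5]  # made-up weights, not learned
score1 = predict_effectiveness(dif1, head)
score2 = predict_effectiveness(dif2, head)
```

In this toy setup the two queries are semantically indistinguishable yet receive different effectiveness estimates, which is the situation the abstract describes. A real system would learn both parts and the prediction head from data rather than fixing them by hand.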

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN: 9798400701245

DOI: 10.1145/3583780

General Chairs: Ingo Frommholz (University of Wolverhampton, UK), Frank Hopfgartner (University of Koblenz, Germany), Mark Lee (University of Birmingham, UK), Michael Oakes (University of Birmingham, UK)

Program Chairs: Mounia Lalmas (Spotify, UK), Min Zhang (Tsinghua University, China), Rodrygo Santos (Federal University of Minas Gerais, Brazil)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
