Abstract
In this paper, we describe the plans for the first LongEval CLEF 2023 shared task, dedicated to evaluating the temporal persistence of Information Retrieval (IR) systems and text classifiers. The task is motivated by recent research showing that the performance of these models drops as the test data becomes more distant in time from the training data. LongEval differs from traditional shared IR and classification tasks by giving special consideration to evaluating models that aim to mitigate performance drop over time. We envisage that this task will draw the attention of the IR community and NLP researchers to the problem of temporal persistence of models: what enables or prevents it, potential solutions, and their limitations.
Notes
- 1. Qwant search engine: https://www.qwant.com/.
Acknowledgements
This work is supported by the ANR Kodicare bilateral project, grant ANR-19-CE23-0029 of the French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF, grant I4471-N). This work is also supported by a UKRI/EPSRC Turing AI Fellowship to Maria Liakata (grant no. EP/V030302/1) and The Alan Turing Institute (grant no. EP/N510129/1) through project funding and its Enrichment PhD Scheme for Iman Bilal. This work has used services provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2018101).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alkhalifa, R. et al. (2023). LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_58
DOI: https://doi.org/10.1007/978-3-031-28241-6_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28240-9
Online ISBN: 978-3-031-28241-6
eBook Packages: Computer Science, Computer Science (R0)