CLEF 2023 SimpleText Track

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13982)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The general public tends to avoid reliable sources such as scientific literature because of their complex language and the background knowledge they assume. Instead, people rely on shallow and derivative sources on the web and in social media, which are often published for commercial or political gain rather than for their informational value. Can text simplification help to remove some of these access barriers? This paper presents the CLEF 2023 SimpleText track, which tackles the technical and evaluation challenges of scientific information access for a general audience. We provide reusable data and benchmarks for scientific text simplification, and promote novel research to reduce barriers to understanding complex texts. Our overall use case is to create a simplified summary of multiple scientific documents in response to a popular science query, giving the user an accessible overview of that specific topic. The track has three concrete tasks. Task 1 (What is in, or out?): selecting passages to include in a simplified summary. Task 2 (What is unclear?): identifying and explaining difficult concepts. Task 3 (Rewrite this!): text simplification, that is, rewriting scientific text. Together, the three tasks form the pipeline of a scientific text simplification system.
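
To make the three-stage pipeline described above concrete, the following Python sketch chains one placeholder function per task. It is only an illustration under stated assumptions: every function name, the word-overlap ranking, the word-length threshold, and the glossary lookup are hypothetical stand-ins, not the track's data, baselines, or evaluation method.

```python
import re
from dataclasses import dataclass


@dataclass
class SimplifiedSummary:
    """Hypothetical container for the output of the three stages."""
    query: str
    passages: list[str]
    difficult_terms: list[str]
    simplified: list[str]


def _tokens(text: str) -> list[str]:
    # Lowercased alphabetic tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def select_passages(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Task 1 stand-in: rank passages by word overlap with the query."""
    query_words = set(_tokens(query))
    scored = [
        (len(query_words & set(_tokens(p))), index, p)
        for index, p in enumerate(passages)
    ]
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [p for score, _, p in scored[:top_k] if score > 0]


def find_difficult_terms(passage: str, min_length: int = 12) -> list[str]:
    """Task 2 stand-in: flag long words as possibly difficult (an assumption)."""
    terms: list[str] = []
    for token in _tokens(passage):
        if len(token) >= min_length and token not in terms:
            terms.append(token)
    return terms


def simplify(passage: str, glossary: dict[str, str]) -> str:
    """Task 3 stand-in: replace glossary phrases with plainer wording."""
    result = passage
    for term, plain in glossary.items():
        pattern = rf"\b{re.escape(term)}\b"
        result = re.sub(pattern, plain, result, flags=re.IGNORECASE)
    return result


def run_pipeline(
    query: str, passages: list[str], glossary: dict[str, str]
) -> SimplifiedSummary:
    """Chain the three stand-ins: select, spot difficult terms, rewrite."""
    selected = select_passages(query, passages)
    terms: list[str] = []
    for passage in selected:
        for term in find_difficult_terms(passage):
            if term not in terms:
                terms.append(term)
    simplified = [simplify(passage, glossary) for passage in selected]
    return SimplifiedSummary(query, selected, terms, simplified)
```

In a realistic system each stand-in would be replaced by a stronger component, for example a retrieval model for passage selection and a learned model for rewriting; the sketch only shows how the output of one task feeds the next.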

Acknowledgment

This track would not have been possible without the great support of numerous individuals. We want to thank in particular Silvia Araujo, Patrice Bellot, Julien Boccou, Pierre De Loor, Radia Hannachi, Helen McCombie, Diana Nurbakova, Irina Ovchinnikov, and Léa Talec; the students of the Université de Bretagne Occidentale; and all the 2022 track participants for their great help in discussing and shaping the track, and in creating all the evaluation data and training data for 2023. We also thank the MaDICS (https://www.madics.fr/ateliers/simpletext/) research group and the French National Research Agency (project ANR-22-CE23-0019-01).

Author information

Authors and Affiliations

Université de Bretagne Occidentale, HCTI, Brest, France
Liana Ermakova
Avignon Université, LIA, Avignon, France
Eric SanJuan & Stéphane Huet
ENIB, Lab-STICC UMR CNRS 6285, Brest, France
Olivier Augereau
Elsevier, Amsterdam, The Netherlands
Hosein Azarbonyad
University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps

Corresponding author

Correspondence to Liana Ermakova.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Université Grenoble-Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
University of Tsukuba, Ibaraki, Japan
Hideo Joho
Dublin City University, Dublin, Ireland
Brian Davis
Dublin City University, Dublin, Ireland
Cathal Gurrin
Universität Regensburg, Regensburg, Germany
Udo Kruschwitz
Dublin City University, Dublin, Ireland
Annalina Caputo

About this paper

Cite this paper

Ermakova, L., SanJuan, E., Huet, S., Augereau, O., Azarbonyad, H., Kamps, J. (2023). CLEF 2023 SimpleText Track. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_62

DOI: https://doi.org/10.1007/978-3-031-28241-6_62
Published: 16 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28240-9
Online ISBN: 978-3-031-28241-6
eBook Packages: Computer Science, Computer Science (R0)
