Measuring the relative importance of full text sections for information retrieval from scientific literature.

Lana Yeganova, Won Gyu Kim, Donald Comeau, W John Wilbur, Zhiyong Lu

Abstract

With the growing availability of full-text articles, integrating abstracts and full texts of documents into a unified representation is essential for comprehensive search of scientific literature. However, previous studies have shown that naïvely merging abstracts with full texts of articles does not consistently yield better performance. Balancing the contribution of query terms appearing in the abstract and in sections of different importance in full-text articles remains a challenge for both traditional bag-of-words IR approaches and neural retrieval methods. In this work, we establish the connection between the BM25 score of a query term appearing in a section of a full-text document and the probability of that document being clicked or identified as relevant. The probability is computed using Pool Adjacent Violators (PAV), an isotonic regression algorithm, which provides a maximum likelihood estimate based on the observed data. Using this probabilistic transformation of BM25 scores, we show improved performance on the PubMed Click dataset, developed and presented in this study, as well as on the 2007 TREC Genomics collection.
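To make the PAV step in the abstract concrete, the following is a minimal Python sketch of isotonic regression that maps a score to an estimated probability of relevance. It is not the authors' implementation: the function names `pav_fit` and `pav_predict`, the toy scores, and the binary click labels are all assumptions made for illustration, and the paper's per-section handling of BM25 scores is not reproduced here.

```python
from bisect import bisect_right
from itertools import groupby


def pav_fit(scores, labels):
    """Fit a non-decreasing step function score -> P(relevant) with PAV.

    scores: numeric scores (e.g. BM25 values; toy numbers in this sketch)
    labels: 1 if the document was clicked/relevant, else 0
    Returns (starts, probs): the lowest score of each pooled block and
    the fraction of positive labels inside that block.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum of labels, count, lowest score]
    # Group tied scores first so that one score never spans two blocks.
    for score, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]
        blocks.append([float(sum(ys)), len(ys), score])
        # Pool adjacent blocks while the earlier mean exceeds the later one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, _ = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    starts = [b[2] for b in blocks]
    probs = [b[0] / b[1] for b in blocks]
    return starts, probs


def pav_predict(starts, probs, score):
    """Look up the fitted probability for a new score (step function)."""
    i = bisect_right(starts, score) - 1
    return probs[max(i, 0)]  # scores below the first block use its value


# Toy data (hypothetical): higher scores tend to be clicked more often.
toy_scores = [1, 2, 3, 4, 5, 6]
toy_labels = [0, 1, 0, 1, 1, 1]
starts, probs = pav_fit(toy_scores, toy_labels)
```

In this toy run the observations at scores 2 and 3 violate monotonicity (a click followed by a non-click), so they are pooled into one block with probability 0.5. The resulting step function is non-decreasing in the score, which is what allows scores from different sections to be compared on a common probability scale.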

Anthology ID: 2021.bionlp-1.27
Volume: Proceedings of the 20th Workshop on Biomedical Language Processing
Month: June
Year: 2021
Address: Online
Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue: BioNLP
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 247–256
URL: https://aclanthology.org/2021.bionlp-1.27/
DOI: 10.18653/v1/2021.bionlp-1.27
Cite (ACL): Lana Yeganova, Won Gyu Kim, Donald Comeau, W John Wilbur, and Zhiyong Lu. 2021. Measuring the relative importance of full text sections for information retrieval from scientific literature. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 247–256, Online. Association for Computational Linguistics.
Cite (Informal): Measuring the relative importance of full text sections for information retrieval from scientific literature. (Yeganova et al., BioNLP 2021)
PDF: https://aclanthology.org/2021.bionlp-1.27.pdf
