Abstract
In this paper, we describe Topic Pages, an inventory of scientific concepts and the information surrounding them, extracted from a large collection of scientific books and journals. The main aim of Topic Pages is to give readers the information they need to understand scientific concepts they come across while reading scholarly content in any scientific domain. Topic Pages are a collection of information pages generated automatically using NLP and ML, each corresponding to a scientific concept. Each page contains three pieces of information: a definition, related concepts, and the most relevant snippets, all extracted from peer-reviewed scientific publications. In this paper, we discuss the components that extract each of these elements. The collection in production contains over 360,000 Topic Pages across 20 scientific domains, with an average of 23 million unique visits per month, making it a popular source of scientific information.
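The three elements that make up a page can be pictured as a simple record. The following is a minimal sketch only, not the authors' implementation: the class name `TopicPage`, its field names, and the `is_complete` helper are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicPage:
    """Hypothetical record for one Topic Page; all names here are assumptions."""
    concept: str                      # the scientific concept the page describes
    domain: str                       # one of the scientific domains
    definition: str                   # definition extracted from a publication
    related_concepts: List[str] = field(default_factory=list)
    snippets: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # True only when all three elements named in the abstract are present.
        return bool(self.definition and self.related_concepts and self.snippets)


# Placeholder values; no real extracted content is shown.
page = TopicPage(
    concept="long short-term memory",
    domain="Computer Science",
    definition="(definition sentence extracted from a publication)",
    related_concepts=["recurrent neural network"],
    snippets=["(relevant snippet extracted from a publication)"],
)
```

Keeping the three elements as separate fields mirrors the abstract's description that each is produced by a different component.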
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Azarbonyad, H., Afzal, Z., Tsatsaronis, G. (2023). Generating Topic Pages for Scientific Concepts Using Scientific Publications. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_23
DOI: https://doi.org/10.1007/978-3-031-28238-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science, Computer Science (R0)