Abstract
Scientific documents are getting published at expanding rates and create challenges for the researchers to keep themselves up to date with the new developments. Scientific document summarization solves this problem by providing summaries of essential facts and findings. We propose a novel extractive summarization technique for generating a summary of scientific documents after considering the citation context. The proposed method extracts the scientific document’s relevant sentences with respect to citation text in semantic space by utilizing the word mover’s distance (WMD); further, it clusters the extracted sentences. Moreover, it assigns a rank to cluster of sentences based on different aspects like similarity with the title of the paper, position of the sentence, length of the sentence, and maximum marginal relevance. Finally, sentences are selected from different clusters based on their ranks to form the summary. We conduct our experiments on CL-SciSumm 2016 and CL-SciSumm 2017 data sets. The obtained results are compared with the state-of-the-art techniques. Evaluation results show that our method outperforms others in terms of ROUGE-2, ROUGE-3, and ROUGE-SU4 scores.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Atanassova, I., Bertin, M., Larivière, V.: On the composition of scientific abstracts. J. Documentation 72(4), 636–647 (2016)
Bornmann, L., Mutz, R.: Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Am. Soc. Inf. Sci. 66(11), 2215–2222 (2015)
Carbonell, J.G., Goldstein, J.: The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR. 98, 335–336 (1998)
Cohan, A., Goharian, N.: Scientific article summarization using citation-context and article’s discourse structure. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 390–400. Association for Computational Linguistics, Lisbon, Portugal, September 2015. https://doi.org/10.18653/v1/D15-1045, https://www.aclweb.org/anthology/D15-1045
Cohan, A., Goharian, N.: Scientific article summarization using citation-context and article’s discourse structure. arXiv preprint arXiv:1704.06619 (2017)
Cohan, A., Goharian, N.: Scientific document summarization via citation contextualization and scientific discourse. Int. J. Digit. Libr. 19(2), 287–303 (2017). https://doi.org/10.1007/s00799-017-0216-8
Cohan, A., Soldaini, L., Goharian, N.: Matching citation text and cited spans in biomedical literature: a search-oriented approach. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1042–1048 (2015)
Hernández-Alvarez, M., Gomez, J.M.: Survey about citation context analysis: tasks, techniques, and resources. Nat. Lang. Eng. 22(3), 327–349 (2016)
Jaidka, K., Chandrasekaran, M., Jain, D., Kan, M.Y.: The cl-scisumm shared task 2017: Results and key insights (2017)
Jaidka, K., Chandrasekaran, M.K., Jain, D., Kan, M.Y.: The cl-scisumm shared task 2017: results and key insights. In: BIRNDL@SIGIR (2017)
Jaidka, K., Chandrasekaran, M.K., Rustagi, S., Kan, M.Y.: Overview of the cl-scisumm 2016 shared task. In: Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL), pp. 93–102 (2016)
Jaidka, K., Chandrasekaran, M.K., Rustagi, S., Kan, M.Y.: Insights from cl-scisumm 2016: the faceted scientific document summarization shared task. Int. J. Digit. Libr. 19(2–3), 163–171 (2018)
Kusner, M., Sun, Y., Kolkin, N., Weinberger, K.: From word embeddings to document distances. In: International Conference on Machine Learning, pp. 957–966 (2015)
Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81. Association for Computational Linguistics, Barcelona, Spain, July 2004. https://www.aclweb.org/anthology/W04-1013
Mendoza, M., Bonilla, S., Noguera, C., Cobos, C., León, E.: Extractive single-document summarization based on genetic operators and guided local search. Expert Syst. Appl. 41(9), 4158–4169 (2014)
Qazvinian, V., Radev, D.R., Mohammad, S.M., Dorr, B., Zajic, D., Whidby, M., Moon, T.: Generating extractive summaries of scientific paradigms. Journal of Artificial Intelligence Research 46, 165–201 (2013)
Saini, N., Saha, S., Chakraborty, D., Bhattacharyya, P.: Extractive single document summarization using binary differential evolution: optimization of different sentence quality measures. PloS one, 14(11) (2019)
Yasunaga, M., Kasai, J., Zhang, R., Fabbri, A.R., Li, I., Friedman, D., Radev, D.R.: Scisummnet: a large annotated corpus and content-impact models for scientific paper summarization with citation networks. Proc. AAAI Conf. Artif. Intell. 33, 7386–7393 (2019)
Zhang, Q., Couloigner, I.: A new and efficient k-medoid algorithm for spatial clustering. In: Gervasi, O. (ed.) ICCSA 2005. LNCS, vol. 3482, pp. 181–189. Springer, Heidelberg (2005). https://doi.org/10.1007/11424857_20
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mishra, S.K., Saini, N., Saha, S., Bhattacharyya, P. (2021). Let’s Summarize Scientific Documents! A Clustering-Based Approach via Citation Context. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science(), vol 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_29
Download citation
DOI: https://doi.org/10.1007/978-3-030-80599-9_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80598-2
Online ISBN: 978-3-030-80599-9
eBook Packages: Computer ScienceComputer Science (R0)