Abstract
Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent means of analyzing scientific trends, influences, and the relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which were also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization workshop at BIRNDL 2016 (the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). First, each citation linkage between a citation and the spans of text it refers to in the reference paper is recognized according to their content similarity, computed with various methods. Then each cited text span is classified into five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various lexicon and rule features, using a Support Vector Machine and a voting method. Finally, a summary of the reference paper of at most 250 words is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We combine this hLDA knowledge with several other classical features, using different weights and proportions, to score the sentences in the reference paper. Our systems ranked first and second in the evaluation results published by BIRNDL 2016, which verifies the effectiveness of our methods.
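As a rough illustration of the first step, the sketch below links a citation to spans of a reference paper by content similarity. It is a minimal example under stated assumptions, not the system described in the paper: the abstract names no specific similarity measure, so cosine similarity over plain term-frequency vectors is swapped in here, and the function names (`tokenize`, `cosine`, `link_citation`) and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_citation(citance, reference_sentences, top_k=1):
    """Return indices of the top_k reference sentences most similar to the citance."""
    cit_vec = Counter(tokenize(citance))
    scored = [
        (cosine(cit_vec, Counter(tokenize(sent))), idx)
        for idx, sent in enumerate(reference_sentences)
    ]
    # Sort by descending similarity; ties are broken by earlier position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical data: three sentences from a reference paper and one citing sentence.
reference = [
    "We propose a topic model for sentence clustering.",
    "Experiments use the ACL anthology corpus.",
    "The classifier is a support vector machine trained on lexical features.",
]
citance = "They train a support vector machine on lexical features."
best = link_citation(citance, reference)
```

Here `best` holds the index of the reference sentence that shares the most vocabulary with the citing sentence. A real system would likely add stop-word handling, term weighting, or learned representations on top of this baseline.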
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 91546121, 61202247, 71231002 and 61472046; the EU FP7 IRSES MobileCloud Project (Grant No. 612212); the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education (MOE); the MOE Liberal Arts and Social Sciences Foundation under Grant 16YJA630011; the Beijing Institute of Science and Technology Information; and CapInfo Company Limited.
Cite this article
Li, L., Mao, L., Zhang, Y. et al. Computational linguistics literature and citations oriented citation linkage, classification and summarization. Int J Digit Libr 19, 173–190 (2018). https://doi.org/10.1007/s00799-017-0219-5
DOI: https://doi.org/10.1007/s00799-017-0219-5