Computational linguistics literature and citations oriented citation linkage, classification and summarization

Lei Li¹,
Liyuan Mao¹,
Yazhao Zhang¹,
Junqi Chi¹,
Taiwen Huang¹,
Xiaoyue Cong¹ &
Heng Peng¹

Abstract

Scientific literature is currently the most important resource for scholars, and its citations give researchers a powerful latent way to analyze scientific trends, influences, and relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization (CL-SciSumm 2016) at BIRNDL 2016 (the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). First, each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarities via various computational methods. Then, the cited text spans are classified into five pre-defined facets, i.e., Hypothesis, Implication, Aim, Results and Method, based on various lexicon and rule features via a Support Vector Machine and a voting method. Finally, a summary of the reference paper is generated from the cited text spans within 250 words. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopic) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features using different weights and proportions to evaluate the sentences in the reference paper. Our systems were ranked first and second according to the evaluation results published by BIRNDL 2016, which verifies the effectiveness of our methods.
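
The linkage and length-limited summarization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract only says that "various computational methods" measure content similarity, so plain term-frequency cosine similarity is assumed here, and the function names (`link_citation`, `build_summary`), the `top_k` parameter, and the per-sentence `scores` input are hypothetical. Facet classification and the hLDA features are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_citation(citance, reference_sentences, top_k=2):
    """Return indices of the top_k reference sentences most similar to the citance.

    Assumption: similarity is term-frequency cosine; sentences with zero
    overlap are never linked.
    """
    cit_vec = Counter(tokenize(citance))
    scored = [
        (cosine(cit_vec, Counter(tokenize(sent))), idx)
        for idx, sent in enumerate(reference_sentences)
    ]
    # Highest similarity first; ties broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scored[:top_k] if score > 0]


def build_summary(sentences, scores, max_words=250):
    """Greedily keep the highest-scoring sentences that fit the word budget.

    The selected sentences are returned in their original document order.
    """
    chosen, used = [], 0
    for idx in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        length = len(sentences[idx].split())
        if used + length <= max_words:
            chosen.append(idx)
            used += length
    return [sentences[i] for i in sorted(chosen)]


# Toy usage with made-up sentences (not taken from the shared-task corpus).
reference = [
    "We propose a topic model for sentence clustering.",
    "The corpus contains ten thousand newswire articles.",
    "Results show that the topic model improves summary quality.",
]
citance = "Their topic model improves the quality of summaries."
linked = link_citation(citance, reference, top_k=1)
summary = build_summary(reference, scores=[0.9, 0.2, 0.8], max_words=20)
```

In a fuller pipeline, the scores passed to `build_summary` would come from a weighted combination of sentence features, which is where the hLDA knowledge mentioned in the abstract would enter.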

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 91546121, 61202247, 71231002 and 61472046; the EU FP7 IRSES MobileCloud Project (Grant No. 612212); the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education (MOE); the MOE Liberal Arts and Social Sciences Foundation under Grant 16YJA630011; the Beijing Institute of Science and Technology Information; and CapInfo Company Limited.

Author information

Authors and Affiliations

Center for Intelligence Science and Technology (CIST), School of Computer, Beijing University of Posts and Telecommunications (BUPT), Beijing, People’s Republic of China
Lei Li, Liyuan Mao, Yazhao Zhang, Junqi Chi, Taiwen Huang, Xiaoyue Cong & Heng Peng

Corresponding author

Correspondence to Lei Li.

About this article

Cite this article

Li, L., Mao, L., Zhang, Y. et al. Computational linguistics literature and citations oriented citation linkage, classification and summarization. Int J Digit Libr 19, 173–190 (2018). https://doi.org/10.1007/s00799-017-0219-5

Received: 11 October 2016
Revised: 05 June 2017
Accepted: 06 June 2017
Published: 13 June 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00799-017-0219-5
