Abstract
Automatic document summarization condenses a document into a summary that represents the whole, helping readers extract important information quickly. To address the lack of semantic information in document summaries, this paper proposes a weighted hybrid document summarization model based on LDA (Latent Dirichlet Allocation). The model obtains the topic probability distribution by analysing the document. First, FCNNM (Fine-grained Convolutional Neural Network Model) extracts the semantic features. Next, heuristic rules capture surface information of the text, including sentence length, sentence position, and the TF-IDF of the words in the sentence. These features are then weighted to compute a score for each sentence. Finally, a greedy algorithm selects the sentences that form the summary. Experiments show that the proposed model can effectively compensate for the lack of semantic connection between summary sentences and the source text in traditional algorithms, reduce the high redundancy in document summaries, and improve summary quality.
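The surface-feature scoring and greedy selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the weights, the feature normalization, the smoothed IDF, and the Jaccard-overlap redundancy check are all assumptions made for illustration, and the LDA topic distribution and FCNNM semantic features are omitted entirely. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def surface_scores(sentences, w_len=0.2, w_pos=0.3, w_tfidf=0.5):
    """Weighted sum of length, position and TF-IDF, each scaled to [0, 1].

    The weights are illustrative assumptions, not values from the paper.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence-level document frequency: in how many sentences a word occurs.
    df = Counter(t for toks in tokens for t in set(toks))
    max_len = max((len(toks) for toks in tokens), default=1) or 1

    # Sum of TF * IDF over a sentence's words, with TF = count / sentence
    # length and a smoothed IDF of log(n / df) + 1 (an assumed variant).
    tfidf = [
        sum(math.log(n / df[t]) + 1.0 for t in toks) / len(toks) if toks else 0.0
        for toks in tokens
    ]
    max_tfidf = max(tfidf, default=1.0) or 1.0

    scores = []
    for i, toks in enumerate(tokens):
        length = len(toks) / max_len
        position = 1.0 - i / n  # earlier sentences score higher
        scores.append(
            w_len * length + w_pos * position + w_tfidf * tfidf[i] / max_tfidf
        )
    return scores


def jaccard(a, b):
    """Word-set overlap, used here as an assumed redundancy measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def greedy_summary(sentences, k=3, max_overlap=0.5):
    """Take sentences in descending score order, skipping near-duplicates."""
    scores = surface_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        toks = tokenize(sentences[i])
        if all(jaccard(toks, tokenize(sentences[j])) <= max_overlap for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `greedy_summary(sentences, k=3)` on a list of sentence strings returns up to three sentences in document order. In a fuller version of this sketch, a semantic score would be added as one more weighted term in `surface_scores`.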
Acknowledgments
This research is supported by the National Key Research and Development Scheme of China under grant number 2017YFC1405403, the National Natural Science Foundation of China under grant number 61075059, the Green Industry Technology Leading Project (product development category) of Hubei University of Technology under grant number CPYF2017008, and the Philosophical and Social Sciences Research Project of Hubei Education Department under grant number 19Q054.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiong, C., Shen, L., Wang, Z. (2020). An Automatic Text Summary Method Based on LDA Model. In: Barolli, L., Hellinckx, P., Natwichai, J. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019. Lecture Notes in Networks and Systems, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-030-33509-0_15
DOI: https://doi.org/10.1007/978-3-030-33509-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33508-3
Online ISBN: 978-3-030-33509-0
eBook Packages: Engineering, Engineering (R0)