Abstract
The purpose of government news summarization is to extract the most important information from official government news reports, enabling readers to understand government news quickly in an age of information overload. Compared with other types of news, government news reports contain more detailed content and follow a more standardized format, which makes them considerably longer. To resolve the tension between the length of these reports and the trend toward fragmented reading, this study proposes an automatic text summarization (ATS) model based on multiple features. First, features are extracted using the TF–IDF algorithm and a word-embedding method based on the bidirectional encoder representations from transformers (BERT) model. Second, sentences are scored on position, keyword, and similarity features. Finally, the top-ranked sentences are selected to form the summary. To verify the effectiveness and superiority of the proposed method, both the Edmundson and ROUGE evaluation criteria were adopted. First, under the Edmundson criteria, the summaries produced by the various methods were scored. The score differences between summaries generated by the proposed method and manual summaries were minimal across consistency, grammaticality, time sequence, conciseness, and readability, at 0.14, 0.18, 0.12, 0.10, and 0.16, respectively, indicating that our summaries are the most similar to manual ones. Second, under the ROUGE criteria, the proposed method achieved significantly higher scores than the other models. Specifically, for character-level ROUGE-1, the precision, recall, and F scores reached 0.84, 0.93, and 0.88, respectively; at the word level, ROUGE-1 precision, recall, and F scores were 0.81, 0.89, and 0.85, a noticeable improvement over the other models.
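The multi-feature scoring pipeline described above (position, keyword, and similarity features combined into a sentence score, then top-ranked sentences selected) can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the paper's implementation: the feature weights, the function names, and the use of the lead sentence as the similarity target are all illustrative assumptions, and TF–IDF cosine similarity stands in for the BERT-embedding similarity the paper uses.

```python
import math
import re
from collections import Counter


def sentence_scores(sentences, position_w=0.3, keyword_w=0.4, similarity_w=0.3):
    """Score each sentence by position, TF-IDF keyword weight, and
    similarity to the lead sentence (a stand-in for the title)."""
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency per term, treating each sentence as a "document".
    df = Counter(t for d in docs for t in set(d))

    def tfidf(doc):
        tf = Counter(doc)
        return {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf} if doc else {}

    vecs = [tfidf(d) for d in docs]

    def cosine(a, b):
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    scores = []
    for i, v in enumerate(vecs):
        pos = 1.0 - i / n              # earlier sentences score higher
        kw = sum(v.values())           # total TF-IDF keyword weight
        sim = cosine(v, vecs[0])       # similarity to the lead sentence
        scores.append(position_w * pos + keyword_w * kw + similarity_w * sim)
    return scores


def summarize(sentences, k=2):
    """Select the k top-scoring sentences, kept in original order."""
    scores = sentence_scores(sentences)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
```

Re-sorting the selected indices preserves the original sentence order, which keeps the extracted summary readable.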
Furthermore, compared with manual methods, the proposed method helps readers obtain important information more rapidly.
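For reference, the ROUGE-1 scores reported above are based on unigram overlap between a candidate summary and a reference summary. A minimal sketch of the metric (the function name and token format are assumptions; real evaluations typically use an established ROUGE implementation):

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """ROUGE-1: clipped unigram overlap between a candidate summary and a
    reference summary. Returns (precision, recall, F1)."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # per-token counts clipped to the minimum
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Passing word tokens gives word-level ROUGE-1; passing the individual characters of each sentence gives the character-level variant, which is common for Chinese text.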
Availability of data and materials
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
References
China Internet Network Information Center (2022) The 49th Statistical Report on China's Internet Development. http://www.cnnic.net.cn/n4/2022/0401/c88-1131.html
Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66
Nallapati R, Zhou B, dos Santos C, Gulcehre C, Xiang B (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp 280–290
Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Hofmann T (2017) Probabilistic latent semantic indexing. ACM SIGIR Forum 51(2):211–218
Blei D, Griffiths T, Jordan M, Tenenbaum J (2009) Hierarchical topic models and the nested Chinese restaurant process. In: Proceedings of the 17th International Conference on Neural Information Processing Systems. MIT Press, pp 17–24
Blei D, Griffiths T, Jordan M (2004) The nested Chinese restaurant process and Bayesian non-parametric inference of topic hierarchies. Adv Neural Inf Process Syst 16:17–24
Çelikyilmaz A, Hakkani-Tür D (2010) A hybrid hierarchical model for multi-document summarization. In: Proceedings of the 48th Meeting of the Association for Computational Linguistics, pp 815–824
Arora R, Ravindran B (2008) Latent Dirichlet allocation based multi-document summarization. In: Proceedings of the 2nd Workshop on Analytics for Noisy Unstructured Text Data, pp 91–97
Akhtar N, Ali R, Beg M (2018) Hierarchical summarization of news tweets with twitter-LDA. In: Applications of soft computing for the web, 1st ed, pp 83–98
Itti L, Baldi P (2009) Bayesian surprise attracts human attention. Vision Res 49(10):1295–1306
Omari A, Carmel D, Rokhlenko O, Szpektor I (2016) Novelty based ranking of human answers for community questions. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 215–224
Su G, Li J, Ma Y, Li S (2004) Improving the precision of the keyword-matching pornographic text filtering method using a hybrid mode. J Zhejiang Univ Sci 5(9):100–107
Luo W, Ma H, He Q, Shi Z (2011) Leveraging entropy and relevance for document summarization. J Chin Inf Process 25(5):9–16
Luhn H (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
Berlanga R, Nebot V (2015) Tailored semantic annotation for semantic search. Web Semant Sci Serv Agents World Wide Web 30:69–81
Liakata M, Saha S, Simon D, Batchelor C, Rebholz D (2012) Automatic recognition of conceptualization zones in scientific articles and two life science applications. Bioinformatics 28(7):991–1000
El-Refaiy AM, Abas AR, El-Henawy IM (2019) Determining extractive summarization for a single document based on collaborative filtering frequency prediction and mean shift clustering. IAENG Int J Comput Sci 46(3):494–505
Darmawan R, Wijaya A (2019) Integration distance similarity with keyword algorithm for improving cohesion between sentences in text summarization. In: Proceedings of the IOP Conference Series: Materials Science and Engineering, pp 12–19
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Cabrera-Diego LA, Torres-Moreno JM (2018) SummTriver: a new trivergent model to evaluate summarizations automatically without human references. Data Knowl Eng 113:184–197
Edmundson HP, Wyllys RE (1961) Automatic abstracting and indexing: survey and recommendations. Commun ACM 4(5):226–234
Wang L, Yao JL, Tao YZ, Zhong L, Liu W, Du Q (2018) A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization. https://arxiv.org/pdf/1805.03616v2.pdf
Saeed MY, Awais M, Younas M, Shah MA, Khan A (2021) An abstractive summarization technique with variable length keywords as per document diversity. Comput Mater Contin 66(3):2409–2423
Gambhir M, Gupta V (2022) Deep learning-based extractive summarization with word-level attention mechanism. Multimed Tools Appl 81(15):20829–20852
Funding
This paper was supported by The National Social Science Fund of China (No. 18CTQ033).
Author information
Contributions
YY and TY wrote the main manuscript text. MJ proposed the method and completed the experiment. TY and HZ completed the visualization of experimental results. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical Approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Tan, Y., Min, J. et al. Automatic text summarization for government news reports based on multiple features. J Supercomput 80, 3212–3228 (2024). https://doi.org/10.1007/s11227-023-05599-0