
Automatic text summarization for government news reports based on multiple features

Published in The Journal of Supercomputing.

Abstract

The purpose of government news summarization is to extract the most important information from official government news reports. In the age of information overload, it is important for readers to be able to understand government news quickly. Compared with other types of news, government news reports have more detailed content and a more standardized format, resulting in greater length. To resolve the contradiction between the length of these reports and the trend toward fragmented reading, this research proposes an automatic text summarization (ATS) model based on multiple features. First, features are extracted using the TF–IDF algorithm and a word-vector embedding method based on the bidirectional encoder representations from transformers (BERT) model. Second, sentences are scored based on position, keyword, and similarity features. Finally, the top-ranked sentences are selected to form the summary. To verify the effectiveness and superiority of the proposed method, the Edmundson and ROUGE evaluation criteria were adopted. First, the summarization results of various methods were scored against the Edmundson criteria. The score differences between summaries produced by the proposed method and manual summaries were minimal across consistency, grammaticality, time sequence, conciseness, and readability, at 0.14, 0.18, 0.12, 0.10, and 0.16, respectively, suggesting that our summaries exhibit the highest similarity to manual ones. Second, we evaluated the results using the ROUGE criteria, where the proposed method achieved significantly higher scores than the other models: for character-level ROUGE-1, the P, R, and F scores reached 0.84, 0.93, and 0.88, respectively, and for word-level ROUGE-1, 0.81, 0.89, and 0.85, a noticeable improvement over the other models. Furthermore, compared with manual methods, the proposed method helps readers obtain important information rapidly.
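The extractive pipeline described in the abstract (TF–IDF keyword weights, sentence position, and a similarity feature, combined into per-sentence scores) can be sketched in simplified form. The feature weights and scoring functions below are illustrative assumptions, not the paper's exact formulation, and the similarity feature is approximated here with cosine similarity to a TF–IDF centroid rather than BERT embeddings:

```python
from collections import Counter
import math

def tfidf(docs):
    # docs: list of tokenized sentences; returns one TF-IDF dict per sentence.
    df = Counter()
    for d in docs:
        df.update(set(d))
    n = len(docs)
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return out

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_sentences(sentences, w_pos=0.3, w_kw=0.4, w_sim=0.3):
    # Combine position, keyword, and similarity features (weights are
    # illustrative, not the paper's tuned values).
    vecs = tfidf(sentences)
    centroid = {}  # TF-IDF centroid as a proxy for the whole document
    for v in vecs:
        for k, val in v.items():
            centroid[k] = centroid.get(k, 0.0) + val
    n = len(sentences)
    scores = []
    for i, v in enumerate(vecs):
        pos = 1.0 - i / n            # earlier sentences score higher
        kw = sum(v.values())         # keyword weight via summed TF-IDF
        sim = cosine(v, centroid)    # similarity to the document centroid
        scores.append(w_pos * pos + w_kw * kw + w_sim * sim)
    return scores

def summarize(sentences, k=2):
    # Select the top-k sentences, restored to original document order.
    s = score_sentences(sentences)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -s[i])[:k])
    return [sentences[i] for i in top]
```

Restoring document order in the final step matters for the time-sequence criterion used in the Edmundson evaluation: an extractive summary should present selected sentences in the order the source reported them.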


Availability of data and materials

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This paper was supported by the National Social Science Fund of China (No. 18CTQ033).

Author information

Authors and Affiliations

Authors

Contributions

YY and TY wrote the main manuscript text. MJ proposed the method and completed the experiment. TY and HZ completed the visualization of experimental results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yiting Tan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 354 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Yang, Y., Tan, Y., Min, J. et al. Automatic text summarization for government news reports based on multiple features. J Supercomput 80, 3212–3228 (2024). https://doi.org/10.1007/s11227-023-05599-0

