
Automatic text summarization for government news reports based on multiple features

Published in The Journal of Supercomputing.

Abstract

The purpose of government news summarization is to extract the most important information from official government news reports. In the age of information overload, it is important for readers to be able to understand government news quickly. Compared with other types of news, government news reports have more detailed content and a more standardized format, resulting in greater length. To resolve the contradiction between the length of these reports and the trend toward fragmented reading, this research proposes an automatic text summarization (ATS) model based on multiple features. First, features are extracted using the TF–IDF algorithm and a word-vector embedding method based on the bidirectional encoder representations from transformers (BERT) model. Second, sentences are scored based on position, keyword, and similarity features. Finally, the top-ranked sentences are selected to form the summary. To verify the effectiveness and superiority of the proposed method, the Edmundson and ROUGE evaluation criteria were adopted. First, the summarization results of various methods were scored against the Edmundson criteria. The score differences between summaries produced by the proposed method and manual summaries were minimal across consistency, grammaticality, time sequence, conciseness, and readability, at 0.14, 0.18, 0.12, 0.10, and 0.16, respectively, suggesting that our summaries exhibit the highest similarity to manual ones. Second, we evaluated the results using the ROUGE criteria, where the proposed method achieved significantly higher scores than the other models: for character-level ROUGE-1, the P, R, and F scores reached 0.84, 0.93, and 0.88, respectively, and for word-level ROUGE-1, 0.81, 0.89, and 0.85, a noticeable improvement over the other models. Furthermore, compared with manual methods, the proposed method helps readers obtain important information rapidly.
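The extractive pipeline described in the abstract (TF–IDF keyword weights, sentence position, and a similarity feature, combined into per-sentence scores) can be sketched in simplified form. The feature weights and scoring functions below are illustrative assumptions, not the paper's exact formulation, and the similarity feature is approximated here with cosine similarity to a TF–IDF centroid rather than BERT embeddings:

```python
from collections import Counter
import math

def tfidf(docs):
    # docs: list of tokenized sentences; returns one TF-IDF dict per sentence.
    df = Counter()
    for d in docs:
        df.update(set(d))
    n = len(docs)
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return out

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_sentences(sentences, w_pos=0.3, w_kw=0.4, w_sim=0.3):
    # Combine position, keyword, and similarity features (weights are
    # illustrative, not the paper's tuned values).
    vecs = tfidf(sentences)
    centroid = {}  # TF-IDF centroid as a proxy for the whole document
    for v in vecs:
        for k, val in v.items():
            centroid[k] = centroid.get(k, 0.0) + val
    n = len(sentences)
    scores = []
    for i, v in enumerate(vecs):
        pos = 1.0 - i / n            # earlier sentences score higher
        kw = sum(v.values())         # keyword weight via summed TF-IDF
        sim = cosine(v, centroid)    # similarity to the document centroid
        scores.append(w_pos * pos + w_kw * kw + w_sim * sim)
    return scores

def summarize(sentences, k=2):
    # Select the top-k sentences, restored to original document order.
    s = score_sentences(sentences)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -s[i])[:k])
    return [sentences[i] for i in top]
```

Restoring document order in the final step matters for the time-sequence criterion used in the Edmundson evaluation: an extractive summary should present selected sentences in the order the source reported them.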


Availability of data and materials

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This paper was supported by the National Social Science Fund of China (No. 18CTQ033).

Author information

Authors and Affiliations

Authors

Contributions

YY and TY wrote the main manuscript text. MJ proposed the method and completed the experiment. TY and HZ completed the visualization of experimental results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yiting Tan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 354 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Yang, Y., Tan, Y., Min, J. et al. Automatic text summarization for government news reports based on multiple features. J Supercomput 80, 3212–3228 (2024). https://doi.org/10.1007/s11227-023-05599-0

