Topic modeling combined with classification technique for extractive multi-document text summarization

Rajendra Kumar Roul ORCID: orcid.org/0000-0001-6295-262X¹

Abstract

The human-written reference summaries available in existing datasets are often of poor quality, which makes it difficult to build an accurate text summarization model. Although recent work has addressed this issue and laid a strong foundation for further improvement, it still has many limitations. To this end, the paper proposes a novel methodology that uses topic modeling and classification to summarize a corpus of documents into a coherent summary. The objectives of the proposed work are highlighted below:

A novel heuristic approach is introduced to determine the actual number of topics in a corpus of documents, which handles the stochastic nature of latent Dirichlet allocation.
A large corpus of documents is handled by reducing the huge set of sentences to a small set without losing the important ones, thus providing a concise and information-rich summary at the end.
The sentences in the coherent summary are arranged according to their importance.
The experimental results are compared with state-of-the-art summarization systems.

The empirical results show that the proposed model is more promising than well-known text summarization models.

Notes

https://radimrehurek.com/gensim/auto_examples/tutorials/run_doc2vec_lee.html.
https://towardsdatascience.com/lda2vec-word-embeddings-in-topic-models-4ee3fc4b2843.
https://radimrehurek.com/gensim/.
www.encyclopediaofmath.org/index.php?title=Hellinger_distance&oldid=16453.
For experimental purposes, values in the range 0.2 to 0.8 were tested in steps of 0.05, and 0.4 performed best among them.
Since reduction is being performed, $2/X < 1$.
http://www.nltk.org/.
http://www.duc.nist.gov.
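
The notes above link to the definition of the Hellinger distance and mention a sweep of threshold values from 0.2 to 0.8 in steps of 0.05. How either is used in the paper is not shown on this page, so the following is only a minimal sketch of the standard Hellinger distance between two discrete probability distributions, together with a helper that enumerates the threshold grid. The function names `hellinger` and `threshold_grid` and the example distributions are illustrative assumptions, not the author's code.

```python
import math
from typing import List, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete probability distributions.

    Assumes both inputs are non-negative, sum to 1, and have equal length.
    The result lies in [0, 1]: 0 for identical distributions, 1 when the
    distributions have no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def threshold_grid(start: float = 0.2, stop: float = 0.8,
                   step: float = 0.05) -> List[float]:
    """Enumerate candidate thresholds from start to stop, inclusive.

    The index is computed as an integer and each value is rounded to two
    decimals to avoid accumulated floating-point drift.
    """
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical topic distributions for two documents (illustration only).
    doc_a = [0.7, 0.2, 0.1]
    doc_b = [0.1, 0.3, 0.6]
    print("Hellinger distance:", hellinger(doc_a, doc_b))
    print("Candidate thresholds:", threshold_grid())
```

The grid helper simply lists the 13 candidate values; which quantity is compared against the threshold is left open here because the page does not state it.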

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, 147004, India
Rajendra Kumar Roul

Corresponding author

Correspondence to Rajendra Kumar Roul.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Roul, R.K. Topic modeling combined with classification technique for extractive multi-document text summarization. Soft Comput 25, 1113–1127 (2021). https://doi.org/10.1007/s00500-020-05207-w

Published: 30 October 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s00500-020-05207-w
