Abstract
With the rapid growth of social media platforms, the digitization of official records, and the digital publication of articles, books, magazines, and newspapers, large volumes of data are generated every day. This data is a rich source of information and contains a vast amount of text that may be complex, ambiguous, redundant, irrelevant, and unstructured. We therefore need tools and methods that help us understand and automatically summarize this text. There are two main approaches to text summarization: abstractive and extractive. Abstractive Text Summarization generates a concise summary that captures the salient content of the input documents and paraphrases it using new sentences and phrases. Extractive Text Summarization, in contrast, produces a summary by selecting and combining the most significant sentences and phrases from the source documents. Researchers have proposed numerous techniques for both kinds of text summarization. In this work, we classify Extractive Text Summarization approaches and review them based on their characteristics, techniques, and performance, and we discuss their limitations. We also classify and discuss evaluation measures and outline the research challenges faced in Extractive Text Summarization.
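To make the extractive idea concrete, the following is a minimal sketch of selecting and combining the highest-scoring sentences from a source text. It is an illustration only, not one of the approaches reviewed in this work: the function name `extractive_summary`, the regex-based sentence splitter, and the average word-frequency score are all assumptions chosen for brevity, and no stop-word removal is applied.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences of `text`, kept in source order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    # (Assumption for illustration; real systems use a proper tokenizer.)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole document, lowercased.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so a sentence is not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best ones,
    # then restore the original order so the summary reads naturally.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(extractive_summary(doc, num_sentences=2))
```

Because the output consists only of sentences copied verbatim from the input, this sketch is extractive; an abstractive system would instead generate new wording.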
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
References
Abdi A, Idris N, Alguliyev RM, Aliguliyev RM (2017) Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput 21(7):1785–1801. https://doi.org/10.1007/s00500-015-1881-4
Abdi A, Hasan S, Shamsuddin SM, Idris N, Piran J (2021) A hybrid deep learning architecture for opinion-oriented multi-document summarization based on multi-feature fusion. Knowl-Based Syst 213:106658. https://doi.org/10.1016/j.knosys.2020.106658
Abhiman, BD, Hiraman, PY (2021) A text summarization using multi linguistic features and fuzzy logic technique of sentences
Alami N, Meknassi M, En-nahnahi N (2019) Enhancing unsupervised neural networks-based text summarization with word embedding and ensemble learning. Expert Syst Appl 123:195–211. https://doi.org/10.1016/j.eswa.2019.01.037
Alami N, Mallahi ME, Amakdouf H, Qjidaa H (2021) Hybrid method for text summarization based on statistical and semantic treatment. Multimed Tools Appl 80(13):19567–19600. https://doi.org/10.1007/s11042-021-10613-9
Alami N, Meknassi M, En-nahnahi N, El Adlouni Y, Ammor O (2021) Unsupervised neural networks for automatic Arabic text summarization using document clustering and topic modeling. Expert Syst Appl 172:114652. https://doi.org/10.1016/j.eswa.2021.114652
Ali ZH, Hussein AK, Abass HK, Fadel E (2021) Extractive multi document summarization using harmony search algorithm. Telkomnika 19(1):89–95. https://doi.org/10.12928/TELKOMNIKA.v19i1.15766
Al-Sabahi K, Zuping Z, Nadher M (2018) A hierarchical structured self-attentive model for extractive document summarization (HSSAS). IEEE Access 6:24205–24212. https://doi.org/10.1109/ACCESS.2018.2829199
Al-Taani, AT, Al-Omour, MM (2014) An extractive graph-based Arabic text summarization approach. In The International Arab Conference on Information Technology
Amarappa S, Sathyanarayana SV (2013) Named entity recognition and classification in kannada language. Int J Electron Comput Sci Eng 2(1):281–289
Arumae K, Liu F (2019) Guiding extractive summarization with question-answering rewards. arXiv preprint arXiv:1904.02321. https://doi.org/10.48550/arXiv.1904.02321
Asa AS, Akter S, Uddin MP, Hossain MD, Roy SK, Afjal MI (2017) A comprehensive survey on extractive text summarization techniques. Am J Eng Res 6(1):226–239
Awan MN, Beg MO (2021) Top-rank: a topicalpostionrank for extraction and classification of keyphrases in text. Comput Speech Lang 65:101116. https://doi.org/10.1016/j.csl.2020.101116
Azadani MN, Ghadiri N, Davoodijam E (2018) Graph-based biomedical text summarization: an itemset mining and sentence clustering approach. J Biomed Inform 84:42–58. https://doi.org/10.1016/j.jbi.2018.06.005
Baralis E, Cagliero L, Mahoto N, Fiori A (2013) GRAPHSUM: discovering correlations among multiple terms for graph-based summarization. Inf Sci 249:96–109. https://doi.org/10.1016/j.ins.2013.06.046
Barrera A, Verma R (2011) Automated extractive single-document summarization: beating the baselines with a new approach. In proceedings of the 2011 ACM symposium on applied computing (pp. 268-269). https://doi.org/10.1145/1982185.1982247
Baruah N, Sarma SK, Borkotokey S (2019) A novel approach of text summarization using Assamese WordNet. In 2019 4th international conference on information systems and computer networks (ISCON) (pp. 305-310). IEEE. https://doi.org/10.1109/ISCON47742.2019.9036285
Belkebir R, Guessoum A (2018) TALAA-ATSF: a global operation-based Arabic text summarization framework. In intelligent natural language processing: trends and applications (pp. 435–459). Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_21
Bommasani R, Cardie C (2020) Intrinsic evaluation of summarization datasets. In proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) (pp. 8075-8096). https://doi.org/10.18653/v1/2020.emnlp-main.649
Cai H, Zheng VW, Chang KCC (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans Knowl Data Eng 30(9):1616–1637. https://doi.org/10.1109/TKDE.2018.2807452
Cao M, Zhuge H (2020) Grouping sentences as better language unit for extractive text summarization. Futur Gener Comput Syst 109:331–359. https://doi.org/10.1016/j.future.2020.03.046
Castillo JM, Mateo MAL, Paras AD, Sagum RA, Santos VDF (2013) Named entity recognition using support vector machine for Filipino text documents. Int J Future Comput Commun 2(5):530–532. https://doi.org/10.7763/IJFCC.2013.V2.220
Chen KY, Liu SH, Chen B, Wang HM, Jan EE, Hsu WL, Chen HH (2015) Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques. IEEE/ACM Transact Audio, Speech, Lang Process 23(8):1322–1334. https://doi.org/10.1109/TASLP.2015.2432578
Chieu HL, Lee YK (2004) Query based event extraction along a timeline. In proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (pp. 425-432). https://doi.org/10.1145/1008992.1009065
Chouigui A, Ben Khiroun O, Elayeb B (2021) An arabic multi-source news corpus: experimenting on single-document extractive summarization. Arab J Sci Eng 46(4):3925–3938. https://doi.org/10.1007/s13369-020-05258-z
Chowdhury SR, Sarkar K, Dam S (2017) An approach to generic Bengali text summarization using latent semantic analysis. In 2017 international conference on information technology (ICIT) (pp. 11-16). IEEE. https://doi.org/10.1109/ICIT.2017.12
Cizmeciler K, Erdem E, Erdem A (2022) Leveraging semantic saliency maps for query-specific video summarization. Multimed Tools Appl 81(12):17457–17482. https://doi.org/10.1007/s11042-022-12442-w
Daiya D, Singh A, Jadon M (2018) Using statistical and semantic models for multi-document summarization. arXiv preprint arXiv:1805.04579. https://doi.org/10.48550/arXiv.1805.04579
Dang HT (2005) Overview of DUC 2005. In proceedings of the document understanding conference (Vol. 2005, pp. 1-12)
Dang HT (2006) DUC 2005: evaluation of question-focused summarization systems. In proceedings of the workshop on task-focused summarization and question answering (pp. 48-55). https://aclanthology.org/W06-0707.pdf
Dernoncourt F, Ghassemi M, Chang W (2018) A repository of corpora for summarization. In proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). https://aclanthology.org/L18-1509.pdf
Dixit RS, Apte SS (2012) Improvement of text summarization using fuzzy logic-based method. IOSR J Comput Eng (IOSRJCE) 5(6):5–10 http://www.iosrjournals.org/
Dunlavy DM, O’Leary DP, Conroy JM, Schlesinger JD (2007) QCS: a system for querying, clustering and summarizing documents. Inf Process Manag 43(6):1588–1605. https://doi.org/10.1016/j.ipm.2007.01.003
Dutta M, Das AK, Mallick C, Sarkar A, Das AK (2019) A graph-based approach on extractive summarization. In emerging Technologies in Data Mining and Information Security (pp. 179–187). Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_16
Dwivedi V, Ghosh S (2022) Classification of Hindi compound nouns using machine learning. SN Comput Sci 3(1):1–5. https://doi.org/10.1007/s42979-021-00895-z
Elayeb B, Chouigui A, Bounhas M, Khiroun OB (2020) Automatic arabic text summarization using analogical proportions. Cogn Comput 12(5):1043–1069. https://doi.org/10.1007/s12559-020-09748-y
Elbarougy R, Behery G, El Khatib A (2020) Extractive Arabic text summarization using modified PageRank algorithm. Egypt Inf J 21(2):73–81. https://doi.org/10.1016/j.eij.2019.11.001
El-Haj MO, Hammo BH (2008) Evaluation of query-based Arabic text summarization system. In 2008 international conference on natural language processing and knowledge engineering (pp. 1-7). IEEE. https://doi.org/10.1109/NLPKE.2008.4906790
El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2021) Automatic text summarization: a comprehensive survey. Expert Syst Appl 165:113679. https://doi.org/10.1016/j.eswa.2020.113679
Elrefaiy A, Abas AR, Elhenawy I (2018) Review of recent techniques for extractive text summarization. J Theor Appl Inf Technol 96(23):7739–7759
Erkan G, Radev DR (2004) Lexrank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479. https://doi.org/10.1613/jair.1523
Fang C, Mu D, Deng Z, Wu Z (2017) Word-sentence co-ranking for automatic extractive text summarization. Expert Syst Appl 72:189–195. https://doi.org/10.1016/j.eswa.2016.12.021
Fei L, Hu Y, Xiao F, Chen L, Deng Y (2016) A modified topsis method based on numbers and its applications in human resources selection Mathematical Problems in Engineering, 2016. https://doi.org/10.1155/2016/6145196
Ferreira R, de Souza Cabral L, Lins RD, e Silva GP, Freitas F, Cavalcanti GD, Favaro L (2013) Assessing sentence scoring techniques for extractive text summarization. Expert Syst Appl 40(14):5755–5764. https://doi.org/10.1016/j.eswa.2013.04.023
Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, Favaro L (2014) A multi-document summarization system based on statistics and linguistic treatment. Expert Syst Appl 41(13):5780–5787. https://doi.org/10.1016/j.eswa.2014.03.023
Fitrianah D, Jauhari RN (2022) Extractive text summarization for scientific journal articles using long short-term memory and gated recurrent units. Bullet Electr Eng Inf 11(1). https://doi.org/10.11591/eei.v11i1.3278
Gamal M, El-Sawy A, AbuEl-Atta AH (2021) Hybrid Algorithm Based on Chicken Swarm Optimization and Genetic Algorithm for Text Summarization. Int J Intell Eng Syst, Vol.14, No.3, https://doi.org/10.22266/ijies2021.0630.27
Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66. https://doi.org/10.1007/s10462-016-9475-9
Gambhir M, Gupta V (2022) Deep learning-based extractive text summarization with word-level attention mechanism. Multimed Tools Appl, 1-24. https://doi.org/10.1007/s11042-022-12729-y
Gholamrezazadeh S, Salehi MA, Gholamzadeh B (2009) A comprehensive survey on text summarization systems. In: 2009 2nd international conference on computer science and its applications. IEEE, pp 1–6. https://doi.org/10.1109/CSA.2009.5404226
Goldman J, Renals S, Bird S, De Jong F, Federico M, Fleischhauer C, Wright R (2005) Accessing the spoken word. Int J Digit Libr 5(4):287–298. https://doi.org/10.1007/s00799-004-0101-0
Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 19-25). https://doi.org/10.1145/383952.383955
Goularte FB, Nassar SM, Fileto R, Saggion H (2019) A text summarization method based on fuzzy rules and applicable to automated assessment. Expert Syst Appl 115:264–275. https://doi.org/10.1016/j.eswa.2018.07.047
Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. J Emerg Technol Web Intell 2(3):258–268. https://doi.org/10.4304/jetwi.2.3.258-268
Gupta P, Pendluri VS, Vats I (2011) Summarizing text by ranking text units according to shallow linguistic features. In 13th international conference on advanced communication technology (ICACT2011) (pp. 1620-1625). IEEE
Hassel M (2004) Evaluation of automatic text summarization. Licentiate Thesis, Stockholm, Sweden, pp 1–75
Hernández-Castañeda Á, García-Hernández RA, Ledeneva Y, Millán-Hernández CE (2022) Language-independent extractive automatic text summarization based on automatic keyword extraction. Comput Speech Lang 71:101267. https://doi.org/10.1016/j.csl.2021.101267
Herskovic JR, Cohen T, Subramanian D, Iyengar MS, Smith JW, Bernstam EV (2011) MEDRank: using graph-based concept ranking to index biomedical texts. Int J Med Inform 80(6):431–441. https://doi.org/10.1016/j.ijmedinf.2011.02.008
Hin D, Kan A, Chen H, Babar MA (2022) LineVD: statement-level vulnerability detection using graph neural networks. arXiv preprint arXiv:2203.05181.https://doi.org/10.48550/arXiv.2203.05181
Irfan M, Zulfikar WB (2017) Implementation of fuzzy C-means algorithm and TF-IDF on English journal summary. In 2017 second international conference on informatics and computing (ICIC) (pp. 1-5). IEEE. https://doi.org/10.1109/IAC.2017.8280646
Isonuma M, Fujino T, Mori J, Matsuo Y, Sakata I (2017) Extractive summarization using multi-task learning with document classification. In proceedings of the 2017 conference on empirical methods in natural language processing (pp. 2101-2110). https://doi.org/10.18653/v1/D17-1223
Jain HJ, Bewoor MS, Patil SH (2012) Context sensitive text summarization using k means clustering algorithm. Int J Soft Comput Eng 2(2):301–304
Jain D, Borah MD, Biswas A (2021) Automatic summarization of legal bills: a comparative analysis of classical extractive approaches. In 2021 international conference on computing, communication, and intelligent systems (ICCCIS) (pp. 394-400). IEEE. https://doi.org/10.1109/ICCCIS51004.2021.9397119
Jain A, Yadav D, Arora A (2021) Particle swarm optimization for Punjabi text summarization. Int J Oper Res Inf Syst (IJORIS) 12(3):1–17. https://doi.org/10.4018/IJORIS.20210701.oa1
Jang M, Kang P (2021) Learning-free unsupervised extractive summarization model. IEEE Access 9:14358–14368. https://doi.org/10.1109/ACCESS.2021.3051237
Jones KS (2007) Automatic summarising: the state of the art. Inf Process Manag 43(6):1449–1481. https://doi.org/10.1016/j.ipm.2007.03.009
Joshi A, Fidalgo E, Alegre E, Fernández-Robles L (2019) SummCoder: an unsupervised framework for extractive text summarization based on deep auto-encoders. Expert Syst Appl 129:200–215. https://doi.org/10.1016/j.eswa.2019.03.045
Joshi A, Fidalgo E, Alegre E, Alaiz-Rodriguez R (2022) RankSum—an unsupervised extractive text summarization based on rank fusion. Expert Syst Appl 200:116846. https://doi.org/10.1016/j.eswa.2022.116846
Kågebäck M, Mogren O, Tahmasebi N, Dubhashi D (2014) Extractive summarization using continuous vector space models. In proceedings of the 2nd workshop on continuous vector space models and their compositionality (CVSC) (pp. 31-39). https://aclanthology.org/W14-1504.pdf
Kaikhah K (2004) Automatic text summarization with neural networks. In 2004 2nd international IEEE conference on'Intelligent Systems'. Proceedings (IEEE cat. No. 04EX791) (Vol. 1, pp. 40-44). IEEE. https://doi.org/10.1109/IS.2004.1344634
Keyvanpour MR, Shirzad MB, Rashidghalam H (2019) Elts: a brief review for extractive learning-based text summarizatoin algorithms. In 2019 5th international conference on web research (ICWR) (pp. 234-239). IEEE. https://doi.org/10.1109/ICWR.2019.8765294
Khurana A, Bhatnagar V (2022) Investigating entropy for extractive document summarization. Expert Syst Appl 187:115820. https://doi.org/10.1016/j.eswa.2021.115820
Kiyomarsi F, Esfahani FR (2011) Optimizing persian text summarization based on fuzzy logic approach. In 2011 international conference on intelligent building and management
Koto F, Lau JH, Baldwin T (2021) Discourse probing of pretrained language models. arXiv preprint arXiv:2104.05882. https://doi.org/10.48550/arXiv.2104.05882
Kumar YJ, Salim N, Abuobieda A, Albaham AT (2014) Multi document summarization based on news components using fuzzy cross-document relations. Appl Soft Comput 21:265–279. https://doi.org/10.1016/j.asoc.2014.03.041
Kumar A, Sharma A, Nayyar A (2020) Fuzzy logic-based hybrid model for automatic extractive text summarization. In proceedings of the 2020 5th international conference on intelligent information technology (pp. 7-15). https://doi.org/10.1145/3385209.3385235
Kumar Y, Kaur K, Kaur S (2021) Study of automatic text summarization approaches in different languages. Artif Intell Rev 54(8):5897–5929. https://doi.org/10.1007/s10462-021-09964-4
LeClair A, Haque S, Wu L, McMillan C (2020) Improved code summarization via a graph neural network. In proceedings of the 28th international conference on program comprehension (pp. 184-195). https://doi.org/10.1145/3387904.3389268
Li X, Du L, Shen YD (2012) Update summarization via graph-based sentence ranking. IEEE Trans Knowl Data Eng 25(5):1162–1174. https://doi.org/10.1109/TKDE.2012.42
Lins RD, Oliveira H, Cabral L, Batista J, Tenorio B, Salcedo DA, Simske SJ (2019) The CNN-Corpus in Spanish: a large Corpus for extractive text summarization in the Spanish language. In proceedings of the ACM symposium on document engineering 2019 (pp. 1-4). https://doi.org/10.1145/3342558.3345423
Lins RD, Oliveira H, Cabral L, Batista J, Tenorio B, Ferreira R, Simske SJ (2019) The cnn-corpus: A large textual corpus for single-document extractive summarization. In Proceedings of the ACM Symposium on Document Engineering 2019 (pp. 1–10). https://doi.org/10.1145/3342558.3345388
Lins RD, Mello RF, Simske S (2019) DocEng'19 competition on extractive text summarization. In proceedings of the ACM symposium on document engineering 2019 (pp. 1-2). https://doi.org/10.1145/3342558.3351874
Lins RD, de Mello RF, Simske SJ (2020) DocEng'2020 competition on extractive text summarization. In proceedings of the ACM symposium on document engineering 2020 (pp. 1-4). https://doi.org/10.1145/3395027.3419579
Liu B (2012) Sentiment analysis and opinion mining. Synth Lectures Human Lang Technol 5(1):1–167. https://doi.org/10.2200/S00416ED1V01Y201204HLT016
Liu F, Liu Y (2008) Correlation between rouge and human evaluation of extractive meeting summaries. In proceedings of ACL-08: HLT, short papers (pp. 201-204). https://aclanthology.org/P08-2051.pdf
Liu Y, Zhong SH, Li W (2012) Query-oriented multi-document summarization via unsupervised deep learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 26, no 1, pp 1699–1705. https://doi.org/10.1609/aaai.v26i1.8352
Liu SH, Chen KY, Chen B, Wang HM, Yen HC, Hsu WL (2015) Combining relevance language modeling and clarity measure for extractive speech summarization. IEEE/ACM Transact Audio, Speech, Lang Process 23(6):957–969. https://doi.org/10.1109/TASLP.2015.2414820
Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165. https://doi.org/10.1147/rd.22.0159
Luo L, Ao X, Song Y, Pan F, Yang M, He Q (2019) Reading like HER: human reading inspired extractive summarization. In proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 3033-3043). https://doi.org/10.18653/v1/D19-1300
Lwin SS, Nwet KT (2018) Extractive summarization for Myanmar language. In 2018 international joint symposium on artificial intelligence and natural language processing (iSAI-NLP) (pp. 1-6). IEEE. https://doi.org/10.1109/iSAI-NLP.2018.8692976
Lwin SS, Nwet KT (2019) Extractive Myanmar news summarization using centroid based word embedding. In: 2019 international conference on advanced information technologies (ICAIT). IEEE, pp 200–205. https://doi.org/10.1109/AITC.2019.8921386
Mandal S, Singh GK, Pal A (2018) A constraints driven PSO based approach for text summarization. J Inf Math Sci 10(4):703–714. https://doi.org/10.26713/jims.v10i4.891
Mathkour HI, Touir AA, Al-Sanea WA (2008) Parsing Arabic texts using rhetorical structure theory. J Comput Sci 4(9):713–720
Maurya AK (2020) Resource and task clustering based scheduling algorithm for workflow applications in cloud computing environment. In 2020 sixth international conference on parallel, distributed and grid computing (PDGC) (pp. 566-570). IEEE. https://doi.org/10.1109/PDGC50313.2020.9315806
Maurya R, Singh SK, Maurya AK, Kumar A (2014) GLCM and multi class support vector machine based automated skin cancer classification. In 2014 international conference on computing for sustainable global development (INDIACom) (pp. 444-447). IEEE. https://doi.org/10.1109/IndiaCom.2014.6828177
Maurya SK, Singh D, Maurya AK (2022) Deceptive opinion spam detection approaches: a literature survey. Applied intelligence, 1-46. https://doi.org/10.1007/s10489-022-03427-1
Meena YK, Gopalani D (2015) Evolutionary algorithms for extractive automatic text summarization. Proced Comput Sci 48:244–249. https://doi.org/10.1016/j.procs.2015.04.177
Mehta P, Majumder P (2018) Effective aggregation of various summarization techniques. Inf Process Manag 54(2):145–158. https://doi.org/10.1016/j.ipm.2017.11.002
Mei JP, Chen L (2012) SumCR: a new subtopic-based extractive approach for text summarization. Knowl Inf Syst 31(3):527–545. https://doi.org/10.1007/s10115-011-0437-x
Mendoza M, Bonilla S, Noguera C, Cobos C, León E (2014) Extractive single-document summarization based on genetic operators and guided local search. Expert Syst Appl 41(9):4158–4169. https://doi.org/10.1016/j.eswa.2013.12.042
Merchant K, Pande Y (2018) Nlp based latent semantic analysis for legal text summarization. In 2018 international conference on advances in computing, communications and informatics (ICACCI) (pp. 1803-1807). IEEE. https://doi.org/10.1109/ICACCI.2018.8554831
Mihalcea R, Tarau P (2004) Textrank: bringing order into text. In proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404-411)
MirShojaee H, Masoumi B, Zeinali E (2017) Biogeography-based optimization algorithm for automatic extractive text summarization. Int J Indust Eng Product Res 28(1):75–84 http://ijiepr.iust.ac.ir/article-1-722-en.html
Mirshojaei SH, Masoomi B (2015) Text summarization using cuckoo search optimization algorithm. J Comput Robot 8(2):19–24 http://www.qjcr.ir/article_683.html
Mohamed M, Oussalah M (2019) SRL-ESA-TextSum: a text summarization approach based on semantic role labeling and explicit semantic analysis. Inf Process Manag 56(4):1356–1372. https://doi.org/10.1016/j.ipm.2019.04.003
Moiyadi HS, Desai H, Pawar D, Agrawal G, Patil NM (2016) NLP based text summarization using semantic analysis. Int J Adv Eng Manag Sci 2(10):239678
Moratanch N, Chitrakala S (2017) A survey on extractive text summarization. In: 2017 international conference on computer, communication and signal processing (ICCCSP). IEEE, pp 1–6. https://doi.org/10.1109/ICCCSP.2017.7944061
Muthu B, Cb S, Kumar PM, Kadry SN, Hsu CH, Sanjuan O, Crespo RG (2021) A framework for extractive text summarization based on deep learning modified neural network classifier. Trans Asian Low-Resource Lang Inf Process 20(3):1–20. https://doi.org/10.1145/3392048
Mutlu B, Sezer EA, Akcayol MA (2019) Multi-document extractive text summarization: a comparative assessment on features. Knowl-Based Syst 183:104848. https://doi.org/10.1016/j.knosys.2019.07.019
Mutlu B, Sezer EA, Akcayol MA (2020) Candidate sentence selection for extractive text summarization. Inf Process Manag 57(6):102359. https://doi.org/10.1016/j.ipm.2020.102359
Nagalla S, Kumar KC (2021) Oppositional lion optimization algorithm and deep neural network based multi-document summarization from large-scale documents. Eur J Mol Clin Med 7(10):1991–2009 https://www.ejmcm.com/article_6857.html
Naik SS, Gaonkar MN (2017) Extractive text summarization by feature-based sentence extraction using rule-based concept. In 2017 2nd IEEE international conference on recent trends in electronics, Information & Communication Technology (RTEICT) (pp. 1364-1368). IEEE. https://doi.org/10.1109/RTEICT.2017.8256821
Nallapati R, Zhou B, Ma M (2016) Classify or select: neural architectures for extractive document summarization. arXiv preprint arXiv:1611.04244. https://doi.org/10.48550/arXiv.1611.04244
Nallapati R, Zhai F, Zhou B (2017) Summarunner: a recurrent neural network-based sequence model for extractive summarization of documents. In Thirty-first AAAI conference on artificial intelligence https://doi.org/10.48550/arXiv.1611.04230, 31
Narayan S, Cohen SB, Lapata M (2018) Ranking sentences for extractive summarization with reinforcement learning. arXiv preprint arXiv:1802.08636. https://doi.org/10.48550/arXiv.1802.08636
Nawaz A, Bakhtyar M, Baber J, Ullah I, Noor W, Basit A (2020) Extractive text summarization models for Urdu language. Inf Process Manag 57(6):102383. https://doi.org/10.1016/j.ipm.2020.102383
Neto JL, Freitas AA, Kaestner CA (2002) Automatic text summarization using a machine learning approach. In Brazilian symposium on artificial intelligence (pp. 205-215). Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36127-8_20
Ozsoy MG, Alpaslan FN, Cicekli I (2011) Text summarization using latent semantic analysis. J Inf Sci 37(4):405–417. https://doi.org/10.1177/0165551511408848
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). https://aclanthology.org/P02-1040.pdf
Parveen D, Strube M (2015) Integrating importance, non-redundancy and coherence in graph-based extractive summarization. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence, pp 1298–1304
Patel D, Shah S, Chhinkaniwala H (2019) Fuzzy logic-based multi-document summarization with improved sentence scoring and redundancy removal technique. Expert Syst Appl 134:167–177. https://doi.org/10.1016/j.eswa.2019.05.045
Patil SR, Mahajan SM (2011) A novel approach for research paper abstracts summarization using cluster-based sentence extraction. In proceedings of the International Conference & Workshop on emerging trends in technology (pp. 583-586). https://doi.org/10.1145/1980022.1980150
Potnurwar A, Pimpalshende A, Aote SS, Bongirwar V (2020) Extractive multi-document text summarization by using binary particle swarm optimization. Helix 10(04):263–265. https://doi.org/10.21786/bbrc/13.14/8
Prasad SN, Narsimha VB, Reddy PV, Babu AV (2015) Influence of lexical, syntactic and structural features and their combination on authorship attribution for Telugu text. Proced Comput Sci 48:58–64. https://doi.org/10.1016/j.procs.2015.04.110
Qaroush A, Farha IA, Ghanem W, Washaha M, Maali E (2021) An efficient single document Arabic text summarization using a combination of statistical and semantic features. J King Saud Univ Comput Inf Sci 33(6):677–692. https://doi.org/10.1016/j.jksuci.2019.03.010
Rahman N, Borah B (2015) A survey on existing extractive techniques for query-based text summarization. In 2015 international symposium on advanced computing and communication (ISACC) (pp. 98-102). IEEE. https://doi.org/10.1109/ISACC.2015.7377323
Rani R, Lobiyal DK (2021) An extractive text summarization approach using tagged-LDA based topic modeling. Multimed Tools Appl 80(3):3275–3305. https://doi.org/10.1007/s11042-020-09549-3
Rautray R, Balabantaray RC (2017) Cat swarm optimization-based evolutionary framework for multi-document summarization. Physica A: Stat Mech Appl 477:174–186. https://doi.org/10.1016/j.physa.2017.02.056
Raval KR, Goyani MM (2022) A survey on event detection-based video summarization for cricket. Multimed Tools Appl, 1-29. https://doi.org/10.1007/s11042-022-12834-y
Ravinuthala VVMK, Chinnam SR (2017) A keyword extraction approach for single document extractive summarization based on topic centrality. Int J Intell Eng Syst https://doi.org/10.22266/ijies2017.1031.17
Rothe S, Schütze H (2014) Cosimrank: a flexible & efficient graph-theoretic similarity measure. In proceedings of the 52nd annual meeting of the Association for Computational Linguistics (volume 1: long papers) (pp. 1392-1402). https://aclanthology.org/P14-1131.pdf
Sahba R, Ebadi N, Jamshidi M, Rad P (2018) Automatic text summarization using customizable fuzzy features and attention on the context and vocabulary. In 2018 world automation congress (WAC) (pp. 1-5). IEEE. https://doi.org/10.23919/WAC.2018.8430483
Sahoo D, Balabantaray R, Phukon M, Saikia S (2016) Aspect-based multi-document summarization. In 2016 international conference on computing, communication and automation (ICCCA) (pp. 873-877). IEEE. https://doi.org/10.1109/CCAA.2016.7813838
Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33(2):193–207. https://doi.org/10.1016/S0306-4573(96)00062-3
Sanchez-Gomez JM, Vega-Rodríguez MA, Pérez CJ (2018) Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach. Knowl-Based Syst 159:1–8. https://doi.org/10.1016/j.knosys.2017.11.029
Sanchez-Gomez JM, Vega-Rodriguez MA, Perez CJ (2020) Experimental analysis of multiple criteria for extractive multi-document text summarization. Expert Syst Appl 140:112904. https://doi.org/10.1016/j.eswa.2019.112904
Shaymal AK, Pal M (2007) Triangular fuzzy matrices. Iran J Fuzzy Syst 4(1):75–87 https://www.sid.ir/en/Journal/ViewPaper.aspx?ID=67072
Shen C, Li T (2011) Learning to rank for query-focused multi-document summarization. In 2011 IEEE 11th international conference on data mining (pp. 626-634). IEEE. https://doi.org/10.1109/ICDM.2011.91
Shirwandkar NS, Kulkarni S (2018) Extractive text summarization using deep learning. In 2018 fourth international conference on computing communication control and automation (ICCUBEA) (pp. 1-5). IEEE. https://doi.org/10.1109/ICCUBEA.2018.8697465
Shoaib M, Maurya AK (2014) Comparative study of different web mining algorithms to discover knowledge on the web. In proceedings of Elsevier second international conference on emerging research in computing, information, communication and application (ERCICA-2014) (Vol. 3, pp. 648-654)
Shoaib M, Maurya AK (2014) URL ordering-based performance evaluation of web crawler. In 2014 international conference on advances in Engineering & Technology Research (ICAETR-2014) (pp. 1-7). IEEE. https://doi.org/10.1109/ICAETR.2014.7012962
Siddiqui MK, Ahmad A, Pal O, Ahmad T (2021) CoRank: a clustering cum graph ranking approach for extractive summarization. arXiv preprint arXiv:2106.00619. https://doi.org/10.48550/arXiv.2106.00619
Singh SP, Kumar A, Mangal A, Singhal S (2016) Bilingual automatic text summarization using unsupervised deep learning. In 2016 international conference on electrical, electronics, and optimization techniques (ICEEOT) (pp. 1195-1200). IEEE. https://doi.org/10.1109/ICEEOT.2016.7754874
Singh RK, Khetarpaul S, Gorantla R, Allada SG (2021) SHEG: summarization and headline generation of news articles using deep learning. Neural Comput & Applic 33(8):3251–3265. https://doi.org/10.1007/s00521-020-05188-9
Sirohi NK, Bansal M, Rajan SN (2021) Recent approaches for text summarization using machine learning & LSTM0. J Big Data 3(1):35. https://doi.org/10.32604/jbd.2021.015954
Song S, Huang H, Ruan T (2019) Abstractive text summarization using LSTM-CNN based deep learning. Multimed Tools Appl 78(1):857–875. https://doi.org/10.1007/s11042-018-5749-3
Sreelakshmi PR, Manmadhan S (2021) Image summarization using unsupervised learning. In 2021 7th international conference on advanced computing and communication systems (ICACCS) (Vol. 1, pp. 100-103). IEEE. https://doi.org/10.1109/ICACCS51430.2021.9441682
Srivastava AK, Pandey D, Agarwal A (2021) Extractive multi-document text summarization using dolphin swarm optimization approach. Multimed Tools Appl 80(7):11273–11290. https://doi.org/10.1007/s11042-020-10176-1
Srivastava R, Singh P, Rana KPS, Kumar V (2022) A topic modeled unsupervised approach to single document extractive text summarization. Knowl-Based Syst 246:108636. https://doi.org/10.1016/j.knosys.2022.108636
Steinberger J (2009) Evaluation measures for text summarization. Comput Inf 28(2):251–275. http://147.213.75.17/ojs/index.php/cai/article/view/37
Steinberger J, Jezek K (2004) Using latent semantic analysis in text summarization and summary evaluation. In proceedings of ISIM '04 (pp. 93-100)
Suleman RM, Korkontzelos I (2020) Managing the syntactic blindness of latent semantic analysis. In CS & IT conference proceedings (Vol. 10, no. 4). CS & IT conference proceedings. https://doi.org/10.5121/csit.2020.100401
Suleman RM, Korkontzelos I (2021) Extending latent semantic analysis to manage its syntactic blindness. Expert Syst Appl 165:114130. https://doi.org/10.1016/j.eswa.2020.114130
Tarnpradab S, Liu F, Hua KA (2017) Toward extractive summarization of online forum discussions via hierarchical attention networks. Thirtieth Int Flairs Conf, 288-292. https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS17/paper/view/15500
Thakkar HK, Sahoo PK, Mohanty P (2021) DOFM: domain feature miner for robust extractive summarization. Inf Process Manag 58(3):102474. https://doi.org/10.1016/j.ipm.2020.102474
Thu HNT, Huu QN, Ngoc TNT (2013) A supervised learning method combine with dimensionality reduction in Vietnamese text summarization. In 2013 computing, communications and IT applications conference (ComComAp) (pp. 69-73). IEEE. https://doi.org/10.1109/ComComAp.2013.6533611
Uçkan T, Karcı A (2020) Extractive multi-document text summarization based on graph independent sets. Egypt Inf J 21(3):145–157. https://doi.org/10.1016/j.eij.2019.12.002
Vale R, Lins RD, Ferreira R (2020) An assessment of sentence simplification methods in extractive text summarization. In proceedings of the ACM symposium on document engineering 2020 (pp. 1-9). https://doi.org/10.1145/3395027.3419588
Van Lierde H, Chow TW (2019) Query-oriented text summarization based on hypergraph transversals. Inf Process Manag 56(4):1317–1338. https://doi.org/10.1016/j.ipm.2019.03.003
Verma P, Verma A, Pal S (2022) An approach for extractive text summarization using fuzzy evolutionary and clustering algorithms. Appl Soft Comput 120:108670. https://doi.org/10.1016/j.asoc.2022.108670
Wang D, Zhu S, Li T, Chi Y, Gong Y (2011) Integrating document clustering and multi-document summarization. ACM Trans Knowl Discov Data (TKDD) 5(3):1–26. https://doi.org/10.1145/1993077.1993078
Wang S, Zhao X, Li B, Ge B, Tang D (2017) Integrating extractive and abstractive models for long text summarization. In 2017 IEEE international congress on big data (BigData congress) (pp. 305-312). IEEE. https://doi.org/10.1109/BigDataCongress.2017.46
Wang X, Nie X, Liu X, Wang B, Yin Y (2020) Modality correlation-based video summarization. Multimed Tools Appl 79(45):33875–33890. https://doi.org/10.1007/s11042-020-08690-3
Wang D, Liu P, Zheng Y, Qiu X, Huang X (2020) Heterogeneous graph neural networks for extractive document summarization. arXiv preprint arXiv:2004.12393. https://doi.org/10.48550/arXiv.2004.12393
Wu K, Shi P, Pan D (2015) An approach to automatic summarization for Chinese text based on the combination of spectral clustering and LexRank. In 2015 12th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 1350-1354). IEEE. https://doi.org/10.1109/FSKD.2015.7382140
Wu Z, Lei L, Li G, Huang H, Zheng C, Chen E, Xu G (2017) A topic modeling-based approach to novel document automatic summarization. Expert Syst Appl 84:12–23. https://doi.org/10.1016/j.eswa.2017.04.054
Wu M, Pan S, Zhou C, Chang X, Zhu X (2020) Unsupervised domain adaptive graph convolutional networks. In proceedings of the web conference 2020 (pp. 1457-1467). https://doi.org/10.1145/3366423.3380219
Xu J, Durrett G (2019) Neural extractive text summarization with syntactic compression. arXiv preprint arXiv:1902.00863. https://doi.org/10.48550/arXiv.1902.00863
Yadav J, Meena YK (2016) Use of fuzzy logic and WordNet for improving performance of extractive automatic text summarization. In 2016 international conference on advances in computing, communications and informatics (ICACCI) (pp. 2071-2077). IEEE. https://doi.org/10.1109/ICACCI.2016.7732356
Yadav AK, Saxena S (2016) A new conception of information requisition in web of things. Indian journal of science and technology, 9(44). https://doi.org/10.17485/ijst/2016/v9i44/105143
Yadav H, Ghosh S, Yu Y, Shah RR (2020) End-to-end named entity recognition from English speech. arXiv preprint arXiv:2005.11184. https://doi.org/10.48550/arXiv.2005.11184
Yadav AK, Maurya AK, Yadav RS (2021) Extractive text summarization using recent approaches: a survey. Ingénierie des Systèmes d'Information, 26(1). https://doi.org/10.18280/isi.260112
Ye S, Chua TS, Kan MY, Qiu L (2007) Document concept lattice for text understanding and summarization. Inf Process Manag 43(6):1643–1662. https://doi.org/10.1016/j.ipm.2007.03.010
Yogatama D, Liu F, Smith NA (2015) Extractive summarization by maximizing semantic volume. In proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1961-1966). https://aclanthology.org/D15-1228.pdf
Yu W, Lin X, Zhang W (2013) Towards efficient SimRank computation on large networks. In 2013 IEEE 29th international conference on data engineering (ICDE) (pp. 601-612). IEEE. https://doi.org/10.1109/ICDE.2013.6544859
Zajic DM, Dorr BJ, Lin J (2008) Single-document and multi-document summarization techniques for email threads using sentence compression. Inf Process Manag 44(4):1600–1610. https://doi.org/10.1016/j.ipm.2007.09.007
Zhang K, Xiao Y, Tong H, Wang H, Wang W (2014) WiiCluster: a platform for Wikipedia infobox generation. In proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 2033-2035). https://doi.org/10.1145/2661829.2661840
Zopf M, Botschen T, Falke T, Heinzerling B, Marasovic A, Mihaylov T, Frank A (2018) What’s important in a text? An extensive evaluation of linguistic annotations for summarization. In 2018 fifth international conference on social networks analysis, management and security (SNAMS) (pp. 272-277). IEEE. https://doi.org/10.1109/SNAMS.2018.8554853
Acknowledgments
We are grateful to the anonymous reviewers for their comments.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest in this manuscript.
About this article
Cite this article
Yadav, A.K., Ranvijay, Yadav, R.S. et al. State-of-the-art approach to extractive text summarization: a comprehensive review. Multimed Tools Appl 82, 29135–29197 (2023). https://doi.org/10.1007/s11042-023-14613-9
DOI: https://doi.org/10.1007/s11042-023-14613-9