State-of-the-art approach to extractive text summarization: a comprehensive review

  • Published in: Multimedia Tools and Applications

Abstract

With the rapid growth of social media platforms, the digitization of official records, and the digital publication of articles, books, magazines, and newspapers, large volumes of data are generated every day. This data is a rich source of information and contains a vast amount of text that may be complex, ambiguous, redundant, irrelevant, and unstructured. We therefore need tools and methods that help us understand and automatically summarize this text. There are two main approaches to text summarization: abstractive and extractive. Abstractive Text Summarization generates a concise summary that captures the salient content of the input documents and paraphrases it using new sentences and phrases. Extractive Text Summarization, by contrast, produces a summary by selecting and combining the most significant sentences and phrases from the source documents. Researchers have proposed numerous techniques for both kinds of text summarization. In this work, we classify Extractive Text Summarization approaches and review them based on their characteristics, techniques, and performance. We discuss the existing Extractive Text Summarization approaches along with their limitations. We also classify and discuss evaluation measures and outline the research challenges in Extractive Text Summarization.
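
To make the extractive idea concrete, the following is a minimal sketch of one simple variant: score each sentence by the average corpus frequency of its content words, then return the top-scoring sentences in their original order. It is not a method taken from this review; the stopword list, the naive sentence splitter, and the names `split_sentences`, `tokenize`, and `extractive_summary` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "that", "it", "this", "for", "on", "with", "as", "by", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Select the highest-scoring sentences and return them in source order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude salience signal.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original ordering
    return " ".join(sentences[i] for i in chosen)
```

Because the output is built only from sentences copied verbatim out of the input, this is extractive rather than abstractive: nothing is paraphrased. Real systems typically replace the frequency score with richer features or learned models.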

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

References

  1. Abdi A, Idris N, Alguliyev RM, Aliguliyev RM (2017) Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput 21(7):1785–1801. https://doi.org/10.1007/s00500-015-1881-4

    Article  Google Scholar 

  2. Abdi A, Hasan S, Shamsuddin SM, Idris N, Piran J (2021) A hybrid deep learning architecture for opinion-oriented multi-document summarization based on multi-feature fusion. Knowl-Based Syst 213:106658. https://doi.org/10.1016/j.knosys.2020.106658

    Article  Google Scholar 

  3. Abhiman, BD, Hiraman, PY (2021) A text summarization using multi linguistic features and fuzzy logic technique of sentences

  4. Alami N, Meknassi M, En-nahnahi N (2019) Enhancing unsupervised neural networks-based text summarization with word embedding and ensemble learning. Expert Syst Appl 123:195–211. https://doi.org/10.1016/j.eswa.2019.01.037

    Article  Google Scholar 

  5. Alami N, Mallahi ME, Amakdouf H, Qjidaa H (2021) Hybrid method for text summarization based on statistical and semantic treatment. Multimed Tools Appl 80(13):19567–19600. https://doi.org/10.1007/s11042-021-10613-9

    Article  Google Scholar 

  6. Alami N, Meknassi M, En-nahnahi N, El Adlouni Y, Ammor O (2021) Unsupervised neural networks for automatic Arabic text summarization using document clustering and topic modeling. Expert Syst Appl 172:114652. https://doi.org/10.1016/j.eswa.2021.114652

    Article  Google Scholar 

  7. Ali ZH, Hussein AK, Abass HK, Fadel E (2021) Extractive multi document summarization using harmony search algorithm. Telkomnika 19(1):89–95. https://doi.org/10.12928/TELKOMNIKA.v19i1.15766

    Article  Google Scholar 

  8. Al-Sabahi K, Zuping Z, Nadher M (2018) A hierarchical structured self-attentive model for extractive document summarization (HSSAS). IEEE Access 6:24205–24212. https://doi.org/10.1109/ACCESS.2018.2829199

    Article  Google Scholar 

  9. Al-Taani, AT, Al-Omour, MM (2014) An extractive graph-based Arabic text summarization approach. In The International Arab Conference on Information Technology

  10. Amarappa S, Sathyanarayana SV (2013) Named entity recognition and classification in kannada language. Int J Electron Comput Sci Eng 2(1):281–289

    Google Scholar 

  11. Arumae K, Liu F (2019) Guiding extractive summarization with question-answering rewards. arXiv preprint arXiv:1904.02321. https://doi.org/10.48550/arXiv.1904.02321

  12. Asa AS, Akter S, Uddin MP, Hossain MD, Roy SK, Afjal MI (2017) A comprehensive survey on extractive text summarization techniques. Am J Eng Res 6(1):226–239

    Google Scholar 

  13. Awan MN, Beg MO (2021) Top-rank: a topicalpostionrank for extraction and classification of keyphrases in text. Comput Speech Lang 65:101116. https://doi.org/10.1016/j.csl.2020.101116

    Article  Google Scholar 

  14. Azadani MN, Ghadiri N, Davoodijam E (2018) Graph-based biomedical text summarization: an itemset mining and sentence clustering approach. J Biomed Inform 84:42–58. https://doi.org/10.1016/j.jbi.2018.06.005

    Article  Google Scholar 

  15. Baralis E, Cagliero L, Mahoto N, Fiori A (2013) GRAPHSUM: discovering correlations among multiple terms for graph-based summarization. Inf Sci 249:96–109. https://doi.org/10.1016/j.ins.2013.06.046

    Article  MathSciNet  Google Scholar 

  16. Barrera A, Verma R (2011) Automated extractive single-document summarization: beating the baselines with a new approach. In proceedings of the 2011 ACM symposium on applied computing (pp. 268-269). https://doi.org/10.1145/1982185.1982247

  17. Baruah N, Sarma SK, Borkotokey S (2019) A novel approach of text summarization using Assamese WordNet. In 2019 4th international conference on information systems and computer networks (ISCON) (pp. 305-310). IEEE. https://doi.org/10.1109/ISCON47742.2019.9036285

  18. Belkebir R, Guessoum A (2018) TALAA-ATSF: a global operation-based Arabic text summarization framework. In intelligent natural language processing: trends and applications (pp. 435–459). Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_21

    Book  Google Scholar 

  19. Bommasani R, Cardie C (2020) Intrinsic evaluation of summarization datasets. In proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) (pp. 8075-8096). https://doi.org/10.18653/v1/2020.emnlp-main.649

  20. Cai H, Zheng VW, Chang KCC (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans Knowl Data Eng 30(9):1616–1637. https://doi.org/10.1109/TKDE.2018.2807452

    Article  Google Scholar 

  21. Cao M, Zhuge H (2020) Grouping sentences as better language unit for extractive text summarization. Futur Gener Comput Syst 109:331–359. https://doi.org/10.1016/j.future.2020.03.046

    Article  Google Scholar 

  22. Castillo JM, Mateo MAL, Paras AD, Sagum RA, Santos VDF (2013) Named entity recognition using support vector machine for Filipino text documents. Int J Future Comput Commun 2(5):530–532. https://doi.org/10.7763/IJFCC.2013.V2.220

    Article  Google Scholar 

  23. Chen KY, Liu SH, Chen B, Wang HM, Jan EE, Hsu WL, Chen HH (2015) Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques. IEEE/ACM Transact Audio, Speech, Lang Process 23(8):1322–1334. https://doi.org/10.1109/TASLP.2015.2432578

    Article  Google Scholar 

  24. Chieu HL, Lee YK (2004) Query based event extraction along a timeline. In proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (pp. 425-432). https://doi.org/10.1145/1008992.1009065

  25. Chouigui A, Ben Khiroun O, Elayeb B (2021) An arabic multi-source news corpus: experimenting on single-document extractive summarization. Arab J Sci Eng 46(4):3925–3938. https://doi.org/10.1007/s13369-020-05258-z

    Article  Google Scholar 

  26. Chowdhury SR, Sarkar K, Dam S (2017) An approach to generic Bengali text summarization using latent semantic analysis. In 2017 international conference on information technology (ICIT) (pp. 11-16). IEEE. https://doi.org/10.1109/ICIT.2017.12

  27. Cizmeciler K, Erdem E, Erdem A (2022) Leveraging semantic saliency maps for query-specific video summarization. Multimed Tools Appl 81(12):17457–17482. https://doi.org/10.1007/s11042-022-12442-w

    Article  Google Scholar 

  28. Daiya D, Singh A, Jadon M (2018) Using statistical and semantic models for multi-document summarization. arXiv preprint arXiv:1805.04579. https://doi.org/10.48550/arXiv.1805.04579

  29. Dang HT (2005) Overview of DUC 2005. In proceedings of the document understanding conference (Vol. 2005, pp. 1-12)

  30. Dang HT (2006) DUC 2005: evaluation of question-focused summarization systems. In proceedings of the workshop on task-focused summarization and question answering (pp. 48-55). https://aclanthology.org/W06-0707.pdf

  31. Dernoncourt F, Ghassemi M, Chang W (2018) A repository of corpora for summarization. In proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). https://aclanthology.org/L18-1509.pdf

  32. Dixit RS, Apte SS (2012) Improvement of text summarization using fuzzy logic-based method. IOSR J Comput Eng (IOSRJCE) 5(6):5–10 http://www.iosrjournals.org/

    Article  Google Scholar 

  33. Dunlavy DM, O’Leary DP, Conroy JM, Schlesinger JD (2007) QCS: a system for querying, clustering and summarizing documents. Inf Process Manag 43(6):1588–1605. https://doi.org/10.1016/j.ipm.2007.01.003

    Article  Google Scholar 

  34. Dutta M, Das AK, Mallick C, Sarkar A, Das AK (2019) A graph-based approach on extractive summarization. In emerging Technologies in Data Mining and Information Security (pp. 179–187). Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_16

    Book  Google Scholar 

  35. Dwivedi V, Ghosh S (2022) Classification of Hindi compound nouns using machine learning. SN Comput Sci 3(1):1–5. https://doi.org/10.1007/s42979-021-00895-z

    Article  Google Scholar 

  36. Elayeb B, Chouigui A, Bounhas M, Khiroun OB (2020) Automatic arabic text summarization using analogical proportions. Cogn Comput 12(5):1043–1069. https://doi.org/10.1007/s12559-020-09748-y

    Article  Google Scholar 

  37. Elbarougy R, Behery G, El Khatib A (2020) Extractive Arabic text summarization using modified PageRank algorithm. Egypt Inf J 21(2):73–81. https://doi.org/10.1016/j.eij.2019.11.001

    Article  Google Scholar 

  38. El-Haj MO, Hammo BH (2008) Evaluation of query-based Arabic text summarization system. In 2008 international conference on natural language processing and knowledge engineering (pp. 1-7). IEEE. https://doi.org/10.1109/NLPKE.2008.4906790

  39. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2021) Automatic text summarization: a comprehensive survey. Expert Syst Appl 165:113679. https://doi.org/10.1016/j.eswa.2020.113679

    Article  Google Scholar 

  40. Elrefaiy A, Abas AR, Elhenawy I (2018) Review of recent techniques for extractive text summarization. J Theor Appl Inf Technol 96(23):7739–7759

    Google Scholar 

  41. Erkan G, Radev DR (2004) Lexrank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479. https://doi.org/10.1613/jair.1523

    Article  Google Scholar 

  42. Fang C, Mu D, Deng Z, Wu Z (2017) Word-sentence co-ranking for automatic extractive text summarization. Expert Syst Appl 72:189–195. https://doi.org/10.1016/j.eswa.2016.12.021

    Article  Google Scholar 

  43. Fei L, Hu Y, Xiao F, Chen L, Deng Y (2016) A modified topsis method based on numbers and its applications in human resources selection Mathematical Problems in Engineering, 2016. https://doi.org/10.1155/2016/6145196

  44. Ferreira R, de Souza Cabral L, Lins RD, e Silva GP, Freitas F, Cavalcanti GD, Favaro L (2013) Assessing sentence scoring techniques for extractive text summarization. Expert Syst Appl 40(14):5755–5764. https://doi.org/10.1016/j.eswa.2013.04.023

  45. Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, Favaro L (2014) A multi-document summarization system based on statistics and linguistic treatment. Expert Syst Appl 41(13):5780–5787. https://doi.org/10.1016/j.eswa.2014.03.023

    Article  Google Scholar 

  46. Fitrianah D, Jauhari RN (2022) Extractive text summarization for scientific journal articles using long short-term memory and gated recurrent units. Bullet Electr Eng Inf 11(1). https://doi.org/10.11591/eei.v11i1.3278

  47. Gamal M, El-Sawy A, AbuEl-Atta AH (2021) Hybrid Algorithm Based on Chicken Swarm Optimization and Genetic Algorithm for Text Summarization. Int J Intell Eng Syst, Vol.14, No.3, https://doi.org/10.22266/ijies2021.0630.27

  48. Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66. https://doi.org/10.1007/s10462-016-9475-9

    Article  Google Scholar 

  49. Gambhir M, Gupta V (2022) Deep learning-based extractive text summarization with word-level attention mechanism. Multimed Tools Appl, 1-24. https://doi.org/10.1007/s11042-022-12729-y

  50. Gholamrezazadeh S, Salehi MA, Gholamzadeh B (2009) A comprehensive survey on text summarization systems. In: 2009 2nd international conference on computer science and its applications. IEEE, pp 1–6. https://doi.org/10.1109/CSA.2009.5404226

  51. Goldman J, Renals S, Bird S, De Jong F, Federico M, Fleischhauer C, Wright R (2005) Accessing the spoken word. Int J Digit Libr 5(4):287–298. https://doi.org/10.1007/s00799-004-0101-0

    Article  Google Scholar 

  52. Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 19-25). https://doi.org/10.1145/383952.383955

  53. Goularte FB, Nassar SM, Fileto R, Saggion H (2019) A text summarization method based on fuzzy rules and applicable to automated assessment. Expert Syst Appl 115:264–275. https://doi.org/10.1016/j.eswa.2018.07.047

    Article  Google Scholar 

  54. Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. J Emerg Technol Web Intell 2(3):258–268. https://doi.org/10.4304/jetwi.2.3.258-268

    Article  Google Scholar 

  55. Gupta P, Pendluri VS, Vats I (2011) Summarizing text by ranking text units according to shallow linguistic features. In 13th international conference on advanced communication technology (ICACT2011) (pp. 1620-1625). IEEE

  56. Hassel M (2004) Evaluation of automatic text summarization. Licentiate Thesis, Stockholm, Sweden, pp 1–75

    Google Scholar 

  57. Hernández-Castañeda Á, García-Hernández RA, Ledeneva Y, Millán-Hernández CE (2022) Language-independent extractive automatic text summarization based on automatic keyword extraction. Comput Speech Lang 71:101267. https://doi.org/10.1016/j.csl.2021.101267

    Article  Google Scholar 

  58. Herskovic JR, Cohen T, Subramanian D, Iyengar MS, Smith JW, Bernstam EV (2011) MEDRank: using graph-based concept ranking to index biomedical texts. Int J Med Inform 80(6):431–441. https://doi.org/10.1016/j.ijmedinf.2011.02.008

    Article  Google Scholar 

  59. Hin D, Kan A, Chen H, Babar MA (2022) LineVD: statement-level vulnerability detection using graph neural networks. arXiv preprint arXiv:2203.05181.https://doi.org/10.48550/arXiv.2203.05181

  60. Irfan M, Zulfikar WB (2017) Implementation of fuzzy C-means algorithm and TF-IDF on English journal summary. In 2017 second international conference on informatics and computing (ICIC) (pp. 1-5). IEEE. https://doi.org/10.1109/IAC.2017.8280646

  61. Isonuma M, Fujino T, Mori J, Matsuo Y, Sakata I (2017) Extractive summarization using multi-task learning with document classification. In proceedings of the 2017 conference on empirical methods in natural language processing (pp. 2101-2110). https://doi.org/10.18653/v1/D17-1223

  62. Jain HJ, Bewoor MS, Patil SH (2012) Context sensitive text summarization using k means clustering algorithm. Int J Soft Comput Eng 2(2):301–304

    Google Scholar 

  63. Jain D, Borah MD, Biswas A (2021) Automatic summarization of legal bills: a comparative analysis of classical extractive approaches. In 2021 international conference on computing, communication, and intelligent systems (ICCCIS) (pp. 394-400). IEEE. https://doi.org/10.1109/ICCCIS51004.2021.9397119

  64. Jain A, Yadav D, Arora A (2021) Particle swarm optimization for Punjabi text summarization. Int J Oper Res Inf Syst (IJORIS) 12(3):1–17. https://doi.org/10.4018/IJORIS.20210701.oa1

    Article  Google Scholar 

  65. Jang M, Kang P (2021) Learning-free unsupervised extractive summarization model. IEEE Access 9:14358–14368. https://doi.org/10.1109/ACCESS.2021.3051237

    Article  Google Scholar 

  66. Jones KS (2007) Automatic summarising: the state of the art. Inf Process Manag 43(6):1449–1481. https://doi.org/10.1016/j.ipm.2007.03.009

    Article  Google Scholar 

  67. Joshi A, Fidalgo E, Alegre E, Fernández-Robles L (2019) SummCoder: an unsupervised framework for extractive text summarization based on deep auto-encoders. Expert Syst Appl 129:200–215. https://doi.org/10.1016/j.eswa.2019.03.045

    Article  Google Scholar 

  68. Joshi A, Fidalgo E, Alegre E, Alaiz-Rodriguez R (2022) RankSum—an unsupervised extractive text summarization based on rank fusion. Expert Syst Appl 200:116846. https://doi.org/10.1016/j.eswa.2022.116846

    Article  Google Scholar 

  69. Kågebäck M, Mogren O, Tahmasebi N, Dubhashi D (2014) Extractive summarization using continuous vector space models. In proceedings of the 2nd workshop on continuous vector space models and their compositionality (CVSC) (pp. 31-39). https://aclanthology.org/W14-1504.pdf

  70. Kaikhah K (2004) Automatic text summarization with neural networks. In 2004 2nd international IEEE conference on'Intelligent Systems'. Proceedings (IEEE cat. No. 04EX791) (Vol. 1, pp. 40-44). IEEE. https://doi.org/10.1109/IS.2004.1344634

  71. Keyvanpour MR, Shirzad MB, Rashidghalam H (2019) Elts: a brief review for extractive learning-based text summarizatoin algorithms. In 2019 5th international conference on web research (ICWR) (pp. 234-239). IEEE. https://doi.org/10.1109/ICWR.2019.8765294

  72. Khurana A, Bhatnagar V (2022) Investigating entropy for extractive document summarization. Expert Syst Appl 187:115820. https://doi.org/10.1016/j.eswa.2021.115820

    Article  Google Scholar 

  73. Kiyomarsi F, Esfahani FR (2011) Optimizing persian text summarization based on fuzzy logic approach. In 2011 international conference on intelligent building and management

  74. Koto F, Lau JH, Baldwin T (2021) Discourse probing of pretrained language models. arXiv preprint arXiv:2104.05882. https://doi.org/10.48550/arXiv.2104.05882

  75. Kumar YJ, Salim N, Abuobieda A, Albaham AT (2014) Multi document summarization based on news components using fuzzy cross-document relations. Appl Soft Comput 21:265–279. https://doi.org/10.1016/j.asoc.2014.03.041

    Article  Google Scholar 

  76. Kumar A, Sharma A, Nayyar A (2020) Fuzzy logic-based hybrid model for automatic extractive text summarization. In proceedings of the 2020 5th international conference on intelligent information technology (pp. 7-15). https://doi.org/10.1145/3385209.3385235

  77. Kumar Y, Kaur K, Kaur S (2021) Study of automatic text summarization approaches in different languages. Artif Intell Rev 54(8):5897–5929. https://doi.org/10.1007/s10462-021-09964-4

    Article  Google Scholar 

  78. LeClair A, Haque S, Wu L, McMillan C (2020) Improved code summarization via a graph neural network. In proceedings of the 28th international conference on program comprehension (pp. 184-195). https://doi.org/10.1145/3387904.3389268

  79. Li X, Du L, Shen YD (2012) Update summarization via graph-based sentence ranking. IEEE Trans Knowl Data Eng 25(5):1162–1174. https://doi.org/10.1109/TKDE.2012.42

    Article  Google Scholar 

  80. Lins RD, Oliveira H, Cabral L, Batista J, Tenorio B, Salcedo DA, Simske SJ (2019) The CNN-Corpus in Spanish: a large Corpus for extractive text summarization in the Spanish language. In proceedings of the ACM symposium on document engineering 2019 (pp. 1-4). https://doi.org/10.1145/3342558.3345423

  81. Lins RD, Oliveira H, Cabral L, Batista J, Tenorio B, Ferreira R, Simske SJ (2019) The cnn-corpus: A large textual corpus for single-document extractive summarization. In Proceedings of the ACM Symposium on Document Engineering 2019 (pp. 1–10). https://doi.org/10.1145/3342558.3345388

  82. Lins RD, Mello RF, Simske S (2019) DocEng'19 competition on extractive text summarization. In proceedings of the ACM symposium on document engineering 2019 (pp. 1-2). https://doi.org/10.1145/3342558.3351874

  83. Lins RD, de Mello RF, Simske SJ (2020) DocEng'2020 competition on extractive text summarization. In proceedings of the ACM symposium on document engineering 2020 (pp. 1-4). https://doi.org/10.1145/3395027.3419579

  84. Liu B (2012) Sentiment analysis and opinion mining. Synth Lectures Human Lang Technol 5(1):1–167. https://doi.org/10.2200/S00416ED1V01Y201204HLT016

    Article  MathSciNet  Google Scholar 

  85. Liu F, Liu Y (2008) Correlation between rouge and human evaluation of extractive meeting summaries. In proceedings of ACL-08: HLT, short papers (pp. 201-204). https://aclanthology.org/P08-2051.pdf

  86. Liu Y, Zhong SH, Li W (2012) Query-oriented multi-document summarization via unsupervised deep learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 26, no 1, pp 1699–1705. https://doi.org/10.1609/aaai.v26i1.8352

  87. Liu SH, Chen KY, Chen B, Wang HM, Yen HC, Hsu WL (2015) Combining relevance language modeling and clarity measure for extractive speech summarization. IEEE/ACM Transact Audio, Speech, Lang Process 23(6):957–969. https://doi.org/10.1109/TASLP.2015.2414820

    Article  Google Scholar 

  88. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165. https://doi.org/10.1147/rd.22.0159

    Article  MathSciNet  Google Scholar 

  89. Luo L, Ao X, Song Y, Pan F, Yang M, He Q (2019) Reading like HER: human reading inspired extractive summarization. In proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 3033-3043). https://doi.org/10.18653/v1/D19-1300

  90. Lwin SS, Nwet KT (2018) Extractive summarization for Myanmar language. In 2018 international joint symposium on artificial intelligence and natural language processing (iSAI-NLP) (pp. 1-6). IEEE. https://doi.org/10.1109/iSAI-NLP.2018.8692976

  91. Lwin SS, Nwet KT (2019) Extractive Myanmar news summarization using centroid based word embedding. In: 2019 international conference on advanced information technologies (ICAIT). IEEE, pp 200–205. https://doi.org/10.1109/AITC.2019.8921386

  92. Mandal S, Singh GK, Pal A (2018) A constraints driven PSO based approach for text summarization. J Inf Math Sci 10(4):703–714. https://doi.org/10.26713/jims.v10i4.891

    Article  Google Scholar 

  93. Mathkour HI, Touir AA, Al-Sanea WA (2008) Parsing Arabic texts using rhetorical structure theory. J Comput Sci 4(9):713–720

    Article  Google Scholar 

  94. Maurya AK (2020) Resource and task clustering based scheduling algorithm for workflow applications in cloud computing environment. In 2020 sixth international conference on parallel, distributed and grid computing (PDGC) (pp. 566-570). IEEE. https://doi.org/10.1109/PDGC50313.2020.9315806

  95. Maurya R, Singh SK, Maurya AK, Kumar A (2014) GLCM and multi class support vector machine based automated skin cancer classification. In 2014 international conference on computing for sustainable global development (INDIACom) (pp. 444-447). IEEE. https://doi.org/10.1109/IndiaCom.2014.6828177

  96. Maurya SK, Singh D, Maurya AK (2022) Deceptive opinion spam detection approaches: a literature survey. Applied intelligence, 1-46. https://doi.org/10.1007/s10489-022-03427-1

  97. Meena YK, Gopalani D (2015) Evolutionary algorithms for extractive automatic text summarization. Proced Comput Sci 48:244–249. https://doi.org/10.1016/j.procs.2015.04.177

    Article  Google Scholar 

  98. Mehta P, Majumder P (2018) Effective aggregation of various summarization techniques. Inf Process Manag 54(2):145–158. https://doi.org/10.1016/j.ipm.2017.11.002

    Article  Google Scholar 

  99. Mei JP, Chen L (2012) SumCR: a new subtopic-based extractive approach for text summarization. Knowl Inf Syst 31(3):527–545. https://doi.org/10.1007/s10115-011-0437-x

    Article  Google Scholar 

  100. Mendoza M, Bonilla S, Noguera C, Cobos C, León E (2014) Extractive single-document summarization based on genetic operators and guided local search. Expert Syst Appl 41(9):4158–4169. https://doi.org/10.1016/j.eswa.2013.12.042

    Article  Google Scholar 

  101. Merchant K, Pande Y (2018) Nlp based latent semantic analysis for legal text summarization. In 2018 international conference on advances in computing, communications and informatics (ICACCI) (pp. 1803-1807). IEEE. https://doi.org/10.1109/ICACCI.2018.8554831

  102. Mihalcea R, Tarau P (2004) Textrank: bringing order into text. In proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404-411)

  103. MirShojaee H, Masoumi B, Zeinali E (2017) Biogeography-based optimization algorithm for automatic extractive text summarization. Int J Indust Eng Product Res 28(1):75–84 http://ijiepr.iust.ac.ir/article-1-722-en.html

    Google Scholar 

  104. Mirshojaei SH, Masoomi B (2015) Text summarization using cuckoo search optimization algorithm. J Comput Robot 8(2):19–24 http://www.qjcr.ir/article_683.html

    Google Scholar 

  105. Mohamed M, Oussalah M (2019) SRL-ESA-TextSum: a text summarization approach based on semantic role labeling and explicit semantic analysis. Inf Process Manag 56(4):1356–1372. https://doi.org/10.1016/j.ipm.2019.04.003

    Article  Google Scholar 

  106. Moiyadi HS, Desai H, Pawar D, Agrawal G, Patil NM (2016) NLP based text summarization using semantic analysis. Int J Adv Eng Manag Sci 2(10):239678

    Google Scholar 

  107. Moratanch N, Chitrakala S (2017) A survey on extractive text summarization. In: 2017 international conference on computer, communication and signal processing (ICCCSP). IEEE, pp 1–6. https://doi.org/10.1109/ICCCSP.2017.7944061

  108. Muthu B, Cb S, Kumar PM, Kadry SN, Hsu CH, Sanjuan O, Crespo RG (2021) A framework for extractive text summarization based on deep learning modified neural network classifier. Trans Asian Low-Resource Lang Inf Process 20(3):1–20. https://doi.org/10.1145/3392048

    Article  Google Scholar 

  109. Mutlu B, Sezer EA, Akcayol MA (2019) Multi-document extractive text summarization: a comparative assessment on features. Knowl-Based Syst 183:104848. https://doi.org/10.1016/j.knosys.2019.07.019

    Article  Google Scholar 

  110. Mutlu B, Sezer EA, Akcayol MA (2020) Candidate sentence selection for extractive text summarization. Inf Process Manag 57(6):102359. https://doi.org/10.1016/j.ipm.2020.102359

    Article  Google Scholar 

  111. Nagalla S, Kumar KC (2021) Oppositional lion optimization algorithm and deep neural network based multi-document summarization from large-scale documents. Eur J Mol Clin Med 7(10):1991–2009 https://www.ejmcm.com/article_6857.html

    Google Scholar 

  112. Naik SS, Gaonkar MN (2017) Extractive text summarization by feature-based sentence extraction using rule-based concept. In 2017 2nd IEEE international conference on recent trends in electronics, Information & Communication Technology (RTEICT) (pp. 1364-1368). IEEE. https://doi.org/10.1109/RTEICT.2017.8256821

  113. Nallapati R, Zhou B, Ma M (2016) Classify or select: neural architectures for extractive document summarization. arXiv preprint arXiv:1611.04244. https://doi.org/10.48550/arXiv.1611.04244

  114. Nallapati R, Zhai F, Zhou B (2017) Summarunner: a recurrent neural network-based sequence model for extractive summarization of documents. In Thirty-first AAAI conference on artificial intelligence https://doi.org/10.48550/arXiv.1611.04230, 31

  115. Narayan S, Cohen SB, Lapata M (2018) Ranking sentences for extractive summarization with reinforcement learning. arXiv preprint arXiv:1802.08636. https://doi.org/10.48550/arXiv.1802.08636

  116. Nawaz A, Bakhtyar M, Baber J, Ullah I, Noor W, Basit A (2020) Extractive text summarization models for Urdu language. Inf Process Manag 57(6):102383. https://doi.org/10.1016/j.ipm.2020.102383

    Article  Google Scholar 

  117. Neto JL, Freitas AA, Kaestner CA (2002) Automatic text summarization using a machine learning approach. In Brazilian symposium on artificial intelligence (pp. 205-215). Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36127-8_20

    Book  Google Scholar 

  118. Ozsoy MG, Alpaslan FN, Cicekli I (2011) Text summarization using latent semantic analysis. J Inf Sci 37(4):405–417. https://doi.org/10.1177/0165551511408848

    Article  MathSciNet  Google Scholar 

  119. Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). https://aclanthology.org/P02-1040.pdf

  120. Parveen D, Strube M (2015) Integrating importance, non-redundancy and coherence in graph-based extractive summarization. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence, pp 1298–1304

  121. Patel D, Shah S, Chhinkaniwala H (2019) Fuzzy logic-based multi-document summarization with improved sentence scoring and redundancy removal technique. Expert Syst Appl 134:167–177. https://doi.org/10.1016/j.eswa.2019.05.045

    Article  Google Scholar 

  122. Patil SR, Mahajan SM (2011) A novel approach for research paper abstracts summarization using cluster-based sentence extraction. In proceedings of the International Conference & Workshop on emerging trends in technology (pp. 583-586). https://doi.org/10.1145/1980022.1980150

  123. Potnurwar A, Pimpalshende A, Aote SS, Bongirwar V (2020) Extractive multi-document text summarization by using binary particle swarm optimization. Helix 10(04):263–265. https://doi.org/10.21786/bbrc/13.14/8

    Article  Google Scholar 

  124. Prasad SN, Narsimha VB, Reddy PV, Babu AV (2015) Influence of lexical, syntactic and structural features and their combination on authorship attribution for Telugu text. Proced Comput Sci 48:58–64. https://doi.org/10.1016/j.procs.2015.04.110

    Article  Google Scholar 

  125. Qaroush A, Farha IA, Ghanem W, Washaha M, Maali E (2021) An efficient single document Arabic text summarization using a combination of statistical and semantic features. J King Saud Univ Comput Inf Sci 33(6):677–692. https://doi.org/10.1016/j.jksuci.2019.03.010

    Article  Google Scholar 

  126. Rahman N, Borah B (2015) A survey on existing extractive techniques for query-based text summarization. In 2015 international symposium on advanced computing and communication (ISACC) (pp. 98-102). IEEE. https://doi.org/10.1109/ISACC.2015.7377323

  127. Rani R, Lobiyal DK (2021) An extractive text summarization approach using tagged-LDA based topic modeling. Multimed Tools Appl 80(3):3275–3305. https://doi.org/10.1007/s11042-020-09549-3

    Article  Google Scholar 

  128. Rautray R, Balabantaray RC (2017) Cat swarm optimization-based evolutionary framework for multi-document summarization. Physica A: Stat Mech Appl 477:174–186. https://doi.org/10.1016/j.physa.2017.02.056

  129. Raval KR, Goyani MM (2022) A survey on event detection-based video summarization for cricket. Multimed Tools Appl, 1-29. https://doi.org/10.1007/s11042-022-12834-y

  130. Ravinuthala VVMK, Chinnam SR (2017) A keyword extraction approach for single document extractive summarization based on topic centrality. Int J Intell Eng Syst https://doi.org/10.22266/ijies2017.1031.17

  131. Rothe S, Schütze H (2014) Cosimrank: a flexible & efficient graph-theoretic similarity measure. In proceedings of the 52nd annual meeting of the Association for Computational Linguistics (volume 1: long papers) (pp. 1392-1402). https://aclanthology.org/P14-1131.pdf

  132. Sahba R, Ebadi N, Jamshidi M, Rad P (2018) Automatic text summarization using customizable fuzzy features and attention on the context and vocabulary. In 2018 world automation congress (WAC) (pp. 1-5). IEEE. https://doi.org/10.23919/WAC.2018.8430483

  133. Sahoo D, Balabantaray R, Phukon M, Saikia S (2016) Aspect-based multi-document summarization. In 2016 international conference on computing, communication and automation (ICCCA) (pp. 873-877). IEEE. https://doi.org/10.1109/CCAA.2016.7813838

  134. Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33(2):193–207. https://doi.org/10.1016/S0306-4573(96)00062-3

  135. Sanchez-Gomez JM, Vega-Rodríguez MA, Pérez CJ (2018) Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach. Knowl-Based Syst 159:1–8. https://doi.org/10.1016/j.knosys.2017.11.029

  136. Sanchez-Gomez JM, Vega-Rodriguez MA, Perez CJ (2020) Experimental analysis of multiple criteria for extractive multi-document text summarization. Expert Syst Appl 140:112904. https://doi.org/10.1016/j.eswa.2019.112904

  137. Shyamal AK, Pal M (2007) Triangular fuzzy matrices. Iran J Fuzzy Syst 4(1):75–87. https://www.sid.ir/en/Journal/ViewPaper.aspx?ID=67072

  138. Shen C, Li T (2011) Learning to rank for query-focused multi-document summarization. In 2011 IEEE 11th international conference on data mining (pp. 626-634). IEEE. https://doi.org/10.1109/ICDM.2011.91

  139. Shirwandkar NS, Kulkarni S (2018) Extractive text summarization using deep learning. In 2018 fourth international conference on computing communication control and automation (ICCUBEA) (pp. 1-5). IEEE. https://doi.org/10.1109/ICCUBEA.2018.8697465

  140. Shoaib M, Maurya AK (2014) Comparative study of different web mining algorithms to discover knowledge on the web. In proceedings of Elsevier second international conference on emerging research in computing, information, communication and application (ERCICA-2014) (Vol. 3, pp. 648-654)

  141. Shoaib M, Maurya AK (2014) URL ordering-based performance evaluation of web crawler. In 2014 international conference on advances in Engineering & Technology Research (ICAETR-2014) (pp. 1-7). IEEE. https://doi.org/10.1109/ICAETR.2014.7012962

  142. Siddiqui MK, Ahmad A, Pal O, Ahmad T (2021) CoRank: a clustering cum graph ranking approach for extractive summarization. arXiv preprint arXiv:2106.00619. https://doi.org/10.48550/arXiv.2106.00619

  143. Singh SP, Kumar A, Mangal A, Singhal S (2016) Bilingual automatic text summarization using unsupervised deep learning. In 2016 international conference on electrical, electronics, and optimization techniques (ICEEOT) (pp. 1195-1200). IEEE. https://doi.org/10.1109/ICEEOT.2016.7754874

  144. Singh RK, Khetarpaul S, Gorantla R, Allada SG (2021) SHEG: summarization and headline generation of news articles using deep learning. Neural Comput & Applic 33(8):3251–3265. https://doi.org/10.1007/s00521-020-05188-9

  145. Sirohi NK, Bansal M, Rajan SN (2021) Recent approaches for text summarization using machine learning & LSTM0. J Big Data 3(1):35. https://doi.org/10.32604/jbd.2021.015954

  146. Song S, Huang H, Ruan T (2019) Abstractive text summarization using LSTM-CNN based deep learning. Multimed Tools Appl 78(1):857–875. https://doi.org/10.1007/s11042-018-5749-3

  147. Sreelakshmi PR, Manmadhan S (2021) Image summarization using unsupervised learning. In 2021 7th international conference on advanced computing and communication systems (ICACCS) (Vol. 1, pp. 100-103). IEEE. https://doi.org/10.1109/ICACCS51430.2021.9441682

  148. Srivastava AK, Pandey D, Agarwal A (2021) Extractive multi-document text summarization using dolphin swarm optimization approach. Multimed Tools Appl 80(7):11273–11290. https://doi.org/10.1007/s11042-020-10176-1

  149. Srivastava R, Singh P, Rana KPS, Kumar V (2022) A topic modeled unsupervised approach to single document extractive text summarization. Knowl-Based Syst 246:108636. https://doi.org/10.1016/j.knosys.2022.108636

  150. Steinberger J (2009) Evaluation measures for text summarization. Comput Inf 28(2):251–275 http://147.213.75.17/ojs/index.php/cai/article/view/37

  151. Steinberger J, Jezek K (2004) Using latent semantic analysis in text summarization and summary evaluation. Proc ISIM 4(93-100):8

  152. Suleman RM, Korkontzelos I (2020) Managing the syntactic blindness of latent semantic analysis. In CS & IT conference proceedings (Vol. 10, no. 4). CS & IT conference proceedings. https://doi.org/10.5121/csit.2020.100401

  153. Suleman RM, Korkontzelos I (2021) Extending latent semantic analysis to manage its syntactic blindness. Expert Syst Appl 165:114130. https://doi.org/10.1016/j.eswa.2020.114130

  154. Tarnpradab S, Liu F, Hua KA (2017) Toward extractive summarization of online forum discussions via hierarchical attention networks. Thirtieth Int Flairs Conf, 288-292. https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS17/paper/view/15500

  155. Thakkar HK, Sahoo PK, Mohanty P (2021) DOFM: domain feature miner for robust extractive summarization. Inf Process Manag 58(3):102474. https://doi.org/10.1016/j.ipm.2020.102474

  156. Thu HNT, Huu QN, Ngoc TNT (2013) A supervised learning method combine with dimensionality reduction in Vietnamese text summarization. In 2013 computing, communications and IT applications conference (ComComAp) (pp. 69-73). IEEE. https://doi.org/10.1109/ComComAp.2013.6533611

  157. Uçkan T, Karcı A (2020) Extractive multi-document text summarization based on graph independent sets. Egypt Inf J 21(3):145–157. https://doi.org/10.1016/j.eij.2019.12.002

  158. Vale R, Lins RD, Ferreira R (2020) An assessment of sentence simplification methods in extractive text summarization. In proceedings of the ACM symposium on document engineering 2020 (pp. 1-9). https://doi.org/10.1145/3395027.3419588

  159. Van Lierde H, Chow TW (2019) Query-oriented text summarization based on hypergraph transversals. Inf Process Manag 56(4):1317–1338. https://doi.org/10.1016/j.ipm.2019.03.003

  160. Verma P, Verma A, Pal S (2022) An approach for extractive text summarization using fuzzy evolutionary and clustering algorithms. Appl Soft Comput 120:108670. https://doi.org/10.1016/j.asoc.2022.108670

  161. Wang D, Zhu S, Li T, Chi Y, Gong Y (2011) Integrating document clustering and multi-document summarization. ACM Trans Knowl Discov Data (TKDD) 5(3):1–26. https://doi.org/10.1145/1993077.1993078

  162. Wang S, Zhao X, Li B, Ge B, Tang D (2017) Integrating extractive and abstractive models for long text summarization. In 2017 IEEE international congress on big data (BigData congress) (pp. 305-312). IEEE. https://doi.org/10.1109/BigDataCongress.2017.46

  163. Wang X, Nie X, Liu X, Wang B, Yin Y (2020) Modality correlation-based video summarization. Multimed Tools Appl 79(45):33875–33890. https://doi.org/10.1007/s11042-020-08690-3

  164. Wang D, Liu P, Zheng Y, Qiu X, Huang X (2020) Heterogeneous graph neural networks for extractive document summarization. arXiv preprint arXiv:2004.12393. https://doi.org/10.48550/arXiv.2004.12393

  165. Wu K, Shi P, Pan D (2015) An approach to automatic summarization for chinese text based on the combination of spectral clustering and LexRank. In 2015 12th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 1350-1354). IEEE. https://doi.org/10.1109/FSKD.2015.7382140

  166. Wu Z, Lei L, Li G, Huang H, Zheng C, Chen E, Xu G (2017) A topic modeling-based approach to novel document automatic summarization. Expert Syst Appl 84:12–23. https://doi.org/10.1016/j.eswa.2017.04.054

  167. Wu M, Pan S, Zhou C, Chang X, Zhu X (2020) Unsupervised domain adaptive graph convolutional networks. In proceedings of the web conference 2020 (pp. 1457-1467). https://doi.org/10.1145/3366423.3380219

  168. Xu J, Durrett G (2019) Neural extractive text summarization with syntactic compression. arXiv preprint arXiv:1902.00863. https://doi.org/10.48550/arXiv.1902.00863

  169. Yadav J, Meena YK (2016) Use of fuzzy logic and WordNet for improving performance of extractive automatic text summarization. In 2016 international conference on advances in computing, communications and informatics (ICACCI) (pp. 2071-2077). IEEE. https://doi.org/10.1109/ICACCI.2016.7732356

  170. Yadav AK, Saxena S (2016) A new conception of information requisition in web of things. Indian J Sci Technol 9(44). https://doi.org/10.17485/ijst/2016/v9i44/105143

  171. Yadav H, Ghosh S, Yu Y, Shah RR (2020) End-to-end named entity recognition from English speech. arXiv preprint arXiv:2005.11184. https://doi.org/10.48550/arXiv.2005.11184

  172. Yadav AK, Maurya AK, Yadav RS (2021) Extractive text summarization using recent approaches: a survey. Ingénierie des Systèmes d'Information, 26(1). https://doi.org/10.18280/isi.260112

  173. Ye S, Chua TS, Kan MY, Qiu L (2007) Document concept lattice for text understanding and summarization. Inf Process Manag 43(6):1643–1662. https://doi.org/10.1016/j.ipm.2007.03.010

  174. Yogatama D, Liu F, Smith NA (2015) Extractive summarization by maximizing semantic volume. In proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1961-1966). https://aclanthology.org/D15-1228.pdf

  175. Yu W, Lin X, Zhang W (2013) Towards efficient SimRank computation on large networks. In 2013 IEEE 29th international conference on data engineering (ICDE) (pp. 601-612). IEEE. https://doi.org/10.1109/ICDE.2013.6544859

  176. Zajic DM, Dorr BJ, Lin J (2008) Single-document and multi-document summarization techniques for email threads using sentence compression. Inf Process Manag 44(4):1600–1610. https://doi.org/10.1016/j.ipm.2007.09.007

  177. Zhang K, Xiao Y, Tong H, Wang H, Wang W (2014) WiiCluster: a platform for wikipedia infobox generation. In proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 2033-2035). https://doi.org/10.1145/2661829.2661840

  178. Zopf M, Botschen T, Falke T, Heinzerling B, Marasovic A, Mihaylov T, Frank A (2018) What’s important in a text? An extensive evaluation of linguistic annotations for summarization. In 2018 fifth international conference on social networks analysis, management and security (SNAMS) (pp. 272-277). IEEE. https://doi.org/10.1109/SNAMS.2018.8554853

Acknowledgments

We thank the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Avaneesh Kumar Yadav.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yadav, A.K., Ranvijay, Yadav, R.S. et al. State-of-the-art approach to extractive text summarization: a comprehensive review. Multimed Tools Appl 82, 29135–29197 (2023). https://doi.org/10.1007/s11042-023-14613-9

  • DOI: https://doi.org/10.1007/s11042-023-14613-9
