Query-based multi-documents summarization using linguistic knowledge and content word expansion

Asad Abdi¹,
Norisma Idris¹,
Rasim M. Alguliyev² &
Ramiz M. Aliguliyev²


Abstract

This paper introduces a query-based summarization method that combines semantic relations between words with their syntactic composition to extract meaningful sentences from document sets. The problem with current statistical methods is that they fail to capture meaning when comparing a sentence with a user query, so the extracted sentences often conflict with the user's requirements. The proposed method can improve the quality of document summaries because it avoids extracting a sentence whose similarity to the query is high but whose meaning is different. It computes the semantic and syntactic similarity both between sentences (sentence-to-sentence) and between each sentence and the query (sentence-to-query). To reduce redundancy in the summary, the method uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, it expands the words in both the query and the sentences to tackle the problem of limited information, bridging the lexical gap between semantically similar contexts that are expressed in different wording. Experimental results show that the proposed method improves performance compared with the systems that participated in DUC 2006, and that it performs better than other existing techniques on the DUC 2005 and DUC 2006 datasets.
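The abstract names three components (content word expansion, combined semantic and syntactic similarity, and greedy selection with a diversity penalty) but gives no formulas. The Python sketch below shows one way such components could fit together, under assumptions that are ours and not the paper's: a hypothetical toy `SYNONYMS` table stands in for a lexical resource such as WordNet, "semantic" similarity is approximated by the cosine of synonym-expanded bag-of-words vectors, "syntactic" similarity by a simple word-order measure, `alpha` (hypothetical) weights the two, and the greedy step subtracts `penalty` × similarity × the selected sentence's score from every remaining candidate. It is an illustration, not the authors' implementation.

```python
import math

# Hypothetical toy synonym table (assumption): a real system would query a
# lexical resource such as WordNet instead.
SYNONYMS: dict[str, set[str]] = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "buy": {"purchase"},
    "purchase": {"buy"},
}
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "was"}


def content_words(text: str) -> list[str]:
    """Lower-case, strip punctuation and drop stop words, keeping word order."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def expand(words: list[str], syn_weight: float = 0.5) -> dict[str, float]:
    """Content word expansion: add each word's synonyms at a reduced weight."""
    vec: dict[str, float] = {}
    for w in words:
        vec[w] = vec.get(w, 0.0) + 1.0
        for syn in SYNONYMS.get(w, ()):
            vec[syn] = vec.get(syn, 0.0) + syn_weight
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def word_order_similarity(w1: list[str], w2: list[str]) -> float:
    """1 - ||r1 - r2|| / ||r1 + r2|| over the ordered union of both word lists."""
    joint = list(dict.fromkeys(w1 + w2))

    def order_vector(words: list[str]) -> list[int]:
        pos: dict[str, int] = {}
        for i, w in enumerate(words, start=1):
            pos.setdefault(w, i)  # 1-based position of first occurrence
        return [pos.get(w, 0) for w in joint]  # 0 when the word is absent

    r1, r2 = order_vector(w1), order_vector(w2)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0


def similarity(text1: str, text2: str, alpha: float = 0.8) -> float:
    """Weighted mix of expanded-vector cosine and word-order similarity."""
    w1, w2 = content_words(text1), content_words(text2)
    semantic = cosine(expand(w1), expand(w2))
    syntactic = word_order_similarity(w1, w2)
    return alpha * semantic + (1 - alpha) * syntactic


def greedy_summary(sentences: list[str], query: str, k: int = 2,
                   penalty: float = 1.0) -> list[str]:
    """Greedily pick k sentences, penalizing those similar to earlier picks."""
    scores = {i: similarity(s, query) for i, s in enumerate(sentences)}
    chosen: list[int] = []
    while scores and len(chosen) < k:
        best = max(scores, key=scores.get)
        best_score = scores.pop(best)
        chosen.append(best)
        # Diversity penalty: lower every remaining score in proportion to
        # that sentence's similarity with the one just selected.
        for j in scores:
            scores[j] -= penalty * similarity(sentences[j], sentences[best]) * best_score
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    docs = [
        "The car was sold to a dealer.",
        "An automobile dealer bought the vehicle.",
        "Rain is expected tomorrow.",
    ]
    for sentence in greedy_summary(docs, "Who purchased the car?", k=2):
        print(sentence)
```

Because the toy table links *car* and *automobile*, the sentence about an automobile dealer still receives a non-zero score against a query about a car even though the two share no surface word, which is the kind of lexical gap the expansion step is meant to bridge. A sentence with no shared or expanded terms scores zero on both the cosine and the word-order component.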

Similar content being viewed by others

Context-Based Multi-document Summarization

Sentence Similarity Using Syntactic and Semantic Features for Multi-document Summarization

A Method for Semantic Relatedness Based Query Focused Text Summarization

Author information

Authors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
Asad Abdi & Norisma Idris
Institute of Information Technology, Azerbaijan National Academy of Sciences, 9, B. Vahabzade Street, AZ 1141, Baku, Azerbaijan
Rasim M. Alguliyev & Ramiz M. Aliguliyev

Corresponding author

Correspondence to Asad Abdi.

Ethics declarations

Conflict of interest

On behalf of the co-authors, I hereby declare that all the authors agreed to submit the article exclusively to this journal and that there is no conflict of interest regarding the publication of this article.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Abdi, A., Idris, N., Alguliyev, R.M. et al. Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput 21, 1785–1801 (2017). https://doi.org/10.1007/s00500-015-1881-4

Published: 23 September 2015
Issue Date: April 2017
DOI: https://doi.org/10.1007/s00500-015-1881-4