research-article

Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization

Authors:

Chris Ding

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 307 - 314

https://doi.org/10.1145/1390334.1390387

Published: 20 July 2008

Abstract

Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each cluster to form the summary. Experimental results on the DUC2005 and DUC2006 data sets show that the proposed framework outperforms the existing summarization systems we implemented for comparison. We also study the factors that contribute to this performance.
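The three steps in the abstract (similarity matrix, symmetric factorization, per-cluster sentence selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: bag-of-words cosine similarity stands in for the paper's sentence-level semantic analysis, the factorization W ≈ HHᵀ uses a commonly cited damped multiplicative update (the abstract does not give the update rule), and the "most informative" sentence is approximated by the cluster member most similar to its own cluster. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_similarity_matrix(sentences):
    """Bag-of-words cosine similarity (assumed stand-in for semantic similarity)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    norms = [math.sqrt(sum(c * c for c in bag.values())) for bag in bags]
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(bags[i][w] * bags[j][w] for w in bags[i])
            sim[i][j] = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
    return sim


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def symmetric_nmf(w, k, iterations=200, beta=0.5, seed=0):
    """Approximate W ~ H H^T with H >= 0 (assumed damped multiplicative update)."""
    rng = random.Random(seed)
    n = len(w)
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iterations):
        wh = matmul(w, h)                              # n x k
        hth = matmul([list(c) for c in zip(*h)], h)    # k x k
        hhth = matmul(h, hth)                          # n x k
        h = [[h[i][c] * (1 - beta + beta * wh[i][c] / (hhth[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return h


def summarize(sentences, k):
    """Cluster sentences via symmetric NMF and pick one representative per cluster."""
    w = cosine_similarity_matrix(sentences)
    h = symmetric_nmf(w, k)
    # Each sentence joins the cluster where its row of H is largest.
    labels = [max(range(k), key=lambda c: row[c]) for row in h]
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Assumed selection rule: highest total similarity to its own cluster.
        chosen.append(max(members, key=lambda i: sum(w[i][j] for j in members)))
    return [sentences[i] for i in sorted(chosen)], labels
```

Calling `summarize(sentences, k)` returns the selected sentences in document order together with the cluster label of every input sentence. The number of clusters `k` and the damping factor `beta` are free parameters in this sketch.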

Index Terms

Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

July 2008

934 pages

ISBN: 9781605581644

DOI: 10.1145/1390334

General Chairs:
Tat-Seng Chua (National University of Singapore),
Mun-Kew Leong (National Library Board, Singapore)

Program Chairs:
Syung Hyon Myaeng (Information and Communications University, Korea),
Douglas W. Oard (University of Maryland, College Park, USA),
Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '08

SIGIR '08: The 31st Annual International ACM SIGIR Conference

July 20 - 24, 2008

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 218
Total Downloads: 2,246
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Nov 2024

Cited By

Cai K, Chen Z, Guo H, Wang S, Li G, Li J, Chen F, Feng H (2024). An Evaluative Baseline for Sentence-Level Semantic Division. Machine Learning and Knowledge Extraction 6:1 (41-52). Online publication date: 2-Jan-2024
https://doi.org/10.3390/make6010003
Liu Z, Luo X, Zhou M (2024). Symmetry and Graph Bi-Regularized Non-Negative Matrix Factorization for Precise Community Detection. IEEE Transactions on Automation Science and Engineering 21:2 (1406-1420). Online publication date: Apr-2024
https://doi.org/10.1109/TASE.2023.3240335
Ma B (2024). Mining Both Commonality and Specificity From Multiple Documents for Multi-Document Summarization. IEEE Access 12 (54371-54381). Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3388493
Jain M, Jindal R, Jain A (2024). An experimental study of game theory with various word embeddings for automatic extractive text summarization. Multimedia Tools and Applications. Online publication date: 20-Jul-2024
https://doi.org/10.1007/s11042-024-19828-y
Dey A, Bhattacharyya S, Dey S, Konar D, Platos J, Snasel V, Mrsic L, Pal P (2023). A Review of Quantum-Inspired Metaheuristic Algorithms for Automatic Clustering. Mathematics 11:9 (2018). Online publication date: 24-Apr-2023
https://doi.org/10.3390/math11092018
Li W, Wang R, Luo X, Zhou M (2023). A Second-Order Symmetric Non-Negative Latent Factor Model for Undirected Weighted Network Representation. IEEE Transactions on Network Science and Engineering 10:2 (606-618). Online publication date: 1-Mar-2023
https://doi.org/10.1109/TNSE.2022.3206802
Lubashevskiy V, Lubashevsky I (2023). Evolutionary Approach for Detecting Significant Edges in Social and Communication Networks. IEEE Access 11 (58046-58054). Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3284906
Anan S, Islam N, Ali M, Bhuiyan T, Bijoy M, Reza A, Arefin M (2023). Automatic Document Summarization of Unilingual Documents: A Review. Intelligent Computing and Optimization (345-358). Online publication date: 16-Dec-2023
https://doi.org/10.1007/978-3-031-50327-6_36
Bedi P, Bala M, Sharma K (2022). Extractive summarization using concept‐space and keyword phrase. Expert Systems 39:10. Online publication date: 10-Aug-2022
https://doi.org/10.1111/exsy.13110
Luo X, Liu Z, Jin L, Zhou Y, Zhou M (2022). Symmetric Nonnegative Matrix Factorization-Based Community Detection Models and Their Convergence Analysis. IEEE Transactions on Neural Networks and Learning Systems 33:3 (1203-1215). Online publication date: Mar-2022
https://doi.org/10.1109/TNNLS.2020.3041360
