DOI: 10.1145/1390334.1390387
research-article

Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization

Published: 20 July 2008

Abstract

Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistical and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each cluster to form the summary. Experimental results on the DUC2005 and DUC2006 data sets show that the proposed framework outperforms the existing summarization systems we implemented. We also study the factors that contribute to this performance.
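The three steps in the abstract (similarity matrix, symmetric factorization, per-cluster selection) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: bag-of-words cosine similarity stands in for the sentence-level semantic analysis, the factorization uses a commonly cited damped multiplicative update for W ≈ HHᵀ with H ≥ 0, and the "most informative" sentence of a cluster is taken to be the one with the highest total similarity to the other members. All function names and parameters (`cosine_similarity_matrix`, `symmetric_nmf`, `summarize`, `beta`, `iterations`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_similarity_matrix(sentences):
    """Bag-of-words cosine similarity (assumed stand-in for semantic similarity)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    norms = [math.sqrt(sum(c * c for c in bag.values())) for bag in bags]
    n = len(sentences)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(count * bags[j][word] for word, count in bags[i].items())
            denom = norms[i] * norms[j]
            W[i][j] = dot / denom if denom else 0.0
    return W


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def symmetric_nmf(W, k, iterations=200, beta=0.5, seed=0):
    """Approximate W by H * H^T with H >= 0 via damped multiplicative updates."""
    rng = random.Random(seed)
    n = len(W)
    H = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iterations):
        WH = matmul(W, H)                        # n x k
        HHtH = matmul(H, matmul(transpose(H), H))  # n x k
        H = [[H[i][c] * (1 - beta + beta * WH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return H


def summarize(sentences, k):
    """Cluster sentences with symmetric NMF and pick one sentence per cluster."""
    W = cosine_similarity_matrix(sentences)
    H = symmetric_nmf(W, k)
    # Each sentence joins the cluster whose column of H has the largest entry.
    labels = [max(range(k), key=lambda c: row[c]) for row in H]
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # a cluster may end up empty
        # Assumed selection rule: highest total similarity within the cluster.
        chosen.append(max(members, key=lambda i: sum(W[i][j] for j in members)))
    return [sentences[i] for i in sorted(chosen)], labels


if __name__ == "__main__":
    docs = [
        "the storm flooded the coastal town",
        "flood waters from the storm damaged the town",
        "the election results were announced today",
        "voters awaited the election results",
    ]
    summary, labels = summarize(docs, k=2)
    print(labels)
    print(summary)
```

The update keeps H non-negative because every factor in it is non-negative, so cluster membership can be read directly from the largest entry in each row of H.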



Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-document summarization
  2. sentence-level semantic analysis
  3. symmetric non-negative matrix factorization

Qualifiers

  • Research-article

Conference

SIGIR '08
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) An Evaluative Baseline for Sentence-Level Semantic Division. Machine Learning and Knowledge Extraction 6:1 (41-52). DOI: 10.3390/make6010003. Online publication date: 2-Jan-2024
  • (2024) Symmetry and Graph Bi-Regularized Non-Negative Matrix Factorization for Precise Community Detection. IEEE Transactions on Automation Science and Engineering 21:2 (1406-1420). DOI: 10.1109/TASE.2023.3240335. Online publication date: Apr-2024
  • (2024) Mining Both Commonality and Specificity From Multiple Documents for Multi-Document Summarization. IEEE Access 12 (54371-54381). DOI: 10.1109/ACCESS.2024.3388493. Online publication date: 2024
  • (2024) An experimental study of game theory with various word embeddings for automatic extractive text summarization. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19828-y. Online publication date: 20-Jul-2024
  • (2023) A Review of Quantum-Inspired Metaheuristic Algorithms for Automatic Clustering. Mathematics 11:9 (2018). DOI: 10.3390/math11092018. Online publication date: 24-Apr-2023
  • (2023) A Second-Order Symmetric Non-Negative Latent Factor Model for Undirected Weighted Network Representation. IEEE Transactions on Network Science and Engineering 10:2 (606-618). DOI: 10.1109/TNSE.2022.3206802. Online publication date: 1-Mar-2023
  • (2023) Evolutionary Approach for Detecting Significant Edges in Social and Communication Networks. IEEE Access 11 (58046-58054). DOI: 10.1109/ACCESS.2023.3284906. Online publication date: 2023
  • (2023) Automatic Document Summarization of Unilingual Documents: A Review. Intelligent Computing and Optimization (345-358). DOI: 10.1007/978-3-031-50327-6_36. Online publication date: 16-Dec-2023
  • (2022) Extractive summarization using concept-space and keyword phrase. Expert Systems 39:10. DOI: 10.1111/exsy.13110. Online publication date: 10-Aug-2022
  • (2022) Symmetric Nonnegative Matrix Factorization-Based Community Detection Models and Their Convergence Analysis. IEEE Transactions on Neural Networks and Learning Systems 33:3 (1203-1215). DOI: 10.1109/TNNLS.2020.3041360. Online publication date: Mar-2022
