
Multi-candidate reduction: Sentence compression as a tool for document summarization tasks

Authors:

Bonnie J. Dorr,

Richard Schwartz

Information Processing and Management: an International Journal, Volume 43, Issue 6

Pages 1549 - 1570

https://doi.org/10.1016/j.ipm.2007.01.016

Published: 01 November 2007

Abstract

This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization: a "parse-and-trim" approach and a statistical noisy-channel approach. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
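The candidate-then-select idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The span-based trimming (a stand-in for the parse-and-trim rules), the single static score per source sentence, the word-overlap redundancy used as the dynamic feature, the weights, and all names below (`Candidate`, `make_candidates`, `redundancy`, `select_summary`) are hypothetical. The sketch only shows the general pattern: several compressions per source sentence, a score that mixes fixed and summary-dependent features, and greedy selection under a word budget.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source_id: int       # index of the source sentence this candidate compresses
    tokens: list[str]    # words kept in this compression
    static_score: float  # hypothetical static feature, fixed before selection starts


def make_candidates(source_id: int, tokens: list[str], static_score: float,
                    droppable_spans: list[tuple[int, int]]) -> list[Candidate]:
    """Hypothetical stand-in for parse-and-trim: return the full sentence plus one
    progressively shorter candidate per (start, end) token span, removed cumulatively."""
    candidates = [Candidate(source_id, list(tokens), static_score)]
    removed: set[int] = set()
    for start, end in droppable_spans:
        removed.update(range(start, end))
        kept = [t for i, t in enumerate(tokens) if i not in removed]
        candidates.append(Candidate(source_id, kept, static_score))
    return candidates


def redundancy(candidate: Candidate, summary_words: set[str]) -> float:
    """Dynamic feature: fraction of the candidate's word types already in the summary."""
    words = {t.lower() for t in candidate.tokens}
    if not words:
        return 1.0
    return len(words & summary_words) / len(words)


def select_summary(candidates: list[Candidate], max_words: int,
                   w_static: float = 1.0, w_redundancy: float = 1.0) -> list[Candidate]:
    """Greedy selection that rescores the pool after every pick, because the
    dynamic feature depends on what has already been selected."""
    pool = list(candidates)
    summary: list[Candidate] = []
    summary_words: set[str] = set()
    length = 0
    while pool:
        best = max(pool, key=lambda c: w_static * c.static_score
                   - w_redundancy * redundancy(c, summary_words))
        pool.remove(best)
        if length + len(best.tokens) > max_words:
            continue  # too long for the remaining budget; a shorter candidate may still fit
        summary.append(best)
        summary_words |= {t.lower() for t in best.tokens}
        length += len(best.tokens)
        # Keep at most one compression of each source sentence in the summary.
        pool = [c for c in pool if c.source_id != best.source_id]
    return summary


if __name__ == "__main__":
    s0 = "the president said on monday that taxes will rise".split()
    s1 = "officials expect the increase next year".split()
    cands = (make_candidates(0, s0, 2.0, [(3, 5), (0, 3)])
             + make_candidates(1, s1, 1.0, [(4, 6)]))
    for c in select_summary(cands, max_words=14):
        print(" ".join(c.tokens))
    # prints:
    # the president said on monday that taxes will rise
    # officials expect the increase
```

With a 14-word budget, the full first sentence fits. The uncompressed second sentence would exceed the budget, but its 4-word compression fills the remaining space. That is the benefit of keeping several candidates per sentence rather than a single compression. In the sketch, ties in `max` go to the first candidate in the pool, which here is the least compressed one.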



Index Terms

1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Recommendations

- A query-based multi-document sentiment summarizer (CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management)
- Using topic themes for multi-document summarization
- An extract-then-abstract based method to generate disaster-news headlines using a DNN extractor followed by a transformer abstractor


Information

Copyright © 2007 Elsevier Ltd.

Publisher: Pergamon Press, Inc., United States


Qualifiers: Article

Bibliometrics

Total citations: 33

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0). Reflects downloads up to 03 Oct 2024.

Cited By

Ibrahim Altmami, N., & El Bachir Menai, M. (2022). Automatic summarization of scientific articles. Journal of King Saud University - Computer and Information Sciences, 34(4), 1011-1028. doi:10.1016/j.jksuci.2020.04.020. Online publication date: 1 Apr 2022.

Hu, T., Liang, J., Ye, W., & Zhang, S. (2021). Keyword-Aware Encoder for Abstractive Text Summarization. In Database Systems for Advanced Applications (pp. 37-52). doi:10.1007/978-3-030-73197-7_3. Online publication date: 11 Apr 2021.

Mutlu, B., Sezer, E., & Akcayol, M. (2019). Multi-document extractive text summarization. Knowledge-Based Systems, 183(C). doi:10.1016/j.knosys.2019.07.019. Online publication date: 1 Nov 2019.

Pan, H., Liu, H., & Tang, Y. (2019). A Sequence-to-Sequence Text Summarization Model with Topic Based Attention Mechanism. In Web Information Systems and Applications (pp. 285-297). doi:10.1007/978-3-030-30952-7_29. Online publication date: 20 Sep 2019.

Cao, Z., Wei, F., Li, W., & Li, S. (2018). Faithful to the original. In S. McIlraith & K. Weinberger (Eds.), Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (pp. 4784-4791). doi:10.5555/3504035.3504621. Online publication date: 2 Feb 2018.

Oliveira, H., Ferreira, R., Lima, R., Lins, R., Freitas, F., Riss, M., & Simske, S. (2016). Assessing shallow sentence scoring techniques and combinations for single and multi-document summarization. Expert Systems with Applications: An International Journal, 65(C), 68-86. doi:10.1016/j.eswa.2016.08.030. Online publication date: 15 Dec 2016.

Belkebir, R., & Guessoum, A. (2016). Concept generalization and fusion for abstractive sentence generation. Expert Systems with Applications: An International Journal, 53(C), 43-56. doi:10.1016/j.eswa.2016.01.007. Online publication date: 1 Jul 2016.

Finegan-Dollak, C., & Radev, D. (2016). Sentence simplification, compression, and disaggregation for summarization of sophisticated documents. Journal of the Association for Information Science and Technology, 67(10), 2437-2453. doi:10.1002/asi.23576. Online publication date: 1 Oct 2016.

Banerjee, S., Mitra, P., & Sugiyama, K. (2015). Multi-document abstractive summarization using ILP based multi-sentence compression. In Proceedings of the 24th International Conference on Artificial Intelligence (pp. 1208-1214). doi:10.5555/2832415.2832417. Online publication date: 25 Jul 2015.

Yan, S., & Wan, X. (2015). Deep Dependency Substructure-Based Learning for Multidocument Summarization. ACM Transactions on Information Systems, 34(1), 1-24. doi:10.1145/2766447. Online publication date: 14 Jul 2015.
