Multi-candidate reduction: Sentence compression as a tool for document summarization tasks

Published: 01 November 2007

Abstract

This article examines the application of two single-document sentence compression techniques, a "parse-and-trim" approach and a statistical noisy-channel approach, to the problem of multi-document summarization. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. Candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
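The generate-then-select idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prefix-based `toy_candidates` generator is only a stand-in for a real compressor such as parse-and-trim, and the `Candidate` class, the length-based static score, the word-overlap redundancy penalty, and the `redundancy_weight` parameter are all hypothetical choices made for illustration. The sketch assumes "static" means a score fixed before selection and "dynamic" means a score recomputed against the summary built so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_id: int        # index of the source sentence this candidate came from
    words: tuple          # tokens of the compressed candidate
    static_score: float   # hypothetical score fixed before selection starts


def toy_candidates(source_id, sentence, min_len=3):
    """Toy stand-in for a compressor: every prefix of at least min_len words."""
    words = tuple(sentence.lower().split())
    out = []
    for end in range(min_len, len(words) + 1):
        # Hypothetical static feature: longer candidates keep more content.
        out.append(Candidate(source_id, words[:end], end / len(words)))
    return out


def redundancy(candidate, summary_words):
    """Dynamic feature: fraction of candidate words already in the summary."""
    if not candidate.words:
        return 0.0
    return sum(w in summary_words for w in candidate.words) / len(candidate.words)


def select(candidates, budget, redundancy_weight=1.0):
    """Greedily pick one candidate per source sentence within a word budget."""
    chosen, used_sources, summary_words, length = [], set(), set(), 0
    while True:
        best, best_score = None, float("-inf")
        for c in candidates:
            # Skip sources already represented and candidates that do not fit.
            if c.source_id in used_sources or length + len(c.words) > budget:
                continue
            # Re-score on every pass: the redundancy term changes as the summary grows.
            score = c.static_score - redundancy_weight * redundancy(c, summary_words)
            if score > best_score:
                best, best_score = c, score
        if best is None:
            return chosen
        chosen.append(best)
        used_sources.add(best.source_id)
        summary_words.update(best.words)
        length += len(best.words)
```

Because several candidates of different lengths exist for each sentence, the selector in this sketch can fall back to a shorter compression when the full sentence no longer fits the remaining budget, which is the practical benefit of keeping multiple candidates per source sentence.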

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. 07.05.Mh
  2. 43.71.Sy
  3. 89.20.Ff
  4. Headline generation
  5. Hidden Markov model
  6. Parse-and-trim
  7. Summarization

Qualifiers

  • Article

Cited By

  • (2022) Automatic summarization of scientific articles. Journal of King Saud University - Computer and Information Sciences 34:4 (1011-1028). DOI: 10.1016/j.jksuci.2020.04.020. Online publication date: 1-Apr-2022
  • (2021) Keyword-Aware Encoder for Abstractive Text Summarization. Database Systems for Advanced Applications (37-52). DOI: 10.1007/978-3-030-73197-7_3. Online publication date: 11-Apr-2021
  • (2019) Multi-document extractive text summarization. Knowledge-Based Systems 183:C. DOI: 10.1016/j.knosys.2019.07.019. Online publication date: 1-Nov-2019
  • (2019) A Sequence-to-Sequence Text Summarization Model with Topic Based Attention Mechanism. Web Information Systems and Applications (285-297). DOI: 10.1007/978-3-030-30952-7_29. Online publication date: 20-Sep-2019
  • (2018) Faithful to the original. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (4784-4791). DOI: 10.5555/3504035.3504621. Online publication date: 2-Feb-2018
  • (2016) Assessing shallow sentence scoring techniques and combinations for single and multi-document summarization. Expert Systems with Applications: An International Journal 65:C (68-86). DOI: 10.1016/j.eswa.2016.08.030. Online publication date: 15-Dec-2016
  • (2016) Concept generalization and fusion for abstractive sentence generation. Expert Systems with Applications: An International Journal 53:C (43-56). DOI: 10.1016/j.eswa.2016.01.007. Online publication date: 1-Jul-2016
  • (2016) Sentence simplification, compression, and disaggregation for summarization of sophisticated documents. Journal of the Association for Information Science and Technology 67:10 (2437-2453). DOI: 10.1002/asi.23576. Online publication date: 1-Oct-2016
  • (2015) Multi-document abstractive summarization using ILP based multi-sentence compression. Proceedings of the 24th International Conference on Artificial Intelligence (1208-1214). DOI: 10.5555/2832415.2832417. Online publication date: 25-Jul-2015
  • (2015) Deep Dependency Substructure-Based Learning for Multidocument Summarization. ACM Transactions on Information Systems 34:1 (1-24). DOI: 10.1145/2766447. Online publication date: 14-Jul-2015
