Article

Towards multidocument summarization by reformulation: progress and prospects

Authors:

Kathleen R. McKeown,

Judith L. Klavans,

Vasileios Hatzivassiloglou,

Regina Barzilay,

Eleazar Eskin

AAAI '99/IAAI '99: Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference

Pages 453-460

Published: 18 July 1999

Abstract

By synthesizing information common to retrieved documents, multidocument summarization can help users of information retrieval systems find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is unique in its integration of machine learning and statistical techniques to identify similar paragraphs, intersection of similar phrases within paragraphs, and language generation to reformulate the wording of the summary. Our evaluation of system components shows that learning over multiple extracted linguistic features is more effective than information retrieval approaches at identifying similar text units for summarization and that it is possible to generate a fluent summary that conveys similarities among documents even when full semantic interpretations of the input text are not available.
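To make the comparison in the abstract concrete, the following is a minimal sketch of the kind of information retrieval baseline it refers to: paragraphs are scored pairwise by cosine similarity over TF-IDF weighted word vectors. This is not the authors' system, which learns over multiple linguistic features. The function names, the smoothed idf formula, the tokenizer, and the 0.3 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def tfidf_vectors(paragraphs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per paragraph."""
    token_lists = [tokenize(p) for p in paragraphs]
    n = len(token_lists)
    # Document frequency: the number of paragraphs containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed idf (an assumption) so ubiquitous terms keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(paragraphs, threshold=0.3):
    """Return (i, j, score) for every paragraph pair scoring at or above threshold."""
    vecs = tfidf_vectors(paragraphs)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    paragraphs = [
        "The plane crashed near the airport on Tuesday.",
        "On Tuesday the plane crashed close to the airport.",
        "Stock markets rallied after the announcement.",
    ]
    for i, j, score in similar_pairs(paragraphs):
        print(f"paragraphs {i} and {j} look similar (cosine = {score:.2f})")
```

A baseline like this matches only surface word overlap, so paraphrases that use different vocabulary score low. That limitation is one plausible reason a learner over richer linguistic features could do better, as the abstract reports.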



Index Terms

Towards multidocument summarization by reformulation: progress and prospects
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval


Information & Contributors

Information

Published In


AAAI '99/IAAI '99: Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference

July 1999

998 pages

ISBN: 0262511061

Chairmen:
Jim Hendler (Univ. of Maryland, College Park; and DARP/ISO)
Devika Subramanian (Rice Univ., Houston, TX)
Ramasamy Uthurusamy (General Motors Research)
Barbara Hayes-Roth (Extempo Systems, Inc.)

Sponsors

AAAI: American Association for Artificial Intelligence

Publisher

American Association for Artificial Intelligence

United States


Qualifiers

Article
