Abstract
Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one that has the minimum information distance to the entire document set. The best update summary is the one that has the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC/TAC 2007 to 2009 datasets (http://duc.nist.gov/, http://www.nist.gov/tac/) show that our method correlates closely with human summaries and outperforms other programs such as LexRank in many categories under the ROUGE evaluation criterion.
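The selection criterion in the abstract can be illustrated with a small sketch. The abstract does not say how the information distance is estimated or how the minimum is searched for, so everything below is an assumption made for illustration: Kolmogorov complexity is replaced by zlib-compressed length, the distance is taken in the max form max{K(x|y), K(y|x)}, the search is a simple greedy one, and the names `c`, `cond`, `distance`, and `summarize` are hypothetical. It is not the authors' implementation.

```python
import zlib


def c(text: str) -> int:
    """Compressed size in bytes: a crude, computable stand-in for K(text)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cond(x: str, context: str) -> int:
    """Approximate K(x | context) as C(context + x) - C(context)."""
    return c(context + " " + x) - c(context)


def distance(x: str, y: str, prior: str = "") -> int:
    """Approximate max{K(x | y, prior), K(y | x, prior)} (assumed form)."""
    return max(cond(x, prior + " " + y), cond(y, prior + " " + x))


def summarize(sentences, budget, prior=""):
    """Greedily pick sentences whose concatenation stays within `budget`
    words and has the smallest approximate distance to the whole set.
    A non-empty `prior` (text already read) gives the update variant."""
    target = " ".join(sentences)
    chosen, remaining = [], list(sentences)
    while remaining:
        scored = []
        for s in remaining:
            candidate = " ".join(chosen + [s])
            if len(candidate.split()) <= budget:
                scored.append((distance(candidate, target, prior), s))
        if not scored:
            break  # no remaining sentence fits in the word budget
        _, best = min(scored, key=lambda pair: pair[0])
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy clusters (invented for illustration only).
earlier = ["The storm made landfall on Monday.",
           "Thousands of homes lost power."]
later = ["Power was restored to most homes by Friday.",
         "The storm made landfall on Monday.",
         "Officials estimate damage at two billion dollars."]

print(summarize(later, budget=16))                           # plain summary
print(summarize(later, budget=16, prior=" ".join(earlier)))  # update summary
```

Passing the earlier cluster as `prior` conditions both terms of the distance on text the reader has already seen, which is the idea behind the update summary. On inputs this short, compressed lengths are dominated by fixed overhead, so the ranking is only meaningful on realistically sized document clusters.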
Additional information
The work was supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, and the IRCI Project from IDRC, Canada.
About this article
Cite this article
Long, C., Huang, ML., Zhu, XY. et al. A New Approach for Multi-Document Update Summarization. J. Comput. Sci. Technol. 25, 739–749 (2010). https://doi.org/10.1007/s11390-010-9361-x
DOI: https://doi.org/10.1007/s11390-010-9361-x