DOI: 10.1145/1390749.1390764
research-article

Latent Dirichlet allocation based multi-document summarization

Published: 24 July 2008

Abstract

Extraction-based multi-document summarization algorithms choose sentences from the documents using some weighting mechanism and combine them into a summary. In this article we use Latent Dirichlet Allocation to capture the events covered by the documents and form the summary from sentences representing these different events. Our approach differs from existing approaches in that we use mixture models to capture the topics and pick the sentences without attending to the details of grammar and structure of the documents. Finally, we evaluate the algorithms on the DUC 2002 corpus multi-document summarization tasks, using the ROUGE evaluator to score the summaries. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
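
The ROUGE-1 recall measure reported in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and a single reference summary; the function name `rouge1_recall` and the example sentences are hypothetical, and the actual ROUGE package offers further options (such as stemming and multiple references) that are not reproduced here.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of a candidate summary against one reference summary.

    Simplifying assumptions (not taken from the paper): tokens are
    lowercased whitespace-separated words and only one reference is used.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    # Clipped overlap: each reference word is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[word])
                  for word, count in ref_counts.items())
    # Recall divides by the number of unigrams in the reference.
    return overlap / sum(ref_counts.values())


# Hypothetical example: 3 of the 5 reference unigrams are recovered.
reference = "the storm closed the airport"
candidate = "the storm closed roads"
print(rouge1_recall(candidate, reference))  # prints 0.6
```

Because recall is normalized by the reference length, a longer candidate can only keep or raise its score, which is why summarization evaluations of this kind are run under a fixed summary length limit.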

    Published In

    AND '08: Proceedings of the second workshop on Analytics for noisy unstructured text data
    July 2008
    130 pages
    ISBN:9781605581965
    DOI:10.1145/1390749

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. latent dirichlet allocation
    2. multi-document summarization

    Qualifiers

    • Research-article

    Conference

    AND '08

    Acceptance Rates

    Overall Acceptance Rate 15 of 22 submissions, 68%

    Article Metrics

    • Downloads (Last 12 months)22
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 25 Nov 2024

    Cited By

    • (2023) Automatic text summarization for government news reports based on multiple features. The Journal of Supercomputing 80:3 (3212-3228). DOI: 10.1007/s11227-023-05599-0. Online publication date: 30-Aug-2023
    • (2022) System Design for Detecting Real Estate Speculation Abusing Inside Information: For the Fair Reallocation of Land. Land 11:4 (565). DOI: 10.3390/land11040565. Online publication date: 11-Apr-2022
    • (2022) Extractive summarization of Malayalam documents using latent Dirichlet allocation: An experience. Journal of Intelligent Systems 31:1 (393-406). DOI: 10.1515/jisys-2022-0027. Online publication date: 29-Mar-2022
    • (2022) Tiered sentence based topic model for multi-document summarization. Journal of Information and Optimization Sciences 43:8 (2131-2141). DOI: 10.1080/02522667.2022.2133219. Online publication date: 16-Dec-2022
    • (2022) Extractive text summarization using clustering-based topic modeling. Soft Computing 27:7 (3965-3982). DOI: 10.1007/s00500-022-07534-6. Online publication date: 4-Oct-2022
    • (2021) OpinionManager: Visual Exploration of Online Reviews in P2P Accommodation. Proceedings of the 14th International Symposium on Visual Information Communication and Interaction (1-8). DOI: 10.1145/3481549.3481551. Online publication date: 6-Sep-2021
    • (2021) Automatic topic labeling using graph-based pre-trained neural embedding. Neurocomputing 463:C (596-608). DOI: 10.1016/j.neucom.2021.08.078. Online publication date: 6-Nov-2021
    • (2021) State and tendency: an empirical study of deep learning question&answer topics on Stack Overflow. Science China Information Sciences 64:11. DOI: 10.1007/s11432-019-3018-6. Online publication date: 15-Oct-2021
    • (2020) A new graph-based extractive text summarization using keywords or topic modeling. Journal of Ambient Intelligence and Humanized Computing. DOI: 10.1007/s12652-020-02591-x. Online publication date: 17-Oct-2020
    • (2019) Extractive Multi-document Summarization using K-means, Centroid-based Method, MMR, and Sentence Position. Proceedings of the 10th International Symposium on Information and Communication Technology (29-35). DOI: 10.1145/3368926.3369688. Online publication date: 4-Dec-2019