DOI: 10.1145/1143844.1143917

Pachinko allocation: DAG-structured mixture models of topic correlations

Published: 25 June 2006

Abstract

Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
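The DAG structure described in the abstract can be illustrated with a minimal sketch of how one document might be generated. This is not the authors' implementation: the toy graph, the node and word names, the symmetric Dirichlet prior `alpha`, and the helper functions are all assumptions made for illustration. As a simplification, every interior node's distribution over its children is redrawn per document, whereas a full model would typically share topic-to-word distributions across documents.

```python
import random

# Hypothetical toy DAG: each interior node (topic) maps to its children.
# Any name that is not a key is a leaf, i.e. a vocabulary word.
# "sub_2" has two parents, so this is a DAG rather than a tree.
DAG = {
    "root": ["super_a", "super_b"],
    "super_a": ["sub_1", "sub_2"],
    "super_b": ["sub_2", "sub_3"],
    "sub_1": ["gene", "cell"],
    "sub_2": ["data", "model"],
    "sub_3": ["vote", "law"],
}


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(dag, n_words, alpha=1.0, seed=None):
    """Generate n_words by walking from the root to a leaf for each word."""
    rng = random.Random(seed)
    # Assumption: one multinomial over children per interior node,
    # drawn from a symmetric Dirichlet for this document.
    theta = {
        node: sample_dirichlet(alpha, len(children), rng)
        for node, children in dag.items()
    }
    words = []
    for _ in range(n_words):
        node = "root"
        # Descend until reaching a leaf (a node with no children entry).
        while node in dag:
            node = rng.choices(dag[node], weights=theta[node])[0]
        words.append(node)
    return words


if __name__ == "__main__":
    print(generate_document(DAG, 10, seed=0))
```

Because `super_a` and `super_b` both point to `sub_2`, a document that favors either super-topic tends to draw words from `sub_2` alongside that super-topic's other child; this is the kind of topic correlation an interior node is meant to express.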

References

[1] Blei, D., Griffiths, T., Jordan, M., & Tenenbaum, J. (2004). Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16.
[2] Blei, D., & Lafferty, J. (2006). Correlated topic models. In Advances in Neural Information Processing Systems 18.
[3] Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
[4] Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association.
[5] Diggle, P., & Gratton, R. (1984). Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society.
[6] Griffiths, T., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences (pp. 5228–5235).
[7] Lawrie, D., Croft, W., & Rosenberg, A. (2001). Finding topic words for hierarchical summarization. Proceedings of SIGIR'01 (pp. 349–357).
[8] Newton, M., & Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society.
[9] Teh, Y., Jordan, M., Beal, M., & Blei, D. (2005). Hierarchical Dirichlet processes. Journal of the American Statistical Association.

Published In

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%
Overall Acceptance Rate: 140 of 548 submissions, 26%
