Research article
DOI: 10.1145/2487575.2487697

Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation

Published: 11 August 2013

Abstract

There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
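The flavor of the stochastic collapsed update can be illustrated with a short sketch. This is not the paper's exact algorithm (which processes minibatches and uses separate step-size schedules for the document and topic statistics); it is a minimal single-token CVB0-style variant with an assumed Robbins-Monro step size, kept only to show how expected count statistics replace full variational distributions:

```python
import numpy as np

def scvb0_sketch(docs, K, W, alpha=0.1, eta=0.01, n_epochs=50, seed=0):
    """Illustrative stochastic CVB0-style updates for LDA.

    docs: list of documents, each a list of word ids in [0, W).
    Maintains expected count statistics instead of full variational
    distributions, nudging them toward each token's estimate with a
    decreasing step size (an assumed schedule, not the paper's).
    """
    rng = np.random.default_rng(seed)
    n_phi = rng.random((W, K))            # expected word-topic counts
    n_z = n_phi.sum(axis=0)               # expected per-topic counts
    n_theta = rng.random((len(docs), K))  # expected doc-topic counts
    C = sum(len(d) for d in docs)         # total tokens in the corpus
    t = 0
    for _ in range(n_epochs):
        for j, doc in enumerate(docs):
            C_j = len(doc)
            for w in doc:
                t += 1
                rho = (t + 10.0) ** -0.6  # assumed step-size schedule
                # CVB0 "zero-order" responsibility of each topic for this token
                gamma = (n_phi[w] + eta) * (n_theta[j] + alpha) / (n_z + W * eta)
                gamma /= gamma.sum()
                # Online averages toward the single-token estimates,
                # scaled up to document / corpus size.
                n_theta[j] = (1 - rho) * n_theta[j] + rho * C_j * gamma
                n_phi[w] = (1 - rho) * n_phi[w] + rho * C * gamma
                n_z = (1 - rho) * n_z + rho * C * gamma
    return n_phi, n_theta
```

As the step size decays, the rows of `n_theta` approximately sum to the document lengths, and topic-word distributions can be read off as `(n_phi[:, k] + eta) / (n_z[k] + W * eta)`.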




    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. stochastic learning
    2. topic models
    3. variational inference

    Qualifiers

    • Research-article

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 paper acceptance rate: 125 of 726 submissions (17%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 57
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 20 Nov 2024.


    Cited By

    • (2024) Big topic modeling based on a two-level hierarchical latent Beta-Liouville allocation for large-scale data and parameter streaming. Pattern Analysis and Applications 27(1). DOI: 10.1007/s10044-024-01213-y. Online: 28-Feb-2024.
    • (2024) Data Science and Machine Learning Integration in the Engineering Curriculum: Unlocking Innovations and Opportunities. Proceedings of Workshop on Interdisciplinary Sciences 2023, 137-159. DOI: 10.1007/978-981-97-7850-8_10. Online: 21-Oct-2024.
    • (2023) Differential Fairness: An Intersectional Framework for Fair AI. Entropy 25(4), 660. DOI: 10.3390/e25040660. Online: 14-Apr-2023.
    • (2023) Revolution trend investigation of tourism destination image with machine learning. Journal of Vacation Marketing. DOI: 10.1177/13567667231213152. Online: 9-Nov-2023.
    • (2023) A Method of Point Cloud Classification Fused the Multilevel Point Set Features. Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering, 545-550. DOI: 10.1145/3650400.3650490. Online: 20-Oct-2023.
    • (2023) A Review of Stability in Topic Modeling: Metrics for Assessing and Techniques for Improving Stability. ACM Computing Surveys 56(5), 1-32. DOI: 10.1145/3623269. Online: 27-Nov-2023.
    • (2023) Mapping sharing economy themes: science mapping, topic modeling, and research agenda. Journal of Marketing Analytics. DOI: 10.1057/s41270-023-00238-2. Online: 28-Sep-2023.
    • (2022) Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus. International Journal of Ambient Computing and Intelligence 13(1), 1-18. DOI: 10.4018/IJACI.293137. Online: 29-Apr-2022.
    • (2022) Topic Modeling for Automatic Analysis of Natural Language: A Case Study in an Italian Customer Support Center. Algorithms 15(6), 204. DOI: 10.3390/a15060204. Online: 13-Jun-2022.
    • (2022) Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model. ACM Transactions on Knowledge Discovery from Data 16(5), 1-48. DOI: 10.1145/3502727. Online: 9-Mar-2022.
