
LDA Revisited: Entropy, Prior and Convergence

Published: 24 October 2016
DOI: 10.1145/2983323.2983794

Abstract

Inference algorithms for latent Dirichlet allocation (LDA), whether for small or big data, can be broadly categorized into expectation-maximization (EM), variational Bayes (VB), and collapsed Gibbs sampling (GS). A unified understanding of these different inference algorithms remains an important open problem. In this paper, we revisit the three algorithms from the entropy perspective and show that EM achieves the best predictive perplexity (a standard accuracy metric for LDA) by directly minimizing the cross entropy between the observed word distribution and LDA's predictive distribution. Moreover, EM can change the entropy of LDA's predictive distribution by tuning LDA's priors, such as the Dirichlet hyperparameters and the number of topics, so as to minimize the cross entropy with the observed word distribution. Finally, we propose the adaptive EM (AEM) algorithm, which converges faster and is more accurate than the state-of-the-art SparseLDA [20] and AliasLDA [12] on data sets and LDA models from small to big. The core idea is that the number of active topics, measured by the residuals between E-steps at successive iterations, decreases significantly, leading to amortized O(1) time complexity in the number of topics. The open-source code of AEM is available on GitHub.
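
As a concrete illustration of the two quantities the abstract relies on, the NumPy sketch below is a minimal, hypothetical rendering, not the authors' released AEM code. predictive_perplexity exponentiates the cross entropy between a document's observed word distribution and LDA's predictive distribution theta @ phi (the quantity EM minimizes above), and adaptive_e_step sketches the residual idea: responsibilities are re-evaluated only for topics that changed by more than a threshold since the previous E-step, which is what pushes the per-token cost toward amortized O(1) in the number of topics. The function names and the tol threshold are illustrative assumptions, not taken from the paper.

    import numpy as np

    def predictive_perplexity(doc_counts, theta, phi):
        """Perplexity = exp(cross entropy) between the observed word distribution
        of a document and LDA's predictive distribution p(w | d) = theta @ phi.

        doc_counts : (V,) word counts of one document
        theta      : (K,) document-topic proportions (sums to 1)
        phi        : (K, V) topic-word distributions (rows sum to 1)
        """
        p_w = theta @ phi                        # predictive word distribution, shape (V,)
        obs = doc_counts / doc_counts.sum()      # observed (empirical) word distribution
        seen = doc_counts > 0
        cross_entropy = -np.sum(obs[seen] * np.log(p_w[seen]))
        return np.exp(cross_entropy)

    def adaptive_e_step(word_ids, theta, phi, resp, active, tol=1e-3):
        """One adaptive E-step over a document's tokens.

        Only the topics in each token's `active` index set are re-evaluated;
        topics whose responsibility residual between successive iterations falls
        below `tol` are frozen at their previous value and skipped afterwards.
        Assumes smoothed (strictly positive) theta and phi.
        """
        for i, w in enumerate(word_ids):
            a = active[i]                        # still-active topic indices for token i
            if a.size == 0:
                continue                         # all topics converged for this token
            new = theta[a] * phi[a, w]           # unnormalized responsibilities on the active set
            new *= resp[i, a].sum() / new.sum()  # renormalize to the mass the active set held before
            residual = np.abs(new - resp[i, a])
            resp[i, a] = new
            active[i] = a[residual > tol]        # shrink the active set for the next iteration
        return resp, active

On the first iteration every topic would be active (e.g. active[i] = np.arange(K)) and resp initialized uniformly; as the active sets shrink over iterations, later E-steps touch only the handful of topics per token that are still changing, while the M-step re-estimates theta and phi from resp as usual.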

References

[1] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In WSDM, pages 123--132, 2012.
[2] S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu. A practical algorithm for topic modeling with provable guarantees. In ICML, 2013.
[3] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In UAI, pages 27--34, 2009.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993--1022, 2003.
[5] N. de Freitas and K. Barnard. Bayesian latent semantic analysis of multimedia databases. Technical report, University of British Columbia, 2001.
[6] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In SIGIR, pages 15--19, 2005.
[7] M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. In SIGIR, pages 433--434, 2003.
[8] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proc. Natl. Acad. Sci., 101:5228--5235, 2004.
[9] M. Hoffman, D. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In NIPS, pages 856--864, 2010.
[10] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177--196, 2001.
[11] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, 1999.
[12] A. Q. Li, A. Ahmed, S. Ravi, and A. J. Smola. Reducing the sampling complexity of topic models. In KDD, pages 891--900, 2014.
[13] X. Liu, J. Zeng, X. Yang, J. Yan, and Q. Yang. Scalable parallel EM algorithms for latent Dirichlet allocation in multi-core systems. In WWW, pages 669--679, 2015.
[14] K. P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[15] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling. Fast collapsed Gibbs sampling for latent Dirichlet allocation. In KDD, pages 569--577, 2008.
[16] J. Tang, Z. Meng, X. Nguyen, Q. Mei, and M. Zhang. Understanding the limiting factors of topic modeling via posterior contraction analysis. In ICML, 2014.
[17] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566--1581, 2006.
[18] H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. In NIPS, pages 1973--1981, 2009.
[19] Y. Wang, X. Zhao, Z. Sun, H. Yan, L. Wang, Z. Jin, L. Wang, Y. Gao, C. Law, and J. Zeng. Peacock: Learning long-tail topic features for industrial applications. ACM Transactions on Intelligent Systems and Technology, 6(4):47, 2015.
[20] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD, pages 937--946, 2009.
[21] J. Yuan, F. Gao, Q. Ho, W. Dai, J. Wei, X. Zheng, E. P. Xing, T.-Y. Liu, and W.-Y. Ma. LightLDA: Big topic models on modest compute clusters. In WWW, pages 1351--1361, 2015.
[22] J. Zeng. A topic modeling toolbox using belief propagation. J. Mach. Learn. Res., 13:2233--2236, 2012.
[23] J. Zeng, W. K. Cheung, and J. Liu. Learning topic models by belief propagation. IEEE Trans. Pattern Anal. Mach. Intell., 35(5):1121--1134, 2013.
[24] J. Zeng, Z.-Q. Liu, and X.-Q. Cao. Fast online EM for big topic modeling. IEEE Trans. Knowl. Data Eng., arXiv:1210.2179 [cs.LG], 2015.
[25] K. Zhai, J. Boyd-Graber, and N. Asadi. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In WWW, pages 879--888, 2012.

Cited By

  • (2018) Weighting schemes based on EM algorithm for LDA. International Journal of High Performance Systems Architecture, 8(1-2):94--104. DOI: 10.5555/3271844.3271853. Online publication date: 1-Jan-2018.

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive EM algorithms
  2. big data
  3. convergence
  4. entropy
  5. latent Dirichlet allocation
  6. prior

Qualifiers

  • Research-article

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24-28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%.
Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
