Research article
DOI: 10.1145/2487575.2487697

Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation

Published: 11 August 2013

Abstract

There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
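The flavor of the stochastic collapsed update can be illustrated with a short sketch. This is not the paper's exact algorithm (which processes minibatches and uses separate step-size schedules for the document and topic statistics); it is a minimal single-token CVB0-style variant with an assumed Robbins-Monro step size, kept only to show how expected count statistics replace full variational distributions:

```python
import numpy as np

def scvb0_sketch(docs, K, W, alpha=0.1, eta=0.01, n_epochs=50, seed=0):
    """Illustrative stochastic CVB0-style updates for LDA.

    docs: list of documents, each a list of word ids in [0, W).
    Maintains expected count statistics instead of full variational
    distributions, nudging them toward each token's estimate with a
    decreasing step size (an assumed schedule, not the paper's).
    """
    rng = np.random.default_rng(seed)
    n_phi = rng.random((W, K))            # expected word-topic counts
    n_z = n_phi.sum(axis=0)               # expected per-topic counts
    n_theta = rng.random((len(docs), K))  # expected doc-topic counts
    C = sum(len(d) for d in docs)         # total tokens in the corpus
    t = 0
    for _ in range(n_epochs):
        for j, doc in enumerate(docs):
            C_j = len(doc)
            for w in doc:
                t += 1
                rho = (t + 10.0) ** -0.6  # assumed step-size schedule
                # CVB0 "zero-order" responsibility of each topic for this token
                gamma = (n_phi[w] + eta) * (n_theta[j] + alpha) / (n_z + W * eta)
                gamma /= gamma.sum()
                # Online averages toward the single-token estimates,
                # scaled up to document / corpus size.
                n_theta[j] = (1 - rho) * n_theta[j] + rho * C_j * gamma
                n_phi[w] = (1 - rho) * n_phi[w] + rho * C * gamma
                n_z = (1 - rho) * n_z + rho * C * gamma
    return n_phi, n_theta
```

As the step size decays, the rows of `n_theta` approximately sum to the document lengths, and topic-word distributions can be read off as `(n_phi[:, k] + eta) / (n_z[k] + W * eta)`.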




    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. stochastic learning
    2. topic models
    3. variational inference

    Qualifiers

    • Research-article

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 paper acceptance rate: 125 of 726 submissions (17%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 57
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 20 Nov 2024.


    Cited By

    • (2024) Big topic modeling based on a two-level hierarchical latent Beta-Liouville allocation for large-scale data and parameter streaming. Pattern Analysis and Applications 27(1). DOI: 10.1007/s10044-024-01213-y. Online: 28-Feb-2024.
    • (2024) Data Science and Machine Learning Integration in the Engineering Curriculum: Unlocking Innovations and Opportunities. Proceedings of Workshop on Interdisciplinary Sciences 2023, 137-159. DOI: 10.1007/978-981-97-7850-8_10. Online: 21-Oct-2024.
    • (2023) Differential Fairness: An Intersectional Framework for Fair AI. Entropy 25(4), 660. DOI: 10.3390/e25040660. Online: 14-Apr-2023.
    • (2023) Revolution trend investigation of tourism destination image with machine learning. Journal of Vacation Marketing. DOI: 10.1177/13567667231213152. Online: 9-Nov-2023.
    • (2023) A Method of Point Cloud Classification Fused the Multilevel Point Set Features. Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering, 545-550. DOI: 10.1145/3650400.3650490. Online: 20-Oct-2023.
    • (2023) A Review of Stability in Topic Modeling: Metrics for Assessing and Techniques for Improving Stability. ACM Computing Surveys 56(5), 1-32. DOI: 10.1145/3623269. Online: 27-Nov-2023.
    • (2023) Mapping sharing economy themes: science mapping, topic modeling, and research agenda. Journal of Marketing Analytics. DOI: 10.1057/s41270-023-00238-2. Online: 28-Sep-2023.
    • (2022) Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus. International Journal of Ambient Computing and Intelligence 13(1), 1-18. DOI: 10.4018/IJACI.293137. Online: 29-Apr-2022.
    • (2022) Topic Modeling for Automatic Analysis of Natural Language: A Case Study in an Italian Customer Support Center. Algorithms 15(6), 204. DOI: 10.3390/a15060204. Online: 13-Jun-2022.
    • (2022) Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model. ACM Transactions on Knowledge Discovery from Data 16(5), 1-48. DOI: 10.1145/3502727. Online: 9-Mar-2022.
