DOI:10.1145/2809936.2809943
research-article

Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large Collections

Published: 18 October 2015

Abstract

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis that has relied predominantly on graphical models and Bayesian learning. Additive regularization for topic modeling (ARTM) is a recent semiprobabilistic approach that provides simpler inference for many models previously studied only in Bayesian settings. ARTM lowers the barrier to entry into topic modeling research and makes it easier to combine topic models. In this paper we develop a multimodal extension of the ARTM approach and implement it in BigARTM, an open-source project for online parallelized topic modeling. We demonstrate the ability of non-Bayesian regularization to combine modalities, languages, and multiple criteria to find sparse, diverse, and interpretable topics.
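The abstract does not spell out the inference procedure, so the following is only a minimal sketch of what a regularized EM iteration for a multimodal topic model might look like. It is not the BigARTM implementation and does not use its API. It assumes the usual ARTM-style update, in which each modality's likelihood is weighted by a coefficient and the M-step clips negative values to zero before normalizing, and it assumes one simple regularizer that adds a constant `beta` to the word-topic counts (negative for sparsing, positive for smoothing). All names here (`fit_multimodal_artm`, `normalize_pos`, `tau`, `beta`, the toy data) are hypothetical.

```python
import random


def normalize_pos(values):
    """Clip negatives to zero, then rescale to sum to 1 (all-zero stays all-zero)."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0.0:
        return [0.0] * len(values)
    return [v / total for v in clipped]


def fit_multimodal_artm(docs, vocab_sizes, tau, n_topics, beta=0.0, n_iter=20, seed=0):
    """Toy regularized EM for a multimodal topic model (illustrative assumption).

    docs:        list of {modality: {token_id: count}}
    vocab_sizes: {modality: vocabulary size}
    tau:         {modality: weight of that modality's log-likelihood}
    beta:        constant added to word-topic counts in the M-step
                 (negative sparsifies, positive smooths)
    Returns (phi, theta) with phi[modality][topic][token] and theta[doc][topic].
    """
    rng = random.Random(seed)
    # One stochastic word-topic matrix per modality; theta is shared by all modalities.
    phi = {
        m: [normalize_pos([rng.random() for _ in range(size)]) for _ in range(n_topics)]
        for m, size in vocab_sizes.items()
    }
    theta = [[1.0 / n_topics] * n_topics for _ in docs]

    for _ in range(n_iter):
        n_wt = {m: [[0.0] * size for _ in range(n_topics)] for m, size in vocab_sizes.items()}
        n_td = [[0.0] * n_topics for _ in docs]
        for d, doc in enumerate(docs):
            for m, counts in doc.items():
                for w, n_dw in counts.items():
                    # E-step: topic posterior for token w in document d.
                    joint = [phi[m][t][w] * theta[d][t] for t in range(n_topics)]
                    z = sum(joint)
                    if z == 0.0:
                        continue  # token is no longer explained by any topic
                    for t in range(n_topics):
                        # Counts are weighted by the modality coefficient tau[m].
                        contrib = tau[m] * n_dw * joint[t] / z
                        n_wt[m][t][w] += contrib
                        n_td[d][t] += contrib
        # M-step: add the regularizer term, clip at zero, normalize per modality.
        for m in phi:
            phi[m] = [normalize_pos([n + beta for n in n_wt[m][t]]) for t in range(n_topics)]
        theta = [normalize_pos(row) for row in n_td]
    return phi, theta


# Toy collection with two modalities: words ("text") and labels ("tag").
docs = [
    {"text": {0: 3, 1: 2}, "tag": {0: 1}},
    {"text": {0: 2, 1: 3}, "tag": {0: 1}},
    {"text": {2: 4, 3: 1}, "tag": {1: 1}},
    {"text": {2: 1, 3: 4}, "tag": {1: 1}},
]
vocab_sizes = {"text": 4, "tag": 2}
tau = {"text": 1.0, "tag": 0.5}

phi, theta = fit_multimodal_artm(docs, vocab_sizes, tau, n_topics=2)
phi_sparse, theta_sparse = fit_multimodal_artm(docs, vocab_sizes, tau, n_topics=2, beta=-0.1)
```

In this sketch the modalities interact only through the shared document-topic matrix `theta`, and with `beta=0.0` the loop reduces to a plain PLSA-style EM on the weighted counts. A criterion is changed by changing the term added in the M-step, not by deriving a new inference scheme.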



Published In

TM '15: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications
October 2015
74 pages
ISBN:9781450337847
DOI:10.1145/2809936
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. additive regularization for topic modeling
  2. bigartm
  3. em-algorithm
  4. latent dirichlet allocation
  5. probabilistic latent semantic analysis
  6. probabilistic topic modeling

Qualifiers

  • Research-article

Conference

CIKM'15
Acceptance Rates

TM '15 Paper Acceptance Rate 8 of 12 submissions, 67%;
Overall Acceptance Rate 8 of 12 submissions, 67%


Cited By

  • (2018) Topic Classification Through Topic Modeling with Additive Regularization for Collection of Scientific Papers. Proceedings of the 14th Central and Eastern European Software Engineering Conference Russia, pages 1-5. DOI:10.1145/3290621.3290629. Online publication date: 12-Oct-2018.
  • (2018) Thesaurus-Based Topic Models and Their Evaluation. Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, pages 1-9. DOI:10.1145/3227609.3227659. Online publication date: 25-Jun-2018.
  • (2017) Fast and Modular Regularized Topic Modelling. Proceedings of the 21st Conference of Open Innovations Association FRUCT, pages 182-193. DOI:10.23919/FRUCT.2017.8250181. Online publication date: 13-Nov-2017.
  • (2016) bBridge. Proceedings of the 24th ACM international conference on Multimedia, pages 759-761. DOI:10.1145/2964284.2973836. Online publication date: 1-Oct-2016.
  • (2016) "360° user profiling: past, future, and applications" by Aleksandr Farseev, Mohammad Akbari, Ivan Samborskii and Tat-Seng Chua with Martin Vesely as coordinator. ACM SIGWEB Newsletter, 2016:Summer, pages 1-11. DOI:10.1145/2956573.2956577. Online publication date: 6-Jul-2016.
