Abstract
Topic assignment for a corpus of documents is a natural language processing (NLP) task. One of the best-known and most thoroughly studied methods is Latent Dirichlet Allocation (LDA), which is based on statistical modelling. On the other hand, the deep-learning paradigm has proved useful for many NLP tasks, such as classification [3], sentiment analysis [8] and text summarization [11]. This paper compares the results of the LDA method with those obtained using the word representations provided by Word2Vec [5], which builds on the deep-learning paradigm.
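The two approaches compared in the paper can be illustrated with a short sketch using the gensim library [7]. This is a minimal illustration under assumed settings (toy corpus, number of topics, vector size, training epochs), not the authors' actual experimental pipeline: part (a) fits an LDA model on a bag-of-words corpus and reads off the dominant topic of a document; part (b) averages Word2Vec word vectors into a document representation that can subsequently be grouped into topics.

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, Word2Vec
    import numpy as np

    # Toy tokenized corpus (stand-in for the actual test documents [2]).
    docs = [["topic", "modelling", "with", "lda"],
            ["word", "embeddings", "from", "word2vec"],
            ["deep", "learning", "for", "nlp"]]

    # (a) LDA [1]: build a bag-of-words corpus and infer per-document topic mixtures.
    dictionary = Dictionary(docs)
    bow = [dictionary.doc2bow(d) for d in docs]
    lda = LdaModel(bow, num_topics=2, id2word=dictionary, passes=10)  # num_topics is an illustrative assumption
    dominant_topic = max(lda.get_document_topics(bow[0]), key=lambda t: t[1])[0]

    # (b) Word2Vec [5]: train embeddings and represent a document as the mean
    # of its word vectors; these vectors can then be clustered into topics.
    w2v = Word2Vec(docs, vector_size=50, min_count=1, epochs=50)  # gensim 4.x parameter names
    doc_vec = np.mean([w2v.wv[w] for w in docs[0]], axis=0)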
References
1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). http://www.jmlr.org/papers/v3/blei03a.html
2. Documents for tests (2016). http://hereticsconsulting.files.wordpress.com/2016/01/textmining.zip
3. Enríquez, F., Troyano, J.A., López-Solaz, T.: An approach to the use of word embeddings in an opinion classification task. Expert Syst. Appl. 66, 1–6 (2016). http://dx.doi.org/10.1016/j.eswa.2016.09.005
4. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002). http://dx.doi.org/10.3115/1118108.1118117
5. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, USA, 5–8 December 2013, pp. 3111–3119 (2013). http://papers.nips.cc/book/advances-in-neural-information-processing-systems-26-2013
6. Nallapati, R., Cohen, W.W., Lafferty, J.D.: Parallelized variational EM for latent Dirichlet allocation: an experimental evaluation of speed and scalability. In: Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), Omaha, Nebraska, USA, 28–31 October 2007, pp. 349–354. IEEE Computer Society (2007). http://dx.doi.org/10.1109/ICDMW.2007.33
7. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010. http://is.muni.cz/publication/884893/en
8. Sakenovich, N.S., Zharmagambetov, A.S.: On one approach of solving sentiment analysis task for Kazakh and Russian languages using deep learning. In: Nguyen, N.-T., Manolopoulos, Y., Iliadis, L., Trawiński, B. (eds.) ICCCI 2016. LNCS (LNAI), vol. 9876, pp. 537–545. Springer, Cham (2016). doi:10.1007/978-3-319-45246-3_51
9. Skfuzzy: fuzzy logic toolkit in Python (2016). http://pythonhosted.org/scikit-fuzzy/
10. Topicmodels: package for R (2016). https://cran.r-project.org/web/packages/topicmodels/
11. Yousefi-Azar, M., Hamey, L.: Text summarization using unsupervised deep learning. Expert Syst. Appl. 68, 93–105 (2017). http://dx.doi.org/10.1016/j.eswa.2016.10.017
12. Zhang, W., Wang, J.: Prior-based dual additive latent Dirichlet allocation for user-item connected documents. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1405–1411. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832415.2832445
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jȩdrzejowicz, J., Zakrzewska, M. (2017). Word Embeddings Versus LDA for Topic Assignment in Documents. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds.) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol. 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5