Abstract
We introduce a method for computing classifier-based semantic spaces on top of text2ddc. To this end, we optimize text2ddc, a neural network-based classifier for the Dewey Decimal Classification (DDC). Using a wide range of linguistic features, including sense embeddings, we achieve an F-score of 87.4%. To show that our approach is language-independent, we evaluate text2ddc by classifying texts in six different languages. Building on this, we develop a topic model that generates probability distributions over topics for linguistic input at the word (sense), sentence, and text level. In contrast to related approaches, these probabilities are estimated with text2ddc, so that each dimension of the resulting embeddings corresponds to a separate DDC class. Finally, we evaluate this Classifier-based Semantic space (CaSe) in the context of text classification and show that it improves the classification results.
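The idea of a classifier-based embedding, in which each dimension is the probability of one labeled class, can be illustrated with a minimal sketch. This is not text2ddc: the keyword-count classifier, the three abbreviated DDC main-class labels, and the keyword lists below are hypothetical stand-ins chosen only to make the example self-contained.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a trained topic classifier such as text2ddc.
# Class labels and keyword lists are invented for illustration only.
TOY_CLASSES: Dict[str, List[str]] = {
    "500 Science": ["atom", "energy", "cell"],
    "700 Arts": ["painting", "music", "sculpture"],
    "900 History": ["war", "empire", "century"],
}
LABELS: List[str] = list(TOY_CLASSES)


def classify(text: str) -> List[float]:
    """Return a probability distribution over LABELS.

    Scores are keyword counts per class, normalized with a softmax,
    so the output sums to 1 and every entry is positive.
    """
    tokens = text.lower().split()
    scores = [sum(tokens.count(k) for k in kws) for kws in TOY_CLASSES.values()]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# The classifier output itself serves as the embedding:
# dimension i is the estimated probability of class LABELS[i].
embedding = classify("the empire fell after a long war")
top_label = LABELS[embedding.index(max(embedding))]

# Texts on the same topic end up closer together in this space.
sim_same = cosine(embedding, classify("a century of war"))
sim_diff = cosine(embedding, classify("music and painting"))
```

Because every dimension carries a class label, such a vector can be read directly (here `top_label` names the most probable class), which is the property that distinguishes this kind of space from embeddings with unlabeled latent dimensions.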
Notes
- 1.
This paper extends the work presented in [2], providing more detail on our model and the data used. We elaborate on the experiments and evaluation of text2ddc and CaSe, and include an error analysis.
References
Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: Computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of SemEval ’12, pp. 435–440. Stroudsburg (2012)
Baumartz, D., Uslu, T., Mehler, A.: LTV: Labeled topic vector. In: Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: System Demonstrations, August 20–26. The COLING 2018 Organizing Committee, Santa Fe, New Mexico, USA (2018)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
vor der Brück, T., Eger, S., Mehler, A.: Complex decomposition of the negative distance kernel. In: IEEE International Conference on Machine Learning and Applications (2015)
Hemati, W., Uslu, T., Mehler, A.: TextImager: a distributed UIMA-based system for NLP. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 59–63 (2016)
Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: Learning sense embeddings for word and relational similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 95–105 (2015)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
Leopold, E.: Models of semantic spaces. In: Mehler, A., Köhler, R. (eds.) Aspects of Automatic Text Analysis, Studies in Fuzziness and Soft Computing, vol. 209, pp. 117–137. Springer, Heidelberg (2007)
Li, J., Jurafsky, D.: Do multi-sense embeddings improve natural language understanding? arXiv preprint arXiv:1506.01070 (2015)
Li, Q., Li, T., Chang, B.: Learning word sense embeddings from word sense definitions. In: Lin, C.-Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds.) ICCPOL/NLPCC 2016. LNCS (LNAI), vol. 10102, pp. 224–235. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50496-4_19
Magatti, D., Calegari, S., Ciucci, D., Stella, F.: Automatic labeling of topics. In: Ninth International Conference on Intelligent Systems Design and Applications (ISDA ’09), pp. 1227–1232. IEEE (2009)
Mei, Q., Shen, X., Zhai, C.: Automatic labeling of multinomial topic models. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 490–499. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1281192.1281246
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Pelevina, M., Arefyev, N., Biemann, C., Panchenko, A.: Making sense of word embeddings. arXiv preprint arXiv:1708.03390 (2017)
Pilehvar, M.T., Navigli, R.: From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artif. Intell. 228, 95–128 (2015)
Uslu, T., Mehler, A., Baumartz, D., Henlein, A., Hemati, W.: fastsense: An efficient word sense disambiguation classifier. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018)
Uslu, T., Mehler, A., Niekler, A., Baumartz, D.: Towards a DDC-based topic network model of Wikipedia. In: Proceedings of 2nd International Workshop on Modeling, Analysis, and Management of Social Networks and their Applications (SOCNET 2018), February 28, 2018 (2018)
Vial, L., Lecouteux, B., Schwab, D.: Sense embeddings in knowledge-based word sense disambiguation. In: 12th International Conference on Computational Semantics (2017)
Waltinger, U., Mehler, A., Lösch, M., Horstmann, W.: Hierarchical classification of OAI metadata using the DDC taxonomy. In: Bernardi, R., Chambers, S., Gottfried, B., Segond, F., Zaihrayeu, I. (eds.) Advanced Language Technologies for Digital Libraries (ALT4DL), pp. 29–40. Springer, LNCS (2011)
Wu, L., Fisch, A., Chopra, S., Adams, K., Bordes, A., Weston, J.: Starspace: Embed all the things! CoRR abs/1709.03856 (2017). http://arxiv.org/abs/1709.03856
Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015). http://arxiv.org/abs/1502.01710
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Uslu, T., Mehler, A., Baumartz, D. (2023). Computing Classifier-Based Embeddings with the Help of Text2ddc. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_37
DOI: https://doi.org/10.1007/978-3-031-24340-0_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer Science, Computer Science (R0)