Abstract
This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) unsupervised methods based on information theory, namely (i) a bigram model, (ii) a probabilistic parser model, and (iii) a novel model that considers topics within the discourse of a target word for the calculation of its information content, and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we employed TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all remaining models, including TextRank and TF-IDF. In contrast, the RNN performed poorly. We take the results as first evidence that (i) information content can be employed for keyword extraction tasks and thus has a clear correspondence to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined from extra-sentential contexts, i.e., from the discourse of words.
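The unsupervised ranking idea can be illustrated with a minimal sketch: under a simple bigram language model, a word's Shannon information content is its surprisal, \( -\log_2 P(w_i \mid w_{i-1}) \), and word types with high average surprisal become keyword candidates. The add-one smoothing, function name, and toy input below are illustrative assumptions, not the exact models evaluated in the study.

```python
import math
from collections import Counter

def bigram_surprisal_ranking(tokens, top_k=10):
    """Rank word types by average surprisal -log2 P(w_i | w_{i-1}) under an
    add-one-smoothed bigram model. Illustrative only, not the study's exact model."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    surprisal_sum, occurrences = Counter(), Counter()
    for (prev, word), count in bigrams.items():
        # Add-one-smoothed conditional probability P(word | prev).
        p = (count + 1) / (unigrams[prev] + vocab_size)
        surprisal_sum[word] += -math.log2(p) * count
        occurrences[word] += count

    avg = {w: surprisal_sum[w] / occurrences[w] for w in occurrences}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy German sentence; real input would be a full document.
tokens = "der Hund jagt die Katze und die Katze jagt die Maus".split()
print(bigram_surprisal_ranking(tokens, top_k=3))
```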
Notes
- 1.
A k-truss in a graph is a subset of the graph such that every edge in the subset is supported by at least \( k - 2 \) other edges that form triangles with that particular edge. In other words, every edge in the truss must be part of at least \( k - 2 \) triangles made up of nodes that are part of the truss. https://louridas.github.io/rwa/assignments/finding-trusses/. A plain-Python sketch of this definition follows these notes.
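The following is a plain-Python sketch of the k-truss definition above: it repeatedly removes edges supported by fewer than \( k - 2 \) triangles among the surviving edges until a fixed point is reached. The function name and the toy graph are illustrative; this is the definitional fixed-point computation, not an optimised truss-decomposition algorithm.

```python
def k_truss(edges, k):
    """Return the edge set of the k-truss: iteratively drop edges that are
    supported by fewer than k - 2 triangles among the remaining edges."""
    edges = {frozenset(e) for e in edges if len(set(e)) == 2}
    changed = True
    while changed:
        changed = False
        # Adjacency over the currently surviving edges.
        adj = {}
        for e in edges:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        for e in list(edges):
            u, v = tuple(e)
            support = len(adj[u] & adj[v])  # common neighbours = triangles through (u, v)
            if support < k - 2:
                edges.discard(e)
                changed = True
    return edges

# A 4-clique plus a pendant edge: the 4-truss keeps only the clique's edges.
graph = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(sorted(tuple(sorted(e)) for e in k_truss(graph, k=4)))
```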
References
Aji, S., Kaimal, R.: Document summarization using positive pointwise mutual information. Int. J. Comput. Sci. Inf. Technol. 4(2), 47 (2012). https://doi.org/10.5121/ijcsit.2012.4204
Bever, T.G.: The cognitive basis for linguistic structures. Cogn. Dev. Lang. (1970)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Celano, G.G., Richter, M., Voll, R., Heyer, G.: Aspect coding asymmetries of verbs: the case of Russian. In: Proceedings of the 14th Conference on Natural Language Processing, pp. 34–39 (2018)
Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches (2014). arXiv preprint arXiv:1409.1259. https://doi.org/10.3115/v1/W14-4012
Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11(4), 29–41 (2009). https://doi.org/10.1109/MCSE.2009.120
van Dijk, B.: Parlement européen. In: Evaluation des opérations pilotes d’indexation automatique (Convention spécifique no 52556), Rapport d’évaluation finale (1995)
Dretske, F.: Knowledge and the Flow of Information. MIT Press, Cambridge (1981)
Foley, R.: Dretske’s “information-theoretic” account of knowledge. Synthese 159–184 (1987). https://doi.org/10.1007/BF00413933
Frege, G.: Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought. In: From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, pp. 1–82 (1879). https://doi.org/10.4159/harvard.9780674864603.c2
Hale, J.: A probabilistic Earley parser as a psycholinguistic model. In: 2nd Meeting of the North American Chapter of the Association for Computational Linguistics (2001)
Honnibal, M., Johnson, M.: An improved non-monotonic transition system for dependency parsing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1373–1378 (2015). https://doi.org/10.18653/v1/D15-1162
Horch, E., Reich, I.: On “article omission” in German and the “uniform information density hypothesis”. Bochumer Linguistische Arbeitsberichte, p. 125 (2016)
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223 (2003). https://doi.org/10.3115/1119355.1119383
Hulth, A.: Enhancing linguistically oriented automatic keyword extraction. In: Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics 2004: Short Papers, pp. 17–20 (2004). https://doi.org/10.3115/1613984.1613989
Huo, H., Liu, X.H.: Automatic summarization based on mutual information. In: Applied Mechanics and Materials, vol. 513, pp. 1994–1997. Trans Tech Publications, Freienbach (2014). https://doi.org/10.4028/www.scientific.net/AMM.513-517.1994
Jaeger, T.F.: Redundancy and reduction: speakers manage syntactic information density. Cogn. Psychol. 61(1), 23–62 (2010). https://doi.org/10.1016/j.cogpsych.2010.02.002
Jaeger, T.F., Levy, R.P.: Speakers optimize information density through syntactic reduction. In: Advances in Neural Information Processing Systems, pp. 849–856 (2007)
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. (1972). https://doi.org/10.1108/eb026526
Kamp, H.: Discourse representation theory: what it is and where it ought to go. Nat. Lang. Comput. 320(1), 84–111 (1988)
Krifka, M.: Basic notions of information structure. Acta Linguist. Hung. 55(3–4), 243–276 (2008). https://doi.org/10.1556/aling.55.2008.3-4.2
Kölbl, M., Kyogoku, Y., Philipp, J.N., Richter, M., Rietdorf, C., Yousef, T.: Keyword extraction in German: information-theory vs. deep learning. In: ICAART (1), pp. 459–464 (2020). https://doi.org/10.5220/0009374704590464
Levy, R.: Expectation-based syntactic comprehension. Cognition 106(3), 1126–1177 (2008). https://doi.org/10.1016/j.cognition.2007.05.006
Liu, R., Nyberg, E.: A phased ranking model for question answering. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 79–88 (2013). https://doi.org/10.1145/2505515.2505678
Lombardi, O.: Dretske, Shannon’s theory and the interpretation of information. Synthese 144(1), 23–39 (2005). https://doi.org/10.1007/s11229-005-9127-0
Lombardi, O., Holik, F., Vanni, L.: What is Shannon information? Synthese 193(7), 1983–2012 (2016). https://doi.org/10.1007/s11229-015-0824-z
Marujo, L., Bugalho, M., Neto, J.P.S., Gershman, A., Carbonell, J.: Hourly traffic prediction of news stories (2013). arXiv preprint arXiv:1306.4608
Marujo, L., Ling, W., Trancoso, I., Dyer, C., Black, A.W., Gershman, A., de Matos, D.M., Neto, J.P., Carbonell, J.G.: Automatic keyword extraction on twitter. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 637–643 (2015). https://doi.org/10.3115/v1/P15-2105
May, C., Cotterell, R., Van Durme, B.: An analysis of lemmatization on topic models of morphologically rich language (2016). arXiv preprint arXiv:1608.03995
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
Ogden, C.K., Richards, I.A.: The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism, vol. 29. K. Paul, Trench, Trubner & Company, Limited, London (1923). https://doi.org/10.1038/111566b0
Özgür, A., Özgür, L., Güngör, T.: Text categorization with class-based and corpus-based keyword selection. In: International Symposium on Computer and Information Sciences, pp. 606–615. Springer (2005). https://doi.org/10.1007/11569596_63
Pal, A.R., Maiti, P.K., Saha, D.: An approach to automatic text summarization using simplified Lesk algorithm and WordNet. Int. J. Control Theory Comput. Model. 3 (2013). https://doi.org/10.5121/ijctcm.2013.3502
Peirce, C.S.: Collected Papers of Charles S. Peirce. In: Hartshorne, C., Weiss, P., Burks, A.W. (eds.) (1932)
Ravindra, G.: Information theoretic approach to extractive text summarization. Ph.D. thesis, Supercomputer Education and Research Center, Indian Institute of Science, Bangalore (2009)
Richter, M., Kyogoku, Y., Kölbl, M.: Estimation of average information content: comparison of impact of contexts. In: Proceedings of SAI Intelligent Systems Conference, pp. 1251–1257. Springer (2019). https://doi.org/10.1007/978-3-030-29513-4_91
Richter, M., Kyogoku, Y., Kölbl, M.: Interaction of information content and frequency as predictors of verbs’ lengths. In: International Conference on Business Information Systems, pp. 271–282. Springer (2019). https://doi.org/10.1007/978-3-030-20485-3
Rietdorf, C., Kölbl, M., Kyogoku, Y., Richter, M.: Summarisation by information maps. A pilot study (2019). Submitted
Rogers, T.M.: Is Dretske’s Theory of Information Naturalistically Grounded? How emergent communication channels reference an abstracted ontic framework (2007). https://www.researchgate.net/publication/326561084. Unpublished
Rooth, M.: Association with focus. Ph.D. thesis, Department of Linguistics, University of Massachusetts, Amherst (1985). Unpublished
Rooth, M.: A theory of focus interpretation. Nat. Lang. Semant. 1(1), 75–116 (1992). https://doi.org/10.1007/BF02342617
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988). https://doi.org/10.1016/0306-4573(88)90021-0
Schofield, A., Mimno, D.: Comparing apples to apple: the effects of stemmers on topic models. Trans. Assoc. Comput. Linguist. 4, 287–300 (2016). https://doi.org/10.1162/tacl_a_00099
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Sowa, J.F., Way, E.C.: Implementing a semantic interpreter using conceptual graphs. IBM J. Res. Dev. 30(1), 57–69 (1986). https://doi.org/10.1147/rd.301.0057
Stolcke, A.: An efficient probabilistic context-free parsing algorithm that computes prefix probabilities (1994). arXiv preprint arXiv:cmp-lg/9411029
Tixier, A., Malliaros, F., Vazirgiannis, M.: A graph degeneracy-based approach to keyword extraction. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1860–1870 (2016). https://doi.org/10.18653/v1/D16-1191
Turney, P.D.: Learning algorithms for keyphrase extraction. Inf. Retr. 2(4), 303–336 (2000). https://doi.org/10.1023/A:1009976227802
Vijayarajan, V., Dinakaran, M., Tejaswin, P., Lohani, M.: A generic framework for ontology-based information retrieval and image retrieval in web data. Hum.-Centric Comput. Inf. Sci. 6(1), 18 (2016). https://doi.org/10.1186/s13673-016-0074-1
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: Kea: practical automated keyphrase extraction. In: Design and Usability of Digital Libraries: Case Studies in the Asia Pacific, pp. 129–152. IGI Global, Pennsylvania (2005)
Yang, Z., Nyberg, E.: Leveraging procedural knowledge for task-oriented search. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 513–522 (2015). https://doi.org/10.1145/2766462.2767744
Zhang, Q., Wang, Y., Gong, Y., Huang, X.J.: Keyphrase extraction using deep recurrent neural networks on Twitter. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 836–845 (2016). https://doi.org/10.18653/v1/D16-1080
Acknowledgements
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 357550571. The neural network was trained on the High Performance Computing (HPC) Cluster of the Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) of the Technische Universität Dresden. Thanks to Caitlin Hazelwood for proofreading this chapter. This chapter is an extended version of the initial paper ‘Keyword extraction in German: Information-theory vs. deep learning’, published in the Proceedings of the 12th International Conference on Agents and Artificial Intelligence (Vol. 1), pp. 459–464, ICAART 2020.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kölbl, M., Kyogoku, Y., Philipp, J.N., Richter, M., Rietdorf, C., Yousef, T. (2021). The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German. In: Loukanova, R. (ed.) Natural Language Processing in Artificial Intelligence—NLPinAI 2020. Studies in Computational Intelligence, vol. 939. Springer, Cham. https://doi.org/10.1007/978-3-030-63787-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63786-6
Online ISBN: 978-3-030-63787-3
eBook Packages: Intelligent Technologies and Robotics (R0)