Abstract
TAGH is a system for the automatic recognition of German word forms. It is based on a stem lexicon with allomorphs and a concatenative mechanism for inflection and word formation. Weighted finite-state automata (FSA) and a cost function are used to determine the correct segmentation of complex forms: the correct segmentation of a given compound is taken to be the one with the least cost. The stem lexicon contains almost 80,000 stems and was compiled over five years on the basis of large newspaper corpora and literary texts. More than 1,000 different rules for derivational and compositional word formation considerably increase the number of analyzable word forms. The recognition rate of TAGH is more than 99% for modern newspaper text and approximately 98.5% for literary texts.
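The least-cost criterion can be illustrated with a small sketch. This is not the TAGH implementation: it replaces the weighted automata with a plain dynamic program over string positions, which computes the same kind of minimum over summed costs (the tropical semiring view of shortest paths). The toy lexicon, its cost values, the boundary penalty, and the names `TOY_LEXICON`, `BOUNDARY_COST`, and `segment` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy stem lexicon with made-up costs (lower = preferred).
# These values are NOT taken from TAGH; they only illustrate the idea.
TOY_LEXICON: Dict[str, float] = {
    "stau": 1.0,    # "traffic jam / damming"
    "becken": 1.0,  # "basin"
    "staub": 1.0,   # "dust"
    "ecken": 2.5,   # "corners" (inflected form, assumed costlier here)
}

# Hypothetical penalty added for every compound boundary.
BOUNDARY_COST = 1.0


def segment(
    word: str,
    lexicon: Dict[str, float] = TOY_LEXICON,
    boundary_cost: float = BOUNDARY_COST,
) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, stems) for the cheapest segmentation, or None."""
    word = word.lower()
    n = len(word)
    # best[i] holds the cheapest analysis of word[:i], if one exists.
    best: List[Optional[Tuple[float, List[str]]]] = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in lexicon:
                continue
            # Costs are summed along an analysis (tropical "times") ...
            cost = prefix[0] + lexicon[piece]
            if start > 0:
                cost += boundary_cost
            # ... and competing analyses are resolved by min (tropical "plus").
            current = best[end]
            if current is None or cost < current[0]:
                best[end] = (cost, prefix[1] + [piece])
    return best[n]


if __name__ == "__main__":
    # "Staubecken" is ambiguous: Stau+becken vs. Staub+ecken.
    print(segment("Staubecken"))
```

With these assumed costs, the reading Stau+becken (total 3.0) is preferred over Staub+ecken (total 4.5); a word that cannot be covered by lexicon entries yields `None`.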
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geyken, A., Hanneforth, T. (2006). TAGH: A Complete Morphology for German Based on Weighted Finite State Automata. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_7
DOI: https://doi.org/10.1007/11780885_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35467-3
Online ISBN: 978-3-540-35469-7
eBook Packages: Computer Science, Computer Science (R0)