TAGH: A Complete Morphology for German Based on Weighted Finite State Automata

Alexander Geyken & Thomas Hanneforth

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4002)

Included in the following conference series:

International Workshop on Finite-State Methods and Natural Language Processing

Abstract

TAGH is a system for the automatic recognition of German word forms. It is based on a stem lexicon with allomorphs and a concatenative mechanism for inflection and word formation. Weighted finite-state automata (FSA) and a cost function are used to determine the correct segmentation of complex forms: the correct segmentation of a given compound is taken to be the one with the least cost. The stem lexicon contains almost 80,000 stems and was compiled over 5 years on the basis of large newspaper corpora and literary texts. More than 1,000 different rules for derivational and compositional word formation considerably increase the number of analyzable word forms. The recognition rate of TAGH is more than 99% for modern newspaper text and approximately 98.5% for literary texts.
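
The least-cost criterion in the abstract can be illustrated with a small sketch. It is not TAGH's implementation: the lexicon, the stem costs, and the per-boundary cost below are hypothetical values chosen for illustration, and a plain dynamic program over string positions stands in for the shortest-path search one would run on a weighted automaton whose costs add along a path and where the minimum is taken across paths.

```python
from math import inf

# Hypothetical toy lexicon: stem -> cost. These are NOT TAGH's stems or weights.
STEM_COST = {
    "stau": 1.0,    # "traffic jam / damming"
    "staub": 1.0,   # "dust"
    "becken": 1.0,  # "basin"
    "ecken": 3.0,   # "corners", made more expensive here purely for illustration
}

# Hypothetical cost charged for every boundary between two stems.
BOUNDARY_COST = 5.0


def best_segmentation(word):
    """Return (cost, segments) for the cheapest split of `word` into lexicon stems.

    best[i] holds the cheapest analysis of word[:i]. Costs are added along an
    analysis and the minimum is kept among competing analyses.
    If no analysis exists, the result is (inf, []).
    """
    n = len(word)
    best = [(inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece not in STEM_COST or best[start][0] == inf:
                continue
            cost = best[start][0] + STEM_COST[piece]
            if start > 0:
                # A non-initial stem introduces one compound boundary.
                cost += BOUNDARY_COST
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [piece])
    return best[n]


if __name__ == "__main__":
    # "staubecken" is ambiguous: stau+becken versus staub+ecken.
    print(best_segmentation("staubecken"))
```

Under these assumed costs, the ambiguous form "staubecken" is analyzed as stau + becken rather than staub + ecken, because the first analysis accumulates the lower total cost. A form with no analysis in the toy lexicon gets infinite cost.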

Author information

Authors and Affiliations

Berlin-Brandenburg Academy of Sciences, Germany
Alexander Geyken
University of Potsdam, Germany
Thomas Hanneforth

Editor information

Editors and Affiliations

Language Research Service, CSC Scientific Computing Ltd., Finland
Anssi Yli-Jyrä
Palo Alto Research Center, Stanford University, P.O. Box, USA
Lauri Karttunen
Department of Mathematics and Turku Centre for Computer Science TUCS, University of Turku, 20014, Turku, Finland
Juhani Karhumäki

About this paper

Cite this paper

Geyken, A., Hanneforth, T. (2006). TAGH: A Complete Morphology for German Based on Weighted Finite State Automata. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_7

DOI: https://doi.org/10.1007/11780885_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35467-3
Online ISBN: 978-3-540-35469-7
eBook Packages: Computer Science, Computer Science (R0)
