Abstract
An algorithm has been developed to decompose compound words in Afrikaans. This data-driven technique recursively uses an extensive list of Afrikaans words in the decompounding process. String fitting from the beginning and end of words forms the basis of the process, supported by sublists of short words that may occur only at the beginning or only at the end of words, and by lists of prefixes and suffixes. Applying the algorithm to the original lexicon of 182 433 words resulted in an accuracy of 90,2%, a precision of 99,9% and a recall of 83,6%.
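The recursive string-fitting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_len` threshold, the longest-match-first order and the tiny sample word lists are all assumptions made for illustration. The prefix and suffix lists mentioned in the abstract are left out, because the abstract does not say how they are applied.

```python
from typing import FrozenSet, List, Optional

# Hypothetical sample data; the paper uses a lexicon of 182 433 words.
LEXICON: FrozenSet[str] = frozenset({"tafel", "doek", "water", "voor", "deur", "sleutel"})
START_ONLY: FrozenSet[str] = frozenset()    # short words allowed only word-initially
END_ONLY: FrozenSet[str] = frozenset({"val"})  # short words allowed only word-finally


def is_part(part: str, at_start: bool, at_end: bool, min_len: int) -> bool:
    """Check whether `part` is an acceptable constituent at this position."""
    if len(part) >= min_len and part in LEXICON:
        return True
    if at_start and part in START_ONLY:
        return True
    return at_end and part in END_ONLY


def decompound(word: str, min_len: int = 4,
               at_start: bool = True, at_end: bool = True) -> Optional[List[str]]:
    """Return the constituents of `word`, or None if no split is found.

    A known word is fitted at the beginning (longest first, an assumption);
    the remainder is checked against the lists and split again recursively.
    """
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if not is_part(head, at_start, False, min_len):
            continue
        # Recurse on the remainder, which is no longer word-initial.
        deeper = decompound(tail, min_len, at_start=False, at_end=at_end)
        if deeper:
            return [head] + deeper
        if is_part(tail, False, at_end, min_len):
            return [head, tail]
    return None


print(decompound("tafeldoek"))        # ['tafel', 'doek']
print(decompound("voordeursleutel"))  # ['voor', 'deur', 'sleutel']
print(decompound("waterval"))         # ['water', 'val']
print(decompound("valwater"))         # None
```

In this sketch "val" is accepted only at the end of a word, so "waterval" is split while "valwater" is not; this mirrors the role of the positional sublists of short words described in the abstract, although the actual contents of those lists are not given there.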
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fick, T., Swanepoel, C. (2011). Recursive Decompounding in Afrikaans. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_32
DOI: https://doi.org/10.1007/978-3-642-23538-2_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science (R0)