Abstract
This paper presents an efficient dictionary structure for Part-of-Speech (POS) tagging of Japanese and Korean, obtained by extending Aho and Corasick’s pattern matching machine. The proposed method is a simple, fast algorithm that finds all possible morphemes in an input sentence in a single pass, and it stores the grammatical connectivity relations between neighboring morphemes in the output functions. The method therefore reduces the cost of both the dictionary lookup and the connection check needed to find the most suitable word segmentation. In simulation, the proposed method was 21.8% faster (CPU time) than the general approach using a trie structure, and the number of candidates for connection checking was 27.4% lower than in the original morphological analysis.
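The idea in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the toy dictionary, the POS labels, the `connectable` table, and all function names are assumptions made for illustration. It builds a standard Aho-Corasick machine and, as one plausible reading of "storing connectivity in the output functions", attaches to each output the precomputed set of POS tags that may precede that morpheme, so the connection check at match time is a single set lookup.

```python
from collections import deque

def build_machine(entries, connectable):
    """entries: (surface, pos) pairs; connectable: set of (left_pos, right_pos)."""
    goto, fail, output = [{}], [0], [[]]
    for surface, pos in entries:
        state = 0
        for ch in surface:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        # Precompute which left-hand POS tags may precede this morpheme.
        allowed_left = frozenset(l for (l, r) in connectable if r == pos)
        output[state].append((surface, pos, allowed_left))
    # Breadth-first construction of the failure function.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit outputs of the failure state (shorter suffix matches).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_morphemes(text, machine):
    """Single left-to-right pass; returns (start, end, surface, pos, allowed_left)."""
    goto, fail, output = machine
    state, found = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for surface, pos, allowed_left in output[state]:
            found.append((i + 1 - len(surface), i + 1, surface, pos, allowed_left))
    return found

def connect(found):
    """Link each morpheme to adjacent predecessors whose POS is allowed."""
    ending_at = {}
    for start, end, surface, pos, _ in found:
        ending_at.setdefault(end, []).append((surface, pos))
    edges = []
    for start, end, surface, pos, allowed_left in found:
        for left_surface, left_pos in ending_at.get(start, []):
            if left_pos in allowed_left:
                edges.append((left_surface, surface))
    return edges

# Hypothetical toy dictionary and connectivity table.
ENTRIES = [("a", "N"), ("ab", "N"), ("b", "P"), ("bc", "V"), ("c", "P")]
CONNECTABLE = {("N", "P"), ("N", "V")}

machine = build_machine(ENTRIES, CONNECTABLE)
found = find_morphemes("abc", machine)
print(connect(found))
```

In this sketch the pair ("b", "c") is rejected because the toy table does not allow a "P" morpheme to follow another "P" morpheme; the rejection costs one set membership test rather than a separate table scan.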
References
Abe, M., Ooshima, Y., Yuura, K., and Takeichi, N.: A Kana-Kanji Translation System for Non-Segmented Input Sentences Based on Syntactic and Semantic Analysis. Proceedings of the 10th International Conference on Computational Linguistics (1986) pp.280–285
Aho, A.V., and Corasick, M.J.: Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM, Vol.18, No.6 (1975) pp.333–340
Akiba, T., Tokunaga, T., and Tanaka, H.: An Extension of LangLAB for Japanese Morphological Analysis. Proceedings of the International Workshop on Sharable Natural Language Resources (1994) pp.36–42
Aoe, J.: An Efficient Digital Search Algorithm by Using a Double-Array Structure. IEEE Transactions on Software Engineering, Vol.SE-15 (1989) pp.1066–1077
Aoe, J.: Computer Algorithms: Key Search Strategies. IEEE Computer Society Press (1991)
Aoe, J., Morimoto, K., Shishibori, M., and Park, K.H.: A Trie Compaction Algorithm for a Large Set of Keys. IEEE Transactions on Knowledge and Data Engineering, Vol.8, No.3 (1996) pp.476–491
Kaplan, S.J.: Designing a Portable Natural Language Database Query System. ACM Transactions on Database Systems, Vol.9, No.1 (1984) pp.1–29
Knuth, D.E., Morris, J.H., and Pratt, V.R.: Fast Pattern Matching in Strings. SIAM Journal on Computing, Vol.6, No.2 (1977) pp.323–350
Kurohashi, S., Nakamura, T., Matsumoto, Y., and Nagao, M.: Improvements of Japanese Morphological Analyzer JUMAN. Proceedings of the International Workshop on Sharable Natural Language Resources (1994) pp.22–28
Maruyama, H.: Backtracking-Free Dictionary Access Method for Japanese Morphological Analysis. Proceedings of the 15th International Conference on Computational Linguistics (1994) pp.208–213
Mori, S.: High Speed Morphological Analysis Using DFA. Technical Report of IEICE of Japan, NLC96-23 (1996) pp.17–23 (in Japanese)
Sano, H., Kawada, R., and Hasimoto, M.: Morphological Grammar Rules: An Implementation for JUMAN. Proceedings of the International Workshop on Sharable Natural Language Resources (1994) pp.29–35
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ando, K., Lee, Th., Shishibori, M., Aoe, Ji. (2001). A Method of Pre-computing Connectivity Relations for Japanese/Korean POS Tagging. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_36
DOI: https://doi.org/10.1007/3-540-44686-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41687-6
Online ISBN: 978-3-540-44686-6
eBook Packages: Springer Book Archive