DOI: 10.1145/1644993.1645053
research-article

Hybrid word sense disambiguation using language resources for transliteration of Arabic numerals in Korean

Published: 27 August 2009

Abstract

Arabic numerals occur frequently in informative texts, and their multiple senses and readings degrade the accuracy of text-to-speech (TTS) systems. This paper presents a hybrid word sense disambiguation method that exploits a tagged corpus and a Korean wordnet, KorLex 1.0, to convert Arabic numerals into Korean phonemes correctly and efficiently according to their senses. Individual contextual features are extracted from the tagged corpus and grouped in order to determine the sense of each Arabic numeral. The least upper bound synsets among the common hypernyms of the contextual features were obtained from the KorLex hierarchy and used as semantic categories for the contextual features of Arabic numerals. A decision tree was trained on these semantic classes to classify the meaning and reading of Arabic numerals, and the result was used to compose grapheme-to-phoneme rules for an automatic transliteration system for Arabic numerals. The proposed system outperforms the customized TTS systems by 3.9% to 20.3%.
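The core idea, reading a numeral according to the semantic class of its context, can be illustrated with a minimal Python sketch. It assumes a hypothetical five-word hypernym tree in place of KorLex, a hand-written class-to-reading table in place of the authors' trained decision tree, and only the counter that follows the numeral as context; none of these names or rules come from the paper. `least_upper_bound` returns the lowest hypernym shared by a group of features (assuming single inheritance), and `transliterate` picks either the Sino-Korean or the native Korean number system.

```python
"""Toy sketch: choose a Korean reading for an Arabic numeral from the
semantic class of the counter that follows it.

Assumptions (hypothetical, not from the paper): the tiny hypernym tree
below stands in for KorLex, and READING_BY_CLASS stands in for the
decision tree the authors trained.
"""

# Hypothetical child -> hypernym links (single inheritance for simplicity).
HYPERNYM = {
    "개": "count_classifier",    # items
    "명": "count_classifier",    # people
    "마리": "count_classifier",  # animals
    "년": "calendar_unit",       # year
    "원": "currency_unit",       # won
    "count_classifier": "measure",
    "calendar_unit": "measure",
    "currency_unit": "measure",
}

# Hypothetical stand-in for learned rules: semantic class -> number system.
READING_BY_CLASS = {
    "count_classifier": "native",
    "calendar_unit": "sino",
    "currency_unit": "sino",
}

SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE_UNITS = {1: "하나", 2: "둘", 3: "셋", 4: "넷", 5: "다섯",
                6: "여섯", 7: "일곱", 8: "여덟", 9: "아홉"}
NATIVE_ATTRIBUTIVE = {1: "한", 2: "두", 3: "세", 4: "네"}  # forms used before a counter
NATIVE_TENS = {1: "열", 2: "스물", 3: "서른", 4: "마흔", 5: "쉰",
               6: "예순", 7: "일흔", 8: "여든", 9: "아흔"}


def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def least_upper_bound(nodes):
    """Lowest node that is an ancestor-or-self of every node (tree LCA)."""
    common = set(ancestors(nodes[0]))
    for node in nodes[1:]:
        common &= set(ancestors(node))
    for candidate in ancestors(nodes[0]):  # walk upward from the first node
        if candidate in common:
            return candidate
    return None


def sino_korean(n):
    """Sino-Korean reading of 0..9999, e.g. 2009 -> 이천구."""
    if not 0 <= n <= 9999:
        raise ValueError("sketch only handles 0..9999")
    if n == 0:
        return "영"
    parts = []
    for value, unit in ((1000, "천"), (100, "백"), (10, "십"), (1, "")):
        digit, n = divmod(n, value)
        if digit == 0:
            continue
        # A leading 1 is dropped before a place unit: 10 -> 십, not 일십.
        parts.append(unit if digit == 1 and unit else SINO_DIGITS[digit] + unit)
    return "".join(parts)


def native_korean(n):
    """Native Korean attributive reading of 1..99, e.g. 3 -> 세, 20 -> 스무."""
    if not 1 <= n <= 99:
        raise ValueError("sketch only handles 1..99")
    tens, units = divmod(n, 10)
    if units == 0:
        return "스무" if tens == 2 else NATIVE_TENS[tens]
    prefix = NATIVE_TENS[tens] if tens else ""
    return prefix + NATIVE_ATTRIBUTIVE.get(units, NATIVE_UNITS[units])


def transliterate(number, counter):
    """Read `number` according to the class of the counter that follows it."""
    for cls in ancestors(counter):
        if cls in READING_BY_CLASS:
            reader = native_korean if READING_BY_CLASS[cls] == "native" else sino_korean
            return f"{reader(number)} {counter}"
    return f"{sino_korean(number)} {counter}"  # hypothetical default: Sino-Korean


print(least_upper_bound(["개", "명"]))  # count_classifier
print(transliterate(3, "개"))            # 세 개
print(transliterate(2009, "년"))         # 이천구 년
```

In this toy setup, 3 before the item counter 개 takes the native attributive form 세, while 2009 before 년 is read in Sino-Korean. Irregular readings such as 6월 (유월) and numbers outside the handled ranges are deliberately left out; a real system would need further rules for such cases.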


Cited By

  • Multilingual number image interpreter (2014). 2014 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1-6. DOI: 10.1109/ICCIC.2014.7238425. Online publication date: December 2014.

        Published In

        ACM Other conferences
        ICHIT '09: Proceedings of the 2009 International Conference on Hybrid Information Technology
        August 2009
        687 pages
        ISBN:9781605586625
        DOI:10.1145/1644993

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. Arabic numeral
        2. TTS
        3. word sense disambiguation

        Conference

        ICHIT '09

        Article Metrics

        • Downloads (last 12 months): 1
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 18 February 2025.
