Machine transliteration survey

Published: 29 April 2011

Abstract

Machine transliteration is the process of automatically transforming the script of a word from a source language to a target language, while preserving pronunciation. The development of algorithms specifically for machine transliteration began over a decade ago based on the phonetics of source and target languages, followed by approaches using statistical and language-specific methods. In this survey, we review the key methodologies introduced in the transliteration literature. The approaches are categorized based on the resources and algorithms used, and the effectiveness is compared.

    Published In

    ACM Computing Surveys  Volume 43, Issue 3
    April 2011
    466 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/1922649
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 April 2011
    Accepted: 01 September 2009
    Revised: 01 September 2009
    Received: 01 December 2008
    Published in CSUR Volume 43, Issue 3

    Author Tags

    1. Automatic translation
    2. machine learning
    3. machine transliteration
    4. natural language processing
    5. transliteration evaluation

    Qualifiers

    • Research-article
    • Research
    • Refereed
