Machine transliteration survey

Authors:

Sarvnaz Karimi,

Andrew Turpin

ACM Computing Surveys (CSUR), Volume 43, Issue 3

Article No.: 17, Pages 1 - 46

https://doi.org/10.1145/1922649.1922654

Published: 29 April 2011

Abstract

Machine transliteration is the process of automatically transforming the script of a word from a source language to a target language, while preserving pronunciation. The development of algorithms specifically for machine transliteration began over a decade ago based on the phonetics of source and target languages, followed by approaches using statistical and language-specific methods. In this survey, we review the key methodologies introduced in the transliteration literature. The approaches are categorized based on the resources and algorithms used, and the effectiveness is compared.

Index Terms

Machine transliteration survey
1. General and reference
  1. Document types
    1. Surveys and overviews

Published In

ACM Computing Surveys Volume 43, Issue 3

April 2011

466 pages

ISSN: 0360-0300

EISSN: 1557-7341

DOI: 10.1145/1922649

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 April 2011

Accepted: 01 September 2009

Revised: 01 September 2009

Received: 01 December 2008

Published in CSUR Volume 43, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Total Citations: 54
Total Downloads: 1,815
Downloads (Last 12 months): 35
Downloads (Last 6 weeks): 2

Reflects downloads up to 29 Sep 2024

Citations

Cited By

Tipu A, Fahad M, Mandal A (2024). A Romanization Method for the Bengali Language with Efficient Encoding Scheme. Proceedings of the 2nd International Conference on Big Data, IoT and Machine Learning, 605-619. Online publication date: 30-Mar-2024.
https://doi.org/10.1007/978-981-99-8937-9_41
Shoba S, Sasithradevi A, Deepa S (2024). Spoken Language Translation in Low‐Resource Language. Automatic Speech Recognition and Translation for Low Resource Languages, 445-459. Online publication date: 29-Mar-2024.
https://doi.org/10.1002/9781394214624.ch20
Sato S (2023). Translating the List of Participants in the 2020 Tokyo Olympic Games into Japanese (2020 東京オリンピック参加者名簿の翻訳). Journal of Natural Language Processing 30:2, 748-772. Online publication date: 2023.
https://doi.org/10.5715/jnlp.30.748
Jadhav R, Dhore M (2023). Cross-language information retrieval for poetry form of literature-based on machine transliteration using CNN. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 45:2, 3025-3037. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.3233/JIFS-223591
Kirov C, Johny C, Katanova A, Gutkin A, Roark B (2023). Context-aware Transliteration of Romanized South Asian Languages. Computational Linguistics 50:2, 475-534. Online publication date: 1-Jun-2023.
https://doi.org/10.1162/coli_a_00510
Fahad M, Tipu A, Kumar Mandal A, Chakraborty B (2023). Influence of Contextual Information on Bengali-English Forward and Backward Transliteration Using Binary Coding. 2023 1st International Conference on Optimization Techniques for Learning (ICOTL), 1-6. Online publication date: 7-Dec-2023.
https://doi.org/10.1109/ICOTL59758.2023.10435050
Yadav M, Kumar I, Kumar A (2023). Different Models of Transliteration - A Comprehensive Review. 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA), 356-363. Online publication date: 14-Mar-2023.
https://doi.org/10.1109/ICIDCA56705.2023.10099632
