Abstract
Incorporating human feedback into parallel corpora alignment and term translation extraction tasks, and using all human-validated term translation pairs marked as correct, improves alignment precision, term translation extraction quality and a number of closely related tasks. Moreover, such a labelled lexicon, with entries tagged for correctness, enables bilingual learning. From this perspective, we present experiments on the automatic classification of translation candidates extracted from aligned parallel corpora. For this purpose, we train SVM-based classifiers for three language pairs: English-Portuguese (EN-PT), English-French (EN-FR) and French-Portuguese (FR-PT). The approach achieved micro-averaged F-measure scores of 95.96%, 75.04% and 65.87% for the EN-PT, EN-FR and FR-PT language pairs, respectively.
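The abstract reports results as micro-averaged F-measure. The sketch below shows how that metric is usually computed for translation-candidate labels. It is an illustration, not the authors' evaluation code. It assumes micro-averaging over both classes, and the label names "correct" and "incorrect" are hypothetical. True positives, false positives and false negatives are pooled across classes before precision and recall are computed.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over all labels (sketch, not the paper's code).

    TP, FP and FN counts are pooled across every label that appears in
    either list, then a single precision/recall pair is derived.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1  # predicted this label, and it was right
            elif p == label:
                fp += 1  # predicted this label, gold says otherwise
            elif g == label:
                fn += 1  # gold has this label, prediction missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for four translation candidates.
gold = ["correct", "correct", "incorrect", "incorrect"]
predicted = ["correct", "incorrect", "incorrect", "incorrect"]
print(micro_f_measure(gold, predicted))  # 0.75
```

When every candidate receives exactly one label, each misclassification adds one pooled false positive and one pooled false negative. Micro-averaged precision, recall and F-measure therefore all equal plain accuracy (3 of 4 in the example).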
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kavitha, K.M., Gomes, L., Aires, J., Lopes, J.G.P. (2015). Classification and Selection of Translation Candidates for Parallel Corpora Alignment. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds) Progress in Artificial Intelligence. EPIA 2015. Lecture Notes in Computer Science, vol 9273. Springer, Cham. https://doi.org/10.1007/978-3-319-23485-4_73
DOI: https://doi.org/10.1007/978-3-319-23485-4_73
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23484-7
Online ISBN: 978-3-319-23485-4
eBook Packages: Computer Science, Computer Science (R0)