Abstract
Recent research has shown that a balanced harmonic mean (F1 measure) of unigram precision and recall outperforms the widely used BLEU and NIST metrics for Machine Translation evaluation in terms of correlation with human judgments of translation quality. We show that significantly better correlations can be achieved by placing more weight on recall than on precision. While this may seem unexpected, since BLEU and NIST focus on n-gram precision and disregard recall, our experiments show that correlation with human judgments is highest when almost all of the weight is assigned to recall. We also show that stemming is significantly beneficial not just to simpler unigram precision and recall based metrics, but also to BLEU and NIST.
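The quantity at issue is a weighted harmonic mean of unigram precision and recall (the F-measure family of van Rijsbergen, 1979). The sketch below is illustrative only and is not the authors' exact metric: the function name weighted_f and the alpha parameter (the relative weight placed on recall) are our own, and the paper's matching additionally involves stemming and other refinements. Setting alpha = 0.5 recovers the balanced F1 described above, while values closer to 1 shift the weight toward recall.

from collections import Counter

def weighted_f(candidate: str, reference: str, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of unigram precision and recall.

    alpha is the relative weight placed on recall: 0.5 gives the balanced
    F1, and values closer to 1.0 emphasize recall over precision.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    matches = sum((cand & ref).values())          # clipped unigram matches
    if matches == 0:
        return 0.0
    precision = matches / sum(cand.values())      # matches / candidate length
    recall = matches / sum(ref.values())          # matches / reference length
    return (precision * recall) / (alpha * precision + (1.0 - alpha) * recall)

hyp = "the cat sat on mat"
ref = "the cat sat on the mat"
print(weighted_f(hyp, ref, alpha=0.5))   # balanced F1
print(weighted_f(hyp, ref, alpha=0.9))   # recall-weighted variant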
References
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, pp. 311–318 (July 2002)
Doddington, G.: Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics. In: Proceedings of the Second Conference on Human Language Technology (HLT 2002), San Diego, CA, pp. 128–132 (2002)
Su, K.-Y., Wu, M.-W., Chang, J.-S.: A New Quantitative Quality Measure for Machine Translation Systems. In: Proceedings of the fifteenth International Conference on Computational Linguistics (COLING 1992), Nantes, France, pp. 433–439 (1992)
Akiba, Y., Imamura, K., Sumita, E.: Using Multiple Edit Distances to Automatically Rank Machine Translation Output. In: Proceedings of MT Summit VIII, Santiago de Compostela, Spain, pp. 15–20 (2001)
Niessen, S., Och, F.J., Leusch, G., Ney, H.: An Evaluation Tool for Machine Translation: Fast Evaluation for Machine Translation Research. In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece, pp. 39–45 (2000)
Leusch, G., Ueffing, N., Ney, H.: String-to-String Distance Measure with Applications to Machine Translation Evaluation. In: Proceedings of MT Summit IX, New Orleans, LA, September 2003, pp. 240–247 (2003)
Melamed, I.D., Green, R., Turian, J.: Precision and Recall of Machine Translation. In: Proceedings of HLT-NAACL 2003, Short Papers, Edmonton, Canada, May 2003, pp. 61–63 (2003)
Turian, J.P., Shen, L., Melamed, I.D.: Evaluation of Machine Translation and its Evaluation. In: Proceedings of MT Summit IX, New Orleans, LA, September 2003, pp. 386–393 (2003)
Lin, C.-Y., Hovy, E.: Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, May 2003, pp. 71–78 (2003)
van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Butterworths, London (1979)
Coughlin, D.: Correlating Automated and Human Assessments of Machine Translation Quality. In: Proceedings of MT Summit IX, New Orleans, LA, September 2003, pp. 63–70 (2003)
Efron, B., Tibshirani, R.: Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical Science 1(1), 54–77 (1986)
Doddington, G.: Automatic Evaluation of Language Translation using N-gram Co-occurrence Statistics. Presentation at the DARPA/TIDES 2003 MT Workshop, NIST, Gaithersburg, MD (July 2003)
Pang, B., Knight, K., Marcu, D.: Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, May 2003, pp. 102–109 (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lavie, A., Sagae, K., Jayaraman, S. (2004). The Significance of Recall in Automatic Metrics for MT Evaluation. In: Frederking, R.E., Taylor, K.B. (eds.) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol. 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_16
DOI: https://doi.org/10.1007/978-3-540-30194-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3