Abstract
Referential translation machines (RTMs) are a computational model effective at judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants, data close to the task instances. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. We use RTMs for predicting the semantic similarity of text and present state-of-the-art results showing that RTMs can achieve better results on the test set than on the training set. Interpretants are used to derive features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication. RTMs can achieve top performance at SemEval in various semantic similarity prediction tasks as well as similarity prediction tasks in bilingual settings. We obtain rankings of various prediction tasks using the performance of RTM and relative evaluation metrics, which can help identify which tasks and subtasks require more work by design.
Notes
For nodes with an uneven number of children, the nodes in the odd child contribute to the right branches.
Wall Street Journal (WSJ) corpus section 23, distributed with Penn Treebank version 3 (Marcus et al. 1993).
LIX = \(\frac{A}{B} + C \frac{100}{A}\), where there are A words in total, C words longer than 6 characters, and B words that start or end with any of “.”, “:”, “!”, “?” similar to Hagström (2012).
We use \(\lfloor \;.\; \rfloor _\epsilon \) to cap the argument from below to \(\epsilon \).
Some of the results may be rounded with the \(\texttt{round}(.)\) function from Python (https://www.python.org/) and some with the \(\texttt{numpy.round}()\) function from NumPy (http://www.scipy.org/), which may cause differences at the least significant digit. For instance, \(\texttt{round}(0.8445,3) = 0.845\) and \(\texttt{numpy.round}(0.8445,3) = 0.84399999999999997\).
This gives an advantage to participants submitting to all levels in the rankings.
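The LIX formula and the capping notation \(\lfloor \;.\; \rfloor _\epsilon \) from the notes above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization, the stripping of attached punctuation before measuring word length, and the fallback of B to 1 when no word carries a sentence mark are all assumptions made for illustration, and the names `lix`, `cap_below`, and `SENTENCE_MARKS` are hypothetical.

```python
# Marks that identify the B words in the LIX note.
SENTENCE_MARKS = (".", ":", "!", "?")


def cap_below(x, eps):
    """Cap x from below at eps, i.e. return max(x, eps)."""
    return max(x, eps)


def lix(text):
    """Compute LIX = A/B + C * 100/A for whitespace-tokenized text."""
    tokens = text.split()  # assumption: words are whitespace-separated
    a = len(tokens)        # A: total number of words
    if a == 0:
        raise ValueError("text contains no words")
    # B: words that start or end with any of the sentence marks.
    b = sum(
        1
        for t in tokens
        if t.startswith(SENTENCE_MARKS) or t.endswith(SENTENCE_MARKS)
    )
    # C: words longer than 6 characters; attached marks are stripped
    # before measuring length (assumption).
    marks = "".join(SENTENCE_MARKS)
    c = sum(1 for t in tokens if len(t.strip(marks)) > 6)
    # Assumption: cap B from below at 1 to avoid division by zero.
    b = cap_below(b, 1)
    return a / b + c * 100 / a


sample = "This is a sentence. Another extraordinarily long sentence!"
print(lix(sample))  # A = 8, B = 2, C = 4, so 8/2 + 4 * 100/8 = 54.0
```

In the sample, two of the eight words end with a sentence mark and four are longer than six characters, which gives the two terms 4 and 50.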
References
Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W., Lopez-Gazpio, I., Maritxalar, M., Mihalcea, R., Rigau, G., Uria, L., & Wiebe, J. (2015). SemEval-2015 Task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 252–263). Denver: Association for Computational Linguistics. http://www.aclweb.org/anthology/S15-2045.
Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W., Mihalcea, R., Rigau, G., & Wiebe, J. (2014). SemEval-2014 Task 10: Multilingual semantic textual similarity. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) (pp. 81–91). Dublin.
Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., & Guo, W. (2013). *SEM 2013 shared task: Semantic textual similarity, including a pilot on typed-similarity. In: *SEM 2013: The second joint conference on lexical and computational semantics.
Baker, C.F., Fillmore, C.J., & Lowe, J.B. (1998). The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, vol. 1, ACL ’98 (pp. 86–90).
Banko, M., & Brill, E. (2001). Scaling to very very large corpora for natural language disambiguation. In: Proceedings of 39th annual meeting of the association for computational linguistics (pp. 26–33). Toulouse. doi:10.3115/1073012.1073017. http://www.aclweb.org/anthology/P01-1005.
Bär, D., Biemann, C., Gurevych, I., & Zesch, T. (2012). UKP: Computing semantic textual similarity by combining multiple content similarity measures. In: *SEM 2012: The first joint conference on lexical and computational semantics–Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the sixth international workshop on semantic evaluation (SemEval 2012) (pp. 435–440). Montréal. http://www.aclweb.org/anthology/S12-1059.
Biçici, E. (2008). Context-based sentence alignment in parallel corpora. In: A. Gelbukh (ed.) Computational linguistics and intelligent text processing. Lecture notes in computer science (vol. 4919, pp. 434–444). doi:10.1007/978-3-540-78135-6_37.
Biçici, E. (2011). The regression model of machine translation. Ph.D. thesis, Koç University. Supervisor: Deniz Yuret.
Biçici, E. (2013). Referential translation machines for quality estimation. In: Proceedings of the eighth workshop on statistical machine translation (pp. 343–351). Sofia.
Biçici, E. (2015). RTM-DCU: Predicting semantic similarity with referential translation machines. In: SemEval-2015: Semantic evaluation exercises: International workshop on semantic evaluation. Denver.
Biçici, E., & van Genabith, J. (2013). CNGL-CORE: Referential translation machines for measuring semantic similarity. In: *SEM 2013: The second joint conference on lexical and computational semantics (pp. 234–240). Atlanta.
Biçici, E., & Way, A. (2014). Referential translation machines for predicting translation quality. In: Proceedings of the ninth workshop on statistical machine translation (pp. 313–321). Baltimore.
Biçici, E., & Way, A. (2014). RTM-DCU: Referential translation machines for semantic similarity. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval-2014) (pp. 487–496). Dublin. http://aclweb.org/anthology/S14-2085.
Biçici, E., & Yuret, D. (2011). Instance selection for machine translation using feature decay algorithms. In: Proceedings of the sixth workshop on statistical machine translation (pp. 272–283). Edinburgh.
Biçici, E., & Yuret, D. (2011). RegMT system for machine translation, system combination, and evaluation. In: Proceedings of the sixth workshop on statistical machine translation (pp. 323–329). Edinburgh. http://www.aclweb.org/anthology/W11-2137.
Biçici, E., Liu, Q., & Way, A. (2014). Parallel FDA5 for fast deployment of accurate statistical machine translation systems. In: Proceedings of the ninth workshop on statistical machine translation (pp. 59–65). Baltimore.
Biçici, E., Liu, Q., & Way, A. (2015). ParFDA for fast deployment of accurate statistical machine translation systems, benchmarks, and statistics. In: Proceedings of the tenth workshop on statistical machine translation. Lisbon: Association for Computational Linguistics.
Biçici, E., Liu, Q., & Way, A. (2015). Referential translation machines for predicting translation quality and related statistics. In: Proceedings of the tenth workshop on statistical machine translation. Lisbon: Association for Computational Linguistics.
Biçici, E. (2008). Consensus ontologies in socially interacting multiagent systems. Journal of Multiagent and Grid Systems, 4(3), 297–314.
Biçici, E., Groves, D., & van Genabith, J. (2013). Predicting sentence translation quality using extrinsic and language independent features. Machine Translation, 27, 171–192. doi:10.1007/s10590-013-9138-4.
Biçici, E., & Yuret, D. (2015). Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE/ACM Transactions On Audio, Speech, and Language Processing (TASLP), 23, 339–350. doi:10.1109/TASLP.2014.2381882.
Björnsson, C.H. (1968). Läsbarhet.
Bliss, C. (2012). Comedy is translation. http://www.ted.com/talks/chris_bliss_comedy_is_translation.html.
Bojar, O., Buck, C., Callison-Burch, C., Federmann, C., Haddow, B., Koehn, P., Monz, C., Post, M., Soricut, R., & Specia, L. (2013). Findings of the 2013 Workshop on Statistical Machine Translation. In: Proceedings of the eighth workshop on statistical machine translation (pp. 1–44). Sofia. http://www.aclweb.org/anthology/W13-2201.
Bojar, O., Buck, C., Federmann, C., Haddow, B., Koehn, P., Leveling, J., Monz, C., Pecina, P., Post, M., Saint-Amand, H., Soricut, R., Specia, L., & Tamchyna, A. (2014). Findings of the 2014 workshop on statistical machine translation. In: Proceedings of the ninth workshop on statistical machine translation (pp. 12–58). Baltimore.
Bojar, O., Chatterjee, R., Federmann, C., Haddow, B., Hokamp, C., Huck, M., Pecina, P., Koehn, P., Monz, C., Negri, M., Post, M., Scarton, C., Specia, L., & Turchi, M. (2015). Findings of the 2015 workshop on statistical machine translation. In: Proceedings of the tenth workshop on statistical machine translation. Lisbon.
Brown, P. F., Pietra, S. A. D., Pietra, V. J. D., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. Wiley-Interscience.
de Souza, J.G.C., Buck, C., Turchi, M., & Negri, M. (2013). FBK-UEdin participation to the WMT13 quality estimation shared task. In: Proceedings of the eighth workshop on statistical machine translation (pp. 352–358). Sofia.
Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the second international conference on human language technology research (pp. 138–145). San Francisco.
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3), 389–422.
Hagström, K. (2012). Swedish readability calculator. https://github.com/keha76/Swedish-Readability-Calculator.
Jurgens, D., Pilehvar, M.T., & Navigli, R. (2014). SemEval-2014 Task 3: Cross-level semantic similarity. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) (pp. 17–26). Dublin.
Koehn, P. (2010a). Statistical machine translation (1st ed.). New York, USA: Cambridge University Press.
Koehn, P. (2010b). An experimental management system. The Prague Bulletin of Mathematical Linguistics, 94, 87–96.
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., & Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions (pp. 177–180). Association for Computational Linguistics. http://aclweb.org/anthology/P07-2045.
Lavie, A., & Agarwal, A. (2007). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the second workshop on statistical machine translation (pp. 228–231). Prague.
Levy, R., & Andrew, G. (2006). Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the fifth international conference on language resources and evaluation.
Manandhar, S., & Yuret, D. (2013). Second joint conference on lexical and computational semantics (*SEM), volume 2: Proceedings of the seventh international workshop on semantic evaluation (SemEval 2013). In: Second joint conference on lexical and computational semantics (*SEM), volume 2: Proceedings of the seventh international workshop on semantic evaluation (SemEval 2013). Association for Computational Linguistics. http://aclweb.org/anthology/S13-2000.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marelli, M., Bentivogli, L., Baroni, M., Bernardi, R., Menini, S., & Zamparelli, R. (2014). SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) (pp. 1–8). Dublin.
Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., & Zamparelli, R. (2014). A SICK cure for the evaluation of compositional distributional semantic models. In: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the ninth international conference on language resources and evaluation (LREC’14). Reykjavik.
Mendonça, Â., Jaquette, D., Graff, D., & DiPersio, D. (2011). Spanish Gigaword third edition, linguistic data consortium.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Nakov, P., & Zesch, T. (2014). Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). Association for Computational Linguistics. http://aclweb.org/anthology/S14-2000.
Nakov, P., Zesch, T., Cer, D., & Jurgens, D. (2015). Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015). In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015). Association for Computational Linguistics. http://aclweb.org/anthology/S15-2000.
Papineni, K., Roukos, S., Ward, T., & Zhu, W.J. (2002). BLEU: A method for automatic evaluation of machine translation. In: Proceedings of 40th annual meeting of the association for computational linguistics (pp. 311–318). Philadelphia.
Parker, R., Graff, D., Kong, J., Chen, K., & Maeda, K. (2011). English Gigaword fifth edition, linguistic data consortium.
Pradhan, S. S., Hovy, E. H., Marcus, M. P., Palmer, M., Ramshaw, L. A., & Weischedel, R. M. (2007). OntoNotes: A unified relational semantic representation. International Journal of Semantic Computing, 1(4), 405–419.
Raybaud, S., Langlois, D., & Smaïli, K. (2011). “This sentence is wrong”. Detecting errors in machine-translated sentences. Machine Translation, 25(1), 1–34. doi:10.1007/s10590-011-9094-9.
Seginer, Y. (2007). Learning syntactic structure. Ph.D. thesis, Universiteit van Amsterdam.
Smola, A.J., Murata, N., Schölkopf, B., & Müller, K.R. (1998). Asymptotically optimal choice of \(\varepsilon \)-loss for support vector machines. In: L. Niklasson, M. Boden, T. Ziemke (Eds.), Proceedings of the international conference on artificial neural networks, perspectives in neural computing (pp. 105–110). Berlin.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In: Proceedings of association for machine translation in the Americas.
Specia, L., Cancedda, N., Dymetman, M., Turchi, M., & Cristianini, N. (2009). Estimating the sentence-level quality of machine translation systems. In: Proceedings of the 13th annual conference of the European association for machine translation (EAMT) (pp. 28–35). Barcelona.
Specia, L., Shah, K., Avramidis, E., & Biçici, E. (2014). QTLaunchPad deliverable D2.2.1 quality estimation for system selection and combination. http://www.qt21.eu/launchpad/deliverable/quality-estimation-system-selection-and-combination.
Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In: Proceedings of the international conference on spoken language processing (pp. 901–904).
Tan, L., Scarton, C., Specia, L., & van Genabith, J. (2015). USAAR-SHEFFIELD: Semantic textual similarity with deep regression and machine translation evaluation metrics. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 85–89). Association for Computational Linguistics. http://aclweb.org/anthology/S15-2015.
Toutanova, K., Klein, D., Manning, C.D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 conference of the North American Chapter of the association for computational linguistics on human language technology–volume 1, NAACL ’03 (pp. 173–180). Stroudsburg.
Wikipedia: LIX (2013). http://en.wikipedia.org/wiki/LIX.
Xu, W., Callison-Burch, C., & Dolan, B. (2015). SemEval-2015 Task 1: Paraphrase and semantic similarity in Twitter (PIT). In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 1–11). Denver: Association for Computational Linguistics. http://www.aclweb.org/anthology/S15-2001.
Zarrella, G., Henderson, J., Merkhofer, E.M., & Strickhart, L. (2015). MITRE: Seven systems for semantic similarity in tweets. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 12–17). Denver: Association for Computational Linguistics. http://www.aclweb.org/anthology/S15-2002.
Acknowledgments
This work is supported in part by SFI (12/CE/I2267) as part of the ADAPT CNGL Centre for Global Intelligent Content (www.adaptcentre.ie) at Dublin City University, in part by SFI (13/TIDA/I2740) for the project “Monolingual and Bilingual Text Quality Judgments with Translation Performance Prediction” (www.computing.dcu.ie/ebicici/Projects/TIDA_RTM.html), and in part by the European Commission through the QTLaunchPad FP7 Project (No: 296347). We also thank the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support.
About this article
Cite this article
Biçici, E., Way, A. Referential translation machines for predicting semantic similarity. Lang Resources & Evaluation 50, 793–819 (2016). https://doi.org/10.1007/s10579-015-9322-7
DOI: https://doi.org/10.1007/s10579-015-9322-7