extended-abstract

KG-BERTScore: Incorporating Knowledge Graph into BERTScore for Reference-Free Machine Translation Evaluation

Authors:

Ying QinAuthors Info & Claims

IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs

Pages 121 - 125

https://doi.org/10.1145/3579051.3579065

Published: 13 February 2023 Publication History

Abstract

BERTScore is an effective and robust automatic metric for reference-based machine translation evaluation. In this paper, we incorporate multilingual knowledge graph into BERTScore and propose a metric named KG-BERTScore, which linearly combines the results of BERTScore and bilingual named entity matching for reference-free machine translation evaluation. From the experimental results on WMT19 QE as a metric without references shared tasks, our metric KG-BERTScore gets higher overall correlation with human judgements than the current state-of-the-art metrics for reference-free machine translation evaluation.1 Moreover, the pre-trained multilingual model used by KG-BERTScore and the parameter for linear combination are also studied in this paper.

References

[1]

Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 65–72.

[2]

Loïc Barrault, Ondřej Bojar, Marta R Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, 2019. Findings of the 2019 Conference on Machine Translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). 1–61.

[3]

Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, 2014. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the ninth workshop on statistical machine translation. 12–58.

[4]

Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1511–1517.

[5]

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Édouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8440–8451.

[6]

Erick Fonseca, Lisa Yankovskaya, André FT Martins, Mark Fishel, and Christian Federmann. 2019. Findings of the WMT 2019 shared tasks on quality estimation. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2). 1–10.

[7]

Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, and Wolfgang Macherey. 2020. KoBE: Knowledge-Based Machine Translation Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2020. 3200–3207.

[8]

Julia Ive, Frédéric Blain, and Lucia Specia. 2018. DeepQuest: a framework for neural-based quality estimation. In Proceedings of the 27th International Conference on Computational Linguistics. 3146–3157.

[9]

Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT. 4171–4186.

[10]

Fabio Kepler, Jonay Trénous, Marcos Treviso, Miguel Vera, and André F. T. Martins. 2019. OpenKiwi: An Open Source Framework for Quality Estimation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 117–122. https://doi.org/10.18653/v1/P19-3020

[11]

Hyun Kim, Jong-Hyeok Lee, and Seung-Hoon Na. 2017. Predictor-estimator using multilevel task learning with stack propagation for neural quality estimation. In Proceedings of the Second Conference on Machine Translation. 562–568.

[12]

Chi-kiu Lo. 2017. MEANT 2.0: Accurate semantic MT evaluation for any output language. In Proceedings of the second conference on machine translation. 589–597.

[13]

Chi-kiu Lo. 2019. YiSi-a unified semantic MT quality evaluation and estimation metric for languages with different levels of available resources. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). 507–513.

[14]

Weiming Lu, Peng Wang, Xinyin Ma, Wei Xu, and Chen Chen. 2020. Enrich cross-lingual entity links for online wikis via multi-modal semantic matching. Information Processing & Management 57, 5 (2020), 102271.

[15]

George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41.

Digital Library

[16]

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st annual meeting of the Association for Computational Linguistics. 160–167.

Digital Library

[17]

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318.

[18]

Maja Popović. 2012. Morpheme-and POS-based IBM1 and language model scores for translation quality estimation. In Proceedings of the Seventh Workshop on Statistical Machine Translation. 133–137.

Digital Library

[19]

Maja Popović. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation. 392–395.

[20]

Tharindu Ranasinghe, Constantin Orǎsan, and Ruslan Mitkov. 2020. TransQuest: Translation Quality Estimation with Cross-lingual Transformers. In Proceedings of the 28th International Conference on Computational Linguistics. 5070–5081.

[21]

Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 7881–7892.

[22]

Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers. 223–231.

[23]

Lucia Specia, Gustavo Paetzold, and Carolina Scarton. 2015. Multi-level translation quality prediction with quest++. In Proceedings of ACL-IJCNLP 2015 system demonstrations. 115–120.

[24]

Lucia Specia, Kashif Shah, José GC De Souza, and Trevor Cohn. 2013. QuEst-A translation quality estimation framework. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 79–84.

[25]

Craig Stewart, Ricardo Rei, Catarina Farinha, and Alon Lavie. 2020. COMET-Deploying a New State-of-the-art MT Evaluation Metric in Production. In AMTA (2). 78–109.

[26]

Hang Yan, Tao Gui, Junqi Dai, Qipeng Guo, Zheng Zhang, and Xipeng Qiu. 2021. A Unified Generative Framework for Various NER Subtasks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 5808–5822.

[27]

Elizaveta Yankovskaya, Andre Tättar, and Mark Fishel. 2019. Quality estimation and translation metrics via pre-trained word and sentence embeddings. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2). 101–105.

[28]

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2020. Bertscore: Evaluating text generation with bert. In 8th International Conference on Learning Representations.

Cited By

Zhang MPeng SYang HZhao YQiao XZhu JTao SQin YJiang Y(2022)EntityRank: Unsupervised Mining of Bilingual Named Entity Pairs from Parallel Corpora for Neural Machine Translation2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10021032(3708-3713)Online publication date: 17-Dec-2022
https://doi.org/10.1109/BigData55660.2022.10021032

Index Terms

KG-BERTScore: Incorporating Knowledge Graph into BERTScore for Reference-Free Machine Translation Evaluation
1. Applied computing
  1. Arts and humanities

Index terms have been assigned to the content through auto-classification.

Recommendations

A Comprehensive Survey on Various Fully Automatic Machine Translation Evaluation Metrics
Abstract
The fast advancement in machine translation models necessitates the development of accurate evaluation metrics that would allow researchers to track the progress in text languages. The evaluation of machine translation models is crucial since its ...
Meta-evaluation of Machine Translation Using Parallel Legal Texts
ICCPOL '09: Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy

In this paper we report our recent work on the evaluation of a number of popular automatic evaluation metrics for machine translation using parallel legal texts. The evaluation is carried out, following a recognized evaluation protocol, to assess the ...
Machine translation evaluation versus quality estimation

Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto metrics, BLEU and NIST, are known to have good correlation with ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs

October 2022

134 pages

ISBN:9781450399876

DOI:10.1145/3579051

Copyright © 2022 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 February 2023

Check for updates

Author Tags

Qualifiers

Extended-abstract
Research
Refereed limited

Conference

IJCKG 2022

IJCKG 2022: 11th International Joint Conference On Knowledge Graphs

October 27 - 28, 2022

Hangzhou, China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
85
Total Downloads

Downloads (Last 12 months)47
Downloads (Last 6 weeks)3

Reflects downloads up to 13 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang MPeng SYang HZhao YQiao XZhu JTao SQin YJiang Y(2022)EntityRank: Unsupervised Mining of Bilingual Named Entity Pairs from Parallel Corpora for Neural Machine Translation2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10021032(3708-3713)Online publication date: 17-Dec-2022
https://doi.org/10.1109/BigData55660.2022.10021032

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents