A benchmark dataset and evaluation methodology for Chinese zero pronoun translation

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

The phenomenon of the zero pronoun (ZP) has attracted increasing interest in the machine translation community owing to its importance and difficulty. However, previous studies generally evaluate the quality of ZP translation with BLEU scores on MT test sets, which are neither expressive nor sensitive enough for accurate assessment. To bridge the data and evaluation gaps, we propose a benchmark test set and an evaluation metric for the targeted evaluation of Chinese ZP translation. The human-annotated test set covers five challenging genres that reveal different characteristics of ZPs, enabling comprehensive evaluation. We systematically revisit advanced models of ZP translation and identify current challenges for future exploration. We release our data, code, and trained models, which we hope will significantly promote research in this field.


Notes

  1. Our released resources: https://github.com/longyuewangdcu/mZPRT.

  2. https://cemantix.org/conll/2012/introduction.html.

  3. https://cemantix.org/conll/2011/introduction.html.

  4. An anaphoric pronoun is one whose referent must be specified by a noun phrase occurring earlier in the text. A non-anaphoric pronoun refers to an entity that is salient from larger units of discourse (such as full sentences or passages) or from the extralinguistic environment (outside the text altogether).

  5. A pronominal determiner phrase without phonological content.

  6. https://aclanthology.org/W15-2500.

  7. https://aclanthology.org/volumes/W17-48.

  8. https://aclanthology.org/W16-2345.

  9. https://github.com/longyuewangdcu/tvsub.

  10. http://longyuewang.com/corpora/resource.html.

  11. https://zhidao.baidu.com.

  12. https://language.chinadaily.com.cn.

  13. https://www.qidian.com.

  14. https://www.webnovel.com.

  15. https://opus.nlpl.eu/OpenSubtitles-v2018.php.

  16. http://www.statmt.org/wmt21/translation-task.html.

  17. https://github.com/moses-smt/mosesdecoder/scripts/generic/multi-bleu.perl.

  18. We combined the data from the movie subtitle and Q&A forum domains as the training data for building ZPR models in the other domains.
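The abstract's claim that BLEU is too insensitive for assessing ZP translation can be made concrete with a toy experiment: an output that drops a recovered pronoun entirely still retains most of its n-gram overlap with the reference. The sketch below is a deliberately simplified, smoothed sentence-level BLEU written for illustration only; it is not the paper's proposed metric, and real evaluations would use multi-bleu.perl (note 17) or sacreBLEU instead.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(hyp, ref, max_n=4):
    """Toy sentence-level BLEU with add-one smoothing on n-gram precisions.

    Illustrative only: whitespace tokenization, a single reference, and
    simple smoothing, unlike production BLEU implementations.
    """
    hyp, ref = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())       # clipped n-gram matches
        total = max(sum(h.values()), 1)
        # add-one smoothing so one missing n-gram order does not zero the score
        log_prec += math.log((overlap + 1) / (total + 1))
    # brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec / max_n)

ref = "did you bring it with you"
print(simple_bleu("did you bring it with you", ref))  # 1.0
# "it" (the dropped pronoun) left untranslated:
print(simple_bleu("did you bring with you", ref))     # ≈0.49
```

Even under this crude metric, omitting the pronoun that carries the sentence's meaning only costs about half the score, which is why the paper argues for a targeted ZP evaluation rather than corpus-level BLEU alone.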


Acknowledgements

This work was supported in part by the Science and Technology Development Fund, Macau SAR (Grant Nos. FDCT/060/2022/AFJ, FDCT/0070/2022/AMJ) and the Multi-year Research Grant from the University of Macau (Grant No. MYRG2020-00054-FST).

Author information


Corresponding author

Correspondence to Longyue Wang.


Appendix: Benchmark in other evaluation metrics

See Table 11.

Table 11 A benchmark of ZPT evaluated on the proposed dataset



Cite this article

Xu, M., Wang, L., Liu, S. et al. A benchmark dataset and evaluation methodology for Chinese zero pronoun translation. Lang Resources & Evaluation 57, 1263–1293 (2023). https://doi.org/10.1007/s10579-023-09660-5
