Detecting Machine-Translated Text using Back Translation

Hoang-Quoc Nguyen-Son, Thao Tran Phuong, Seira Hidano, Shinsaku Kiyomoto

Abstract

Machine-translated text plays a crucial role in communication among people who speak different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, which makes them unsuitable for distinguishing machine-translated and human-written texts that have the same meaning. We propose a method that extracts features for distinguishing machine-generated from human-written text based on the similarity between a text and its back-translation. In an evaluation on detecting translated sentences involving French, our method achieves 75.0% in both accuracy and F-score, outperforming existing methods, whose best accuracy and F-score are 62.8% and 62.7%, respectively. The proposed method detects back-translated text even more effectively, reaching 83.4% accuracy compared with the previous best of 66.7%. We obtain similar results in terms of F-score and in analogous experiments with Japanese. Moreover, we show that our detector can recognize both machine-translated and machine-back-translated texts without knowing which language was used to generate them. This demonstrates the applicability of our method in various applications for both low- and rich-resource languages.
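The core idea, comparing a text with its back-translation, can be illustrated with a minimal sketch. The abstract does not specify which similarity measures the authors use, so this example assumes BLEU-style clipped n-gram overlap (orders 1 to 3) plus a length ratio. The translator is a caller-supplied function pair (the names `to_pivot` and `from_pivot` are hypothetical), and the classifier that would consume these features is left out. This is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(ref: List[str], hyp: List[str], n: int) -> float:
    """Clipped n-gram precision of hyp against ref (single-order, BLEU-style).

    Each n-gram in hyp counts as matched at most as many times as it occurs in ref.
    """
    ref_grams, hyp_grams = ngrams(ref, n), ngrams(hyp, n)
    total = sum(hyp_grams.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_grams[g]) for g, count in hyp_grams.items())
    return matched / total


def back_translation_features(text: str, back_translated: str, max_n: int = 3) -> Dict[str, float]:
    """Assumed similarity features between a text and its back-translation.

    Uses naive lowercase whitespace tokenization (an assumption for illustration).
    """
    src = text.lower().split()
    back = back_translated.lower().split()
    feats = {f"overlap_{n}": ngram_overlap(src, back, n) for n in range(1, max_n + 1)}
    feats["length_ratio"] = len(back) / len(src) if src else 0.0
    return feats


def round_trip(text: str,
               to_pivot: Callable[[str], str],
               from_pivot: Callable[[str], str]) -> str:
    """Back-translate: translate into a pivot language, then back again.

    to_pivot / from_pivot are hypothetical hooks for any MT system.
    """
    return from_pivot(to_pivot(text))


if __name__ == "__main__":
    # Toy stand-ins for an MT system; a real setup would call an actual translator.
    back = round_trip("The cat sat on the mat", str.upper, str.lower)
    print(back_translation_features("the cat sat on the mat", back))
    print(back_translation_features("the cat sat on the mat", "the cat sat on a mat"))
```

Under the abstract's premise, a text whose back-translation stays closer to the original (higher overlap) is more likely to be machine-generated. In practice these feature vectors would be fed to a binary classifier trained on labeled machine/human examples.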

Anthology ID:: W19-8626
Volume:: Proceedings of the 12th International Conference on Natural Language Generation
Month:: October–November
Year:: 2019
Address:: Tokyo, Japan
Editors:: Kees van Deemter, Chenghua Lin, Hiroya Takamura
Venue:: INLG
SIG:: SIGGEN
Publisher:: Association for Computational Linguistics
Pages:: 189–197
URL:: https://aclanthology.org/W19-8626
DOI:: 10.18653/v1/W19-8626
Cite (ACL):: Hoang-Quoc Nguyen-Son, Thao Tran Phuong, Seira Hidano, and Shinsaku Kiyomoto. 2019. Detecting Machine-Translated Text using Back Translation. In Proceedings of the 12th International Conference on Natural Language Generation, pages 189–197, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):: Detecting Machine-Translated Text using Back Translation (Nguyen-Son et al., INLG 2019)
PDF:: https://aclanthology.org/W19-8626.pdf