DOI: 10.1145/3540250.3549102
Research article · Open access

NMTSloth: understanding and testing efficiency degradation of neural machine translation systems

Published: 09 November 2022

Abstract

Neural Machine Translation (NMT) systems have received much recent attention due to their human-level accuracy. While existing work mostly focuses on improving accuracy or testing accuracy robustness, the computation efficiency of NMT systems, which is of paramount importance due to often vast translation demands and real-time requirements, has surprisingly received little attention. In this paper, we make the first attempt to understand and test potential computation efficiency degradation in state-of-the-art NMT systems. By analyzing the working mechanism and implementation of 1,455 publicly accessible NMT systems, we observe a fundamental property of NMT systems that can be manipulated in an adversarial manner to significantly reduce computation efficiency. Our key observation is that the computation efficiency of an NMT system is determined by the length of the output rather than the input, and the output length depends on two factors: a pre-configured threshold on the maximum number of decoding iterations, which is usually set pessimistically large, and a runtime-generated end-of-sentence (EOS) token. Our key motivation is therefore to generate test inputs that sufficiently delay the generation of EOS, so that the NMT system must iterate until it reaches the pre-configured threshold. We present NMTSloth, a gradient-guided technique that searches for a minimal and unnoticeable perturbation at the character, token, and structure levels which sufficiently delays the appearance of EOS and forces these inputs to reach the naturally unreachable threshold. To demonstrate the effectiveness of NMTSloth, we conduct a systematic evaluation on three publicly available NMT systems: Google T5, AllenAI WMT14, and Helsinki-NLP translators. Experimental results show that NMTSloth can increase NMT systems' response latency and energy consumption by 85% to 3153% and 86% to 3052%, respectively, by perturbing just one character or token in the input sentence. Our case study shows that inputs generated by NMTSloth significantly affect battery life on real-world mobile devices, draining more than 30 times the battery power of normal inputs.
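
To make the abstract's core observation concrete, here is a minimal sketch (ours, not the authors' released tool) that measures how decoding latency tracks output length on the t5-small checkpoint evaluated in the paper. It assumes the HuggingFace `transformers` and `torch` packages; the perturbed input shown is purely illustrative, since NMTSloth finds effective perturbations with a gradient-guided search that this sketch does not reproduce.

```python
# A minimal sketch (not the NMTSloth implementation) of the property the
# abstract describes: decoding cost is governed by the OUTPUT length, which
# is bounded only by an EOS token or the pre-configured max-length threshold.
# Assumes the HuggingFace `transformers` and `torch` packages and the public
# t5-small checkpoint evaluated in the paper. Example inputs are hypothetical.
import time

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()


def measure(sentence: str, max_length: int = 200):
    """Translate `sentence` and return (output token count, wall-clock latency)."""
    inputs = tokenizer("translate English to German: " + sentence,
                       return_tensors="pt")
    start = time.perf_counter()
    with torch.no_grad():
        # Greedy decoding runs one iteration per output token; it stops early
        # at EOS or, in the worst case, only at `max_length`.
        output_ids = model.generate(**inputs, max_length=max_length)
    return output_ids.shape[1], time.perf_counter() - start


# A benign input should emit EOS quickly; an NMTSloth-style one-token
# perturbation (illustrative only, not guaranteed to trigger degradation)
# aims to delay EOS so decoding runs toward the threshold.
for text in ["I love you.", "I love you. &"]:
    n_tokens, secs = measure(text)
    print(f"{text!r}: {n_tokens} output tokens in {secs:.2f}s")
```

Checking whether the output token count hits `max_length` surfaces the same signal the paper exploits: an input whose translation never emits EOS forces the decoder through every permitted iteration.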




        Published In

        ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        November 2022
        1822 pages
        ISBN:9781450394130
        DOI:10.1145/3540250
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. Machine learning
        2. neural machine translation
        3. software testing

        Qualifiers

        • Research-article

        Funding Sources

        • NSF grant
        • Siemens Fellowship

        Conference

        ESEC/FSE '22

        Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)

Article Metrics

• Downloads (last 12 months): 685
• Downloads (last 6 weeks): 51

Reflects downloads up to 13 Feb 2025
Cited By

• TTSlow: Slow Down Text-to-Speech With Efficiency Robustness Evaluations. IEEE Transactions on Audio, Speech and Language Processing, 33 (2025), 693–704. DOI: 10.1109/TASLPRO.2025.3533357
• Automated Testing Linguistic Capabilities of NLP Models. ACM Transactions on Software Engineering and Methodology, 33(7) (2024), 1–33. DOI: 10.1145/3672455
• LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models. ACM Transactions on Software Engineering and Methodology, 33(7) (2024), 1–38. DOI: 10.1145/3664812
• PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models. Proceedings of the ACM on Software Engineering, 1(FSE) (2024), 1194–1215. DOI: 10.1145/3643780
• The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision. 2024 IEEE Security and Privacy Workshops (SPW), 104–111. DOI: 10.1109/SPW63631.2024.00016
• Design and Development of User-friendly Bi-lingual Translation System Employing Machine Translation 5 Base Deep Learning Neural Network Framework Based NLP. 2024 2nd International Conference on Artificial Intelligence and Machine Learning Applications (AIMLA), 1–6. DOI: 10.1109/AIMLA59606.2024.10531504
• Hybrid mutation driven testing for natural language inference. Journal of Software: Evolution and Process (2024). DOI: 10.1002/smr.2694
• RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models. 2023 IEEE Real-Time Systems Symposium (RTSS), 158–171. DOI: 10.1109/RTSS59052.2023.00023
• AntiNODE: Evaluating Efficiency Robustness of Neural ODEs. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1499–1509. DOI: 10.1109/ICCVW60793.2023.00164
• Dynamic Neural Network is All You Need: Understanding the Robustness of Dynamic Mechanisms in Neural Networks. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1489–1498. DOI: 10.1109/ICCVW60793.2023.00163
