Abstract
It is known that neural networks are vulnerable to attacks through adversarial perturbations. Worse yet, such attacks cannot be eliminated entirely: adversarial perturbations remain possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. However, rejecting suspicious inputs may not always be feasible or desirable. First, normal inputs may be rejected due to false alarms raised by the detection algorithm. Second, denial-of-service attacks may be mounted by feeding such systems a stream of adversarial inputs. To address this, we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network classifies correctly. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text can be repaired in about one second on average.
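To make the repair-by-perturbation workflow concrete, the following is a minimal Python sketch of the loop described above, under stated assumptions; it is not the authors' implementation. The names perturb, classify, and is_equivalent are hypothetical placeholders for a perturbation method (e.g., synonym substitution), the target classifier, and a semantic-similarity check, and the majority vote is a crude simplification of the statistical decision procedure a full implementation would require.

```python
# Hypothetical sketch of repairing a suspected adversarial text by
# sampling small, semantics-preserving perturbations. perturb(),
# classify(), and is_equivalent() are illustrative placeholders.
from collections import Counter

def repair(text, classify, perturb, is_equivalent, num_samples=100):
    # Sample mutants of the suspicious text, keeping only those that
    # preserve the original meaning.
    candidates = []
    for _ in range(num_samples):
        mutant = perturb(text)
        if is_equivalent(text, mutant):
            candidates.append((mutant, classify(mutant)))
    if not candidates:
        return None  # no usable semantics-preserving mutant was found

    # Take the majority label over the mutants as the presumed correct
    # label (a stand-in for a proper statistical test), then return one
    # mutant that the classifier assigns that label as the repair.
    majority_label, _ = Counter(
        label for _, label in candidates
    ).most_common(1)[0]
    for mutant, label in candidates:
        if label == majority_label:
            return mutant  # semantically equivalent, correctly classified
    return None
```

The key design idea this sketch captures is that perturbation, usually an attack tool, is used constructively: if most semantically equivalent variants of an input receive the same label, that label is taken as the trustworthy one, and any variant carrying it serves as the repaired text.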
Acknowledgements
This research is supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0101100005), the Key R&D Program of Zhejiang (2022C01018), and the NSFC Program (62102359). This research is also supported by the National Research Foundation, Singapore, under its AI Singapore Programme (AISG Award No. AISG-RP-2019-012).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Dong, G. et al. (2022). Repairing Adversarial Texts Through Perturbation. In: Aït-Ameur, Y., Crăciun, F. (eds) Theoretical Aspects of Software Engineering. TASE 2022. Lecture Notes in Computer Science, vol 13299. Springer, Cham. https://doi.org/10.1007/978-3-031-10363-6_3
DOI: https://doi.org/10.1007/978-3-031-10363-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10362-9
Online ISBN: 978-3-031-10363-6