Abstract
It is known that neural networks are vulnerable to attacks through adversarial perturbations. Worse yet, such attacks cannot be eliminated entirely: adversarial perturbations remain possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. However, rejecting suspicious inputs may not always be feasible or desirable. First, normal inputs may be rejected due to false alarms raised by the detection algorithm. Second, denial-of-service attacks may be mounted by feeding such systems a stream of adversarial inputs. To address this, we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network classifies correctly. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text can be repaired in about one second on average.
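To make the repair-by-perturbation workflow concrete, the following is a minimal Python sketch of the loop described above, under stated assumptions; it is not the authors' implementation. The names perturb, classify, and is_equivalent are hypothetical placeholders for a perturbation method (e.g., synonym substitution), the target classifier, and a semantic-similarity check, and the majority vote is a crude simplification of the statistical decision procedure a full implementation would require.

```python
# Hypothetical sketch of repairing a suspected adversarial text by
# sampling small, semantics-preserving perturbations. perturb(),
# classify(), and is_equivalent() are illustrative placeholders.
from collections import Counter

def repair(text, classify, perturb, is_equivalent, num_samples=100):
    # Sample mutants of the suspicious text, keeping only those that
    # preserve the original meaning.
    candidates = []
    for _ in range(num_samples):
        mutant = perturb(text)
        if is_equivalent(text, mutant):
            candidates.append((mutant, classify(mutant)))
    if not candidates:
        return None  # no usable semantics-preserving mutant was found

    # Take the majority label over the mutants as the presumed correct
    # label (a stand-in for a proper statistical test), then return one
    # mutant that the classifier assigns that label as the repair.
    majority_label, _ = Counter(
        label for _, label in candidates
    ).most_common(1)[0]
    for mutant, label in candidates:
        if label == majority_label:
            return mutant  # semantically equivalent, correctly classified
    return None
```

The key design idea this sketch captures is that perturbation, usually an attack tool, is used constructively: if most semantically equivalent variants of an input receive the same label, that label is taken as the trustworthy one, and any variant carrying it serves as the repaired text.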
Acknowledgements
This research is supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0101100005), the Key R&D Program of Zhejiang (2022C01018), and the NSFC Program (62102359). This research is also supported by the National Research Foundation, Singapore, under its AI Singapore Programme (AISG Award No. AISG-RP-2019-012).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Dong, G. et al. (2022). Repairing Adversarial Texts Through Perturbation. In: Aït-Ameur, Y., Crăciun, F. (eds) Theoretical Aspects of Software Engineering. TASE 2022. Lecture Notes in Computer Science, vol 13299. Springer, Cham. https://doi.org/10.1007/978-3-031-10363-6_3
DOI: https://doi.org/10.1007/978-3-031-10363-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10362-9
Online ISBN: 978-3-031-10363-6