Repairing Adversarial Texts Through Perturbation

  • Conference paper
  • Theoretical Aspects of Software Engineering (TASE 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13299)

Abstract

It is known that neural networks are subject to attacks through adversarial perturbations. Worse yet, such attacks are impossible to eliminate: adversarial perturbation remains possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. Rejecting suspicious inputs, however, may not always be feasible or ideal. First, normal inputs may be rejected due to false alarms raised by the detection algorithm. Second, denial-of-service attacks may be conducted by feeding such systems a stream of adversarial inputs. To address this, we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network classifies correctly. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text can be repaired in about one second on average.
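
The repair loop the abstract describes can be pictured with a short sketch. The Python below is a minimal illustration, not the authors' implementation: `classify`, `synonyms`, `perturb`, and `repair` are hypothetical names, the synonym table is a toy stand-in for an embedding-based lookup (cf. the GloVe/gensim footnotes below), and majority voting over mutant labels is one plausible way to settle on the repaired label.

```python
import random
from collections import Counter

# Hypothetical stand-in for an embedding-based synonym lookup
# (the paper's footnotes point at GloVe/gensim for this step).
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

def synonyms(word):
    return SYNONYMS.get(word, [word])

def perturb(text, rate=0.5):
    # Replace a random subset of words with synonyms, producing a
    # slightly mutated but (ideally) semantically equivalent text.
    words = text.split()
    mutated = [random.choice(synonyms(w)) if random.random() < rate else w
               for w in words]
    return " ".join(mutated)

def repair(text, classify, n_mutants=20):
    # Classify many perturbed variants, take the majority label, and
    # return it together with one variant the model assigns that label.
    votes, samples = Counter(), {}
    for _ in range(n_mutants):
        mutant = perturb(text)
        label = classify(mutant)
        votes[label] += 1
        samples.setdefault(label, mutant)
    majority = votes.most_common(1)[0][0]
    return majority, samples[majority]

if __name__ == "__main__":
    random.seed(0)
    # Toy classifier standing in for the neural network under repair.
    classify = lambda t: "positive" if ("great" in t or "fine" in t) else "negative"
    print(repair("good movie overall", classify))
```

In a real deployment, `classify` would wrap the target network and `perturb` would draw on the paper's perturbation methods; the majority vote here is only one way to decide which mutant counts as the repair.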


Notes

  1. https://github.com/dgl-prc/text-repair.
  2. https://nlp.stanford.edu/projects/glove/.
  3. https://radimrehurek.com/gensim/.
  4. http://api.fanyi.baidu.com/api/trans/product/index.
  5. https://pypi.org/project/autocorrect/.
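
Footnotes 2 and 3 hint at how the synonym lookup behind word-level perturbation could be realized. The sketch below is an assumption rather than the paper's confirmed pipeline: it loads pretrained GloVe vectors through gensim's downloader and takes embedding-space nearest neighbours as replacement candidates.

```python
# Sketch only: nearest neighbours in GloVe space as replacement candidates.
# Assumes gensim is installed and the pretrained model can be downloaded.
import gensim.downloader

kv = gensim.downloader.load("glove-wiki-gigaword-100")  # returns KeyedVectors

def synonym_candidates(word, topn=5):
    # Out-of-vocabulary words get no candidates; otherwise return the
    # topn closest words by cosine similarity in the embedding space.
    if word not in kv:
        return []
    return [w for w, _ in kv.most_similar(word, topn=topn)]

print(synonym_candidates("good"))
```

Note that embedding neighbours are not always true synonyms (antonyms can sit close together in GloVe space), so a real pipeline would need an additional semantic-equivalence check on the mutants.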


Acknowledgements

This research is supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0101100005), the Key R&D Program of Zhejiang (2022C01018), and the NSFC Program (62102359). This research is also supported by the National Research Foundation, Singapore, under its AI Singapore Programme (AISG Award No. AISG-RP-2019-012).

Author information


Corresponding authors

Correspondence to Jingyi Wang or Xinyu Wang.



Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Dong, G. et al. (2022). Repairing Adversarial Texts Through Perturbation. In: Aït-Ameur, Y., Crăciun, F. (eds) Theoretical Aspects of Software Engineering. TASE 2022. Lecture Notes in Computer Science, vol 13299. Springer, Cham. https://doi.org/10.1007/978-3-031-10363-6_3

  • DOI: https://doi.org/10.1007/978-3-031-10363-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10362-9

  • Online ISBN: 978-3-031-10363-6

  • eBook Packages: Computer Science, Computer Science (R0)
