Adversarial Attack on Sentiment Classification

Yi-Ting (Alicia) Tsai, Min-Chu Yang, Han-Yu Chen

Abstract

In this paper, we propose a white-box attack algorithm called the “Global Search” method and compare it with simple misspelling noise and with “Greedy Search”, a more sophisticated and common white-box attack approach. The attack methods are evaluated on a Convolutional Neural Network (CNN) sentiment classifier trained on the IMDB movie review dataset. The attack success rate is used to evaluate the effectiveness of the attack methods, and the perplexity of the sentences is used to measure the degree of distortion of the generated adversarial examples. The experimental results show that the proposed “Global Search” method generates more powerful adversarial examples with less distortion of, or less modification to, the source text.
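The abstract names two evaluation metrics without defining them formally. The sketch below illustrates one common way to compute each, and it is not taken from the paper: it assumes the attack success rate is counted only over inputs the classifier labels correctly before the attack, and that perplexity is the exponential of the mean negative log-probability per token under some language model. The function names `attack_success_rate` and `perplexity` are hypothetical.

```python
import math
from typing import Sequence


def attack_success_rate(
    labels: Sequence[int],
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
) -> float:
    """Fraction of correctly classified inputs whose prediction changes after the attack.

    Assumption: inputs the classifier already gets wrong are excluded,
    since there is no correct prediction left to flip.
    """
    # Keep only examples where the clean prediction matches the true label.
    attacked = [(c, a) for y, c, a in zip(labels, clean_preds, adv_preds) if c == y]
    if not attacked:
        return 0.0
    flipped = sum(1 for c, a in attacked if a != c)
    return flipped / len(attacked)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    Assumption: the log-probabilities come from an external language model;
    a higher value suggests a less fluent (more distorted) sentence.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

Under these assumptions, a stronger attack raises the success rate, while a less distorting attack keeps the perplexity of the adversarial sentence close to that of the original.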

Anthology ID: W19-3653
Volume: Proceedings of the 2019 Workshop on Widening NLP
Month: August
Year: 2019
Address: Florence, Italy
Editors: Amittai Axelrod, Diyi Yang, Rossana Cunha, Samira Shaikh, Zeerak Waseem
Venue: WiNLP
Publisher: Association for Computational Linguistics
Pages: 166–173
URL: https://aclanthology.org/W19-3653
Cite (ACL): Yi-Ting (Alicia) Tsai, Min-Chu Yang, and Han-Yu Chen. 2019. Adversarial Attack on Sentiment Classification. In Proceedings of the 2019 Workshop on Widening NLP, pages 166–173, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Adversarial Attack on Sentiment Classification (Tsai et al., WiNLP 2019)