Computer Science > Computation and Language

arXiv:2211.04364 (cs)

[Submitted on 8 Nov 2022]

Title: NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?

Authors: Saadia Gabriel, Hamid Palangi, Yejin Choi

Abstract: While a substantial body of prior work has explored adversarial example generation for natural language understanding tasks, these examples are often unrealistic and diverge from real-world data distributions. In this work, we introduce NaturalAdversaries, a two-stage adversarial example generation framework for designing adversaries that are effective at fooling a given classifier and that demonstrate natural-looking failure cases that could plausibly occur during in-the-wild deployment of the models.
In the first stage, a token attribution method is used to summarize a given classifier's behaviour as a function of the key tokens in the input. In the second stage, a generative model is conditioned on the key tokens from the first stage. NaturalAdversaries is adaptable to both black-box and white-box adversarial attacks, depending on the level of access to the model parameters. Our results indicate that these adversaries generalize across domains and offer insights for future research on improving the robustness of neural text classification models.
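
The two stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the attribution method or the generative model, so the sketch assumes leave-one-out occlusion as a black-box stand-in for stage one, and for stage two it only builds the conditioning prompt that a generator might receive. The classifier, its weights, the prompt layout, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy classifier weights (illustration only, not from the paper).
TOY_WEIGHTS = {"hate": 0.4, "awful": 0.3, "nice": -0.3}


def toy_classifier(tokens: List[str]) -> float:
    """Return a toy positive-class score clamped to [0, 1]."""
    raw = sum(TOY_WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)
    return max(0.0, min(1.0, 0.5 + raw / 2))


def occlusion_attribution(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Stage 1 (assumed black-box variant): score drop when a token is removed."""
    base = score_fn(tokens)
    return [
        (tok, base - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


def key_tokens(
    tokens: List[str], score_fn: Callable[[List[str]], float], k: int = 2
) -> List[str]:
    """Keep the k tokens whose removal changes the score the most."""
    ranked = sorted(
        occlusion_attribution(tokens, score_fn),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return [tok for tok, _ in ranked[:k]]


def build_generation_prompt(keys: List[str], label: str) -> str:
    """Stage 2 input: a hypothetical prompt conditioning a generator on key tokens."""
    return f"label: {label} | keywords: {', '.join(keys)} | text:"


if __name__ == "__main__":
    sentence = "I hate this awful movie".split()
    keys = key_tokens(sentence, toy_classifier)
    # A real pipeline would pass this prompt to a generative language model.
    print(build_generation_prompt(keys, "negative"))
```

With white-box access, the occlusion step could be swapped for a gradient-based attribution method; the abstract leaves that choice open, so the sketch stays with the variant that needs only model outputs.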

Comments:	Findings of EMNLP 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2211.04364 [cs.CL]
	(or arXiv:2211.04364v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.04364

Submission history

From: Saadia Gabriel
[v1] Tue, 8 Nov 2022 16:37:34 UTC (8,110 KB)
