arXiv:1702.04267 (stat)

[Submitted on 14 Feb 2017 (v1), last revised 21 Feb 2017 (this version, v2)]

Title: On Detecting Adversarial Perturbations

Authors: Jan Hendrik Metzen, Tim Genewein, Volker Fischer, Bastian Bischoff

Abstract: Machine learning, and deep learning in particular, has advanced tremendously on perceptual tasks in recent years. However, it remains vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose to augment deep neural networks with a small "detector" subnetwork that is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, while the detectors have been trained to detect only a specific adversary, they generalize to similar and weaker adversaries. In addition, we propose an adversarial attack that fools both the classifier and the detector, and a novel training procedure for the detector that counteracts this attack.
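
The detector idea can be illustrated with a deliberately small, self-contained sketch. It is not the architecture or the adversary from the paper: it assumes a fixed linear classifier in place of a deep network, a single fast-gradient-sign-style step as the adversary, and a logistic-regression detector on squared inputs in place of a detector subnetwork. All names and constants (`DIM`, `SIGNAL`, `NOISE`, `EPS`, `W`, `perturb`, `train_detector`) are hypothetical choices for this toy setting. The point is only the training setup: genuine inputs get label 0, perturbed inputs get label 1, and a binary classifier is fit to tell them apart.

```python
import math
import random

random.seed(0)

DIM = 31      # toy input dimensionality (assumption)
SIGNAL = 3.0  # class signal, carried by coordinate 0 only (assumption)
NOISE = 0.2   # std of Gaussian noise on every coordinate (assumption)
EPS = 0.4     # per-coordinate perturbation budget (assumption)

# Fixed linear classifier standing in for the deep network. It puts weight
# on the pure-noise coordinates, which makes it attackable in this toy.
W = [1.0] + [0.5] * (DIM - 1)

def sample_genuine():
    y = random.choice([-1, 1])
    x = [y * SIGNAL + random.gauss(0, NOISE)]
    x += [random.gauss(0, NOISE) for _ in range(DIM - 1)]
    return x, y

def classify(x):
    return 1 if sum(w * xi for w, xi in zip(W, x)) > 0 else -1

def perturb(x, y):
    # Sign-of-gradient step: for the linear score w.x with loss -y * w.x,
    # the gradient w.r.t. x is -y * w, so each coordinate moves by EPS.
    return [xi - EPS * y * math.copysign(1.0, w) for xi, w in zip(x, W)]

def features(x):
    # Squared inputs, so a linear detector can react to the perturbation
    # regardless of its sign (a hand-picked choice for this toy).
    return [xi * xi for xi in x]

def make_dataset(n):
    data = []
    for _ in range(n):
        x, y = sample_genuine()
        data.append((features(x), 0))              # genuine -> label 0
        data.append((features(perturb(x, y)), 1))  # adversarial -> label 1
    return data

def fit_scaler(data):
    n = len(data)
    means = [sum(f[i] for f, _ in data) / n for i in range(DIM)]
    stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f, _ in data) / n)
            for i in range(DIM)]
    return means, stds

def scale(f, means, stds):
    return [(fi - m) / s for fi, m, s in zip(f, means, stds)]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_detector(data, epochs=30, lr=0.05):
    # Plain SGD on the logistic loss.
    v, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for f, t in data:
            g = sigmoid(sum(vi * fi for vi, fi in zip(v, f)) + b) - t
            v = [vi - lr * g * fi for vi, fi in zip(v, f)]
            b -= lr * g
    return v, b

def detect(f, v, b):
    return 1 if sum(vi * fi for vi, fi in zip(v, f)) + b > 0 else 0

# How well does the classifier do on genuine vs. perturbed inputs?
samples = [sample_genuine() for _ in range(200)]
clean_acc = sum(classify(x) == y for x, y in samples) / len(samples)
adv_acc = sum(classify(perturb(x, y)) == y for x, y in samples) / len(samples)

# Train the detector on one set, evaluate on a fresh one.
train_raw, test_raw = make_dataset(200), make_dataset(200)
means, stds = fit_scaler(train_raw)
train = [(scale(f, means, stds), t) for f, t in train_raw]
v, b = train_detector(train)
detector_acc = sum(detect(scale(f, means, stds), v, b) == t
                   for f, t in test_raw) / len(test_raw)

print("classifier accuracy, genuine inputs:  ", clean_acc)
print("classifier accuracy, perturbed inputs:", adv_acc)
print("detector accuracy:                    ", detector_acc)
```

The squared-input features matter here because the direction of the perturbation depends on the true label, so a detector that is linear in the raw inputs would see the two perturbation directions cancel out. The detector in this sketch has only seen one fixed adversary, so it says nothing about an attacker who also targets the detector, which is the case the abstract's final sentence addresses.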

Comments:	Final version for ICLR2017 (see this https URL)
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1702.04267 [stat.ML]
	(or arXiv:1702.04267v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1702.04267

Submission history

From: Jan Hendrik Metzen
[v1] Tue, 14 Feb 2017 15:44:26 UTC (707 KB)
[v2] Tue, 21 Feb 2017 06:53:38 UTC (707 KB)
