Denoised smoothing: A provable defense for pretrained classifiers

H Salman, M Sun, G Yang… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Abstract
We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. This method, for instance, allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies to both the white-box and the black-box settings of the pretrained classifier. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at: https://github.com/microsoft/denoised-smoothing.
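Since the abstract describes a concrete construction (a custom-trained denoiser prepended to a frozen classifier, followed by randomized smoothing), a minimal PyTorch sketch may help make the pipeline explicit. Everything below is illustrative: the class name, the `sigma` and `num_samples` parameters, and the majority-vote `predict` method are assumptions in the spirit of Cohen et al.-style randomized smoothing, not the authors' actual implementation (which is in the linked repository), and the certification step that converts vote counts into a certified robustness radius is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenoisedSmoothing(nn.Module):
    """Illustrative sketch of the denoise-then-classify pipeline.

    `denoiser` and `classifier` are assumed to be user-supplied modules;
    the pretrained classifier is queried as-is and never modified.
    """

    def __init__(self, denoiser: nn.Module, classifier: nn.Module,
                 sigma: float = 0.25, num_samples: int = 100):
        super().__init__()
        self.denoiser = denoiser        # trained to remove N(0, sigma^2 I) noise
        self.classifier = classifier    # off-the-shelf; may be a black-box service
        self.sigma = sigma              # smoothing noise level (assumed value)
        self.num_samples = num_samples  # Monte Carlo samples per input

    @torch.no_grad()
    def predict(self, x: torch.Tensor) -> torch.Tensor:
        """Approximate the smoothed classifier
            g(x) = argmax_c P[f(denoise(x + eps)) = c],  eps ~ N(0, sigma^2 I),
        by a majority vote over noisy copies of x (shape: batch x C x H x W).
        """
        counts = None
        for _ in range(self.num_samples):
            noisy = x + self.sigma * torch.randn_like(x)
            logits = self.classifier(self.denoiser(noisy))
            votes = F.one_hot(logits.argmax(dim=1), logits.shape[1])
            counts = votes if counts is None else counts + votes
        return counts.argmax(dim=1)  # most frequent class per input
```

In the black-box setting described in the abstract, `self.classifier` would wrap calls to an external API rather than a local network; only the denoiser is trained, which is what makes the defense applicable to services such as the image classification APIs mentioned above.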