Abstract
For a standard convolutional neural network, optimizing over the input pixels
to maximize the score of some target class will generally produce a
grainy-looking version of the original image. However, researchers have
demonstrated that for adversarially-trained neural networks, this optimization
produces images that uncannily resemble the target class. In this paper, we
show that these "perceptually-aligned gradients" also occur under randomized
smoothing, an alternative means of constructing adversarially-robust
classifiers. Our finding suggests that perceptually-aligned gradients may be a
general property of robust classifiers, rather than a specific property of
adversarially-trained neural networks. We hope that our results will inspire
research aimed at explaining this link between perceptually-aligned gradients
and adversarial robustness.
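To make the optimization described above concrete, here is a minimal sketch of gradient ascent on the input pixels to maximize a target-class logit. The model choice, target label, step size, and iteration count are illustrative assumptions, not details taken from the paper.

```python
import torch
import torchvision.models as models

# Any image classifier works here; an untrained ResNet-50 is used only
# as a placeholder so the sketch is self-contained.
model = models.resnet50(weights=None).eval()
target_class = 207          # hypothetical target label
steps, step_size = 100, 0.1  # illustrative hyperparameters

# Start from random pixels (or from a natural image) and ascend the
# gradient of the target class's score with respect to the input.
x = torch.rand(1, 3, 224, 224, requires_grad=True)
for _ in range(steps):
    score = model(x)[0, target_class]   # logit of the target class
    score.backward()
    with torch.no_grad():
        x += step_size * x.grad         # gradient ascent step on the pixels
        x.clamp_(0.0, 1.0)              # keep pixels in a valid range
    x.grad.zero_()
```

For a standard network, the resulting image looks like grainy noise overlaid on the starting point; for a robust classifier, it tends to acquire recognizable features of the target class.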
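Randomized smoothing, the robustification technique the abstract refers to, classifies an input by the class the base classifier predicts most often under Gaussian perturbations. A minimal sketch of this prediction rule, assuming a `base_model` callable and illustrative values of `sigma` and `n_samples` (this is the general technique, not the paper's exact experimental procedure):

```python
import torch

def smoothed_predict(base_model, x, sigma=0.25, n_samples=100):
    """Majority-vote prediction of the smoothed classifier at input x."""
    with torch.no_grad():
        # Draw n_samples Gaussian perturbations of x (shape: (1, C, H, W))
        # and broadcast x over the noise batch.
        noise = sigma * torch.randn(n_samples, *x.shape[1:])
        preds = base_model(x + noise).argmax(dim=1)
        # Return the most frequently predicted class.
        return torch.bincount(preds).argmax().item()
```

The noise level sigma trades off clean accuracy against robustness: larger sigma yields a smoother, more robust classifier at the cost of accuracy on unperturbed inputs.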