Abstract
We show that there exists an inherent tension between the goal of adversarial
robustness and that of standard generalization. Specifically, training robust
models may not only be more resource-consuming, but also lead to a reduction of
standard accuracy. We demonstrate that this trade-off between the standard
accuracy of a model and its robustness to adversarial perturbations provably
exists even in a fairly simple and natural setting. These findings also
corroborate a similar phenomenon observed in practice. Further, we argue that
this phenomenon is a consequence of robust classifiers learning fundamentally
different feature representations than standard classifiers. These differences,
in particular, seem to result in unexpected benefits: the representations
learned by robust models tend to align better with salient data characteristics
and human perception.