Abstract
We show that there exists an inherent tension between the goal of adversarial
robustness and that of standard generalization. Specifically, training robust
models may not only be more resource-intensive, but may also lead to a reduction in
standard accuracy. We demonstrate that this trade-off between the standard
accuracy of a model and its robustness to adversarial perturbations provably
exists even in a fairly simple and natural setting. These findings also
corroborate a similar phenomenon observed in practice. Further, we argue that
this phenomenon is a consequence of robust classifiers learning fundamentally
different feature representations than standard classifiers. These differences,
in particular, seem to result in unexpected benefits: the representations
learned by robust models tend to align better with salient data characteristics
and human perception.