Abstract
Machine learning models are often susceptible to adversarial perturbations of
their inputs. Even small perturbations can cause state-of-the-art classifiers
with high "standard" accuracy to produce an incorrect prediction with high
confidence. To better understand this phenomenon, we study adversarially robust
learning from the viewpoint of generalization. We show that even in a simple,
natural data model, the sample complexity of robust learning can be
significantly larger than that of "standard" learning. This gap is information
theoretic and holds irrespective of the training algorithm or the model family.
We complement our theoretical results with experiments on popular image
classification datasets and show that a similar gap exists there as well. We
postulate that the difficulty of training robust classifiers stems, at least
partially, from this inherently larger sample complexity.