Abstract
We study adversarial perturbations when the instances are uniformly distributed over $\{0,1\}^n$. We study both "inherent" bounds, which apply to any problem and any classifier for such a problem, and bounds that apply to specific problems and specific hypothesis classes.
As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy of these definitions based on their goals, and we identify one of them as the definition that guarantees misclassification by pushing the instances into the error region. We then study some classic
algorithms for learning monotone conjunctions and compare their adversarial
risk and robustness under different definitions by attacking the hypotheses
using instances drawn from the uniform distribution. We observe that sometimes
these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions may, in other contexts, coincide with it.
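As an illustration of the error-region definition in this setting, the following minimal Python sketch estimates, by Monte Carlo, how often a uniformly random instance can be pushed into the error region of a monotone-conjunction hypothesis within a given Hamming-distance budget. The target and hypothesis conjunctions are hypothetical examples chosen for the sketch, not taken from the paper.

# Illustrative Monte Carlo sketch (not from the paper): estimating the
# error-region adversarial risk of a monotone-conjunction hypothesis under
# the uniform distribution. Target/hypothesis are hypothetical examples.
import random

n = 20
target_vars = set(range(5))      # c(x) = x_0 AND ... AND x_4 (target concept)
hypothesis_vars = set(range(3))  # h(x) = x_0 AND x_1 AND x_2 (learned hypothesis)
extra_vars = target_vars - hypothesis_vars

def flips_to_error_region(x):
    # Error region = {x : h(x) = 1, c(x) = 0}; the reverse case h(x) = 0, c(x) = 1
    # is impossible here because hypothesis_vars is a subset of target_vars.
    flips = sum(1 - x[i] for i in hypothesis_vars)  # flips needed to force h(x) = 1
    if all(x[i] for i in extra_vars):               # one more flip to force c(x) = 0
        flips += 1
    return flips

def adversarial_risk(budget, trials=100_000):
    hits = sum(
        flips_to_error_region([random.randint(0, 1) for _ in range(n)]) <= budget
        for _ in range(trials)
    )
    return hits / trials

for budget in range(4):
    print(budget, adversarial_risk(budget))  # risk grows quickly with the budget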
Using the error-region definition of adversarial perturbations, we then study
inherent bounds on risk and robustness of any classifier for any classification
problem whose instances are uniformly distributed over $\{0,1\}^n$. Using the
isoperimetric inequality for the Boolean hypercube, we show that for initial error $0.01$, there always exists an adversarial perturbation that changes $O(\sqrt{n})$ bits of the instances to increase the risk to $0.5$, making the classifier's decisions meaningless. Furthermore, by also using the central limit theorem, we show that when $n \to \infty$, at most $c \cdot \sqrt{n}$ bits of perturbation, for a universal constant $c < 1.17$, suffice to increase the risk to $0.5$, and the same $c \cdot \sqrt{n}$ bits of perturbation suffice on average to increase the risk to $1$, hence bounding the robustness by $c \cdot \sqrt{n}$.
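A back-of-the-envelope reconstruction of where a constant close to $1.17$ can come from (our reading, not code from the paper): under the normal approximation to the binomial weight distribution on $\{0,1\}^n$, a set of measure $0.01$ lies roughly $\Phi^{-1}(0.99) \cdot \sqrt{n}/2 \approx 1.163 \sqrt{n}$ Hamming steps below the median weight, so a budget of about that many bit flips reaches measure $0.5$. The short Python sketch below computes this approximate budget for a few values of $n$.

# Reconstruction sketch (not from the paper): approximate perturbation budget
# c * sqrt(n) with c = Phi^{-1}(0.99) / 2 ~ 1.163, consistent with c < 1.17.
from math import sqrt
from statistics import NormalDist

c = NormalDist().inv_cdf(0.99) / 2
for n in (100, 1_000, 10_000):
    print(n, round(c * sqrt(n), 1))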
Description
[1810.12272] Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution